late Christmas gift: freedesktop’s Xorg for MeeGo


If you follow the link here [0], you will see a set of Debian packages that the Graphics team at Nokia is continuously working on to deploy an X11 implementation for MeeGo-Harmattan, where we target embedded systems only. Feel free to use them!

At this point we are very proud of ourselves, because 99% of the content of these repositories is based on the mainstream X development. IOW, we are fetching the X code base directly from freedesktop. More importantly, we are shaping the freedesktop implementation for embedded devices. At this moment we are pretty much aligned with X version 1.9.2, plus a few other commits from the master branch; the rest of the components, like the client-side libraries, are mostly what is on freedesktop master as well.

We are quite happy because this follows exactly what we planned some time ago, when we strategically decided to contribute to X in the freedesktop community, centralizing the development there instead of ignoring it. So: no-no for kdrive, no-no for massive code drops on top of freedesktop’s code, no-no for a proprietary X implementation, no-no for major hacks or anything like that. Aside from the video driver stack, everything was dumped at freedesktop.

Next, the plan pretty much fits with MeeGo-MeeGo, because we just need to take the work we’ve been doing straight from freedesktop, avoiding any cross-fetching between downstreams. Hyvää!

[0] yeah, the name of the repository is not trivial at all, and we will probably change it in the near future. So poke me if you need this later on.

X Census (for 1.9)

The title is a funny (and friendly) homage to the GNOME Census. So let’s take a look at some numbers from the time Xorg 1.9 was in development; the raw data is here.

It would be unfair to measure only the work that happened in, e.g., the X server or the drivers and then come up with statistics about “who developed X”. X and its development community are quite extensive and are not only about “graphics”, i.e., the pixel information that appears on your display. Assuming so is a very common mistake.

X does input device event processing, device key mapping (e.g. keyboard), pixel rasterization, hotplug and configuration of output and input devices, pairing of devices with user sessions, (different) 2D/3D graphics implementations, frame-buffer content management, X application and session security, testing, analysis and debugging of application memory resources, and so on. So it’s far from being just pixels showing up on the screen.

X is a generic graphical system. I prefer to see X as an implementation that doesn’t (or shouldn’t) handle system-level resources like memory or frame-buffer content. Other people see it a bit differently. Given that, I divided all X development into the groups described below. The statistics were generated over the period people were working on Xorg 1.9 (02 Apr to 20 Aug):

The proto set of repositories holds the description of the X11 core protocol together with its extensions. The implementations of X11 and its extensions to be used by clients are inside lib and xcb; lib also contains a few libraries used within xserver. xserver contains the server implementation of X. So here are the numbers for the X implementation (xserver, proto, lib and xcb repositories):

Processed 874 csets from 74 developers
59 employers found
A total of 291730 lines added, 155222 removed (delta 136508)
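
As a quick sanity check on how to read these figures: the delta is simply lines added minus lines removed, and every percentage in the tables below is a count divided by the matching total. A minimal Python sketch of that arithmetic follows; the helper name `share` is hypothetical and not part of the tool that generated the statistics.

```python
def share(count, total):
    """Return count as a percentage of total, rounded to one decimal."""
    return round(100.0 * count / total, 1)

# Figures copied from the summary above and from the first table below.
total_csets = 874
lines_added, lines_removed = 291730, 155222

delta = lines_added - lines_removed
top_share = share(134, total_csets)   # changesets of the top developer

print(f"delta: {delta} lines, top developer: {top_share} % of changesets")
```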

Developers with the most changesets
Alan Coopersmith 134 (15.3%)
Jamey Sharp 106 (12.1%)
Gaetan Nadon 84 (9.6%)
Keith Packard 66 (7.6%)
Tiago Vignatti 55 (6.3%)
Peter Hutterer 54 (6.2%)
Mikhail Gusarov 41 (4.7%)
Jeremy Huddleston 38 (4.3%)
Matt Dew 21 (2.4%)
Fernando Carrijo 19 (2.2%)

Developers with the most changed lines
Matt Dew 172273 (53.2%)
Alan Coopersmith 75739 (23.4%)
Gaetan Nadon 13199 (4.1%)
Mikhail Gusarov 8979 (2.8%)
Keith Packard 6438 (2.0%)
Jeremy Huddleston 5750 (1.8%)
Jamey Sharp 5535 (1.7%)
Tiago Vignatti 5227 (1.6%)
Marko Myllynen 5154 (1.6%)
Yaakov Selkowitz 3614 (1.1%)

Developers with the most lines removed
Marko Myllynen 4729 (3.0%)
Tiago Vignatti 3922 (2.5%)
Mikhail Gusarov 3670 (2.4%)
Yaakov Selkowitz 3523 (2.3%)
Josh Triplett 3141 (2.0%)
Adam Jackson 2521 (1.6%)
Jamey Sharp 2036 (1.3%)
Daniel Stone 312 (0.2%)
Fernando Carrijo 221 (0.1%)
Pierre-Loup A. Griffais 91 (0.1%)

Developers with the most signoffs (total 1007)
Keith Packard 184 (18.3%)
Alan Coopersmith 155 (15.4%)
Gaetan Nadon 105 (10.4%)
Jamey Sharp 103 (10.2%)
Peter Hutterer 88 (8.7%)
Tiago Vignatti 56 (5.6%)
Mikhail Gusarov 42 (4.2%)
Jeremy Huddleston 39 (3.9%)
Fernando Carrijo 19 (1.9%)
Adam Jackson 18 (1.8%)

Developers with the most reviews (total 530)
Keith Packard 74 (14.0%)
Peter Hutterer 63 (11.9%)
Jamey Sharp 62 (11.7%)
Alan Coopersmith 46 (8.7%)
Adam Jackson 44 (8.3%)
Julien Cristau 34 (6.4%)
Dan Nicholson 30 (5.7%)
Daniel Stone 23 (4.3%)
Alex Deucher 21 (4.0%)
Tiago Vignatti 18 (3.4%)

Developers with the most test credits (total 42)
Gaetan Nadon 11 (26.2%)
Tiago Vignatti 8 (19.0%)
Jeremy Huddleston 2 (4.8%)
Colin Harrison 2 (4.8%)
Richard Barnette 2 (4.8%)
Eric Anholt 2 (4.8%)
Keith Packard 1 (2.4%)
Peter Hutterer 1 (2.4%)
Dan Nicholson 1 (2.4%)
Dave Airlie 1 (2.4%)

Developers who gave the most tested-by credits (total 42)
Jamey Sharp 11 (26.2%)
Alan Coopersmith 9 (21.4%)
Keith Packard 8 (19.0%)
Tiago Vignatti 2 (4.8%)
Yaakov Selkowitz 2 (4.8%)
Kristian Høgsberg 2 (4.8%)
Jon TURNEY 2 (4.8%)
Peter Hutterer 1 (2.4%)
Mikhail Gusarov 1 (2.4%)
Pierre-Loup A. Griffais 1 (2.4%)

Developers with the most report credits (total 13)
Richard Barnette 2 (15.4%)
Jamey Sharp 1 (7.7%)
Dave Airlie 1 (7.7%)
Robert Hooker 1 (7.7%)
Fabio Pedretti 1 (7.7%)
Julien Cristau 1 (7.7%)
Matt Turner 1 (7.7%)
Kalle Olavi Niemitalo 1 (7.7%)
Chris Ball 1 (7.7%)
邓逸昕 1 (7.7%)

Developers who gave the most report credits (total 13)
Julien Cristau 3 (23.1%)
Tiago Vignatti 2 (15.4%)
Peter Hutterer 2 (15.4%)
Jamey Sharp 1 (7.7%)
Dave Airlie 1 (7.7%)
Alan Coopersmith 1 (7.7%)
Chris Wilson 1 (7.7%)
Michel Dänzer 1 (7.7%)
Pauli Nieminen 1 (7.7%)

Top changeset contributors by employer
Oracle 135 (15.4%)
(name missing) 106 (12.1%)
Intel 89 (10.2%)
Red Hat 87 (10.0%)
(name missing) 84 (9.6%)
Nokia 75 (8.6%)
(name missing) 41 (4.7%)
Apple 38 (4.3%)
(name missing) 21 (2.4%)
(name missing) 19 (2.2%)

Top lines changed by employer
(name missing) 172273 (53.2%)
Oracle 77967 (24.1%)
(name missing) 15840 (4.9%)
Red Hat 9684 (3.0%)
(name missing) 9029 (2.8%)
Intel 7898 (2.4%)
Apple 6162 (1.9%)
(name missing) 5986 (1.8%)
Nokia 5548 (1.7%)
(name missing) 3652 (1.1%)

Employers with the most signoffs (total 1007)
Intel 207 (20.6%)
Oracle 155 (15.4%)
Red Hat 119 (11.8%)
(name missing) 105 (10.4%)
(name missing) 103 (10.2%)
Nokia 78 (7.7%)
(name missing) 42 (4.2%)
Apple 39 (3.9%)
(name missing) 19 (1.9%)
(name missing) 17 (1.7%)

X drivers, although their functionality is decreasing over time, still touch kernel and system-level tasks. That’s why I prefer to see them separated from the rest of the X implementation. The development numbers for X input drivers and input event processing tools (xf86-input-*, xkbcomp, xkeyboard-config repositories):

Processed 285 csets from 28 developers
24 employers found
A total of 20679 lines added, 17716 removed (delta 2963)

Developers with the most changesets
Gaetan Nadon 115 (40.4%)
Peter Hutterer 62 (21.8%)
Sergey V. Udaltsov 45 (15.8%)
Chris Bagwell 7 (2.5%)
Daniel Stone 7 (2.5%)
Stephan Hilb 5 (1.8%)
Simon Thum 4 (1.4%)
Adam Jackson 3 (1.1%)
Julien Cristau 3 (1.1%)
Oliver McFadden 3 (1.1%)

Developers with the most changed lines
Sergey V. Udaltsov 17613 (74.3%)
Gaetan Nadon 3096 (13.1%)
Peter Hutterer 1024 (4.3%)
Stephan Hilb 479 (2.0%)
Daniel Knittl-Frank 272 (1.1%)
Simon Thum 189 (0.8%)
Daniel Stone 152 (0.6%)
Chris Bagwell 81 (0.3%)
Michel Dänzer 66 (0.3%)
Patrick Curran 46 (0.2%)

Developers with the most lines removed
Gaetan Nadon 2244 (12.7%)
Peter Hutterer 206 (1.2%)
Fernando Carrijo 10 (0.1%)
Alan Coopersmith 7 (0.0%)
Julien Cristau 2 (0.0%)
Paulo Ricardo Zanoni 1 (0.0%)

Developers with the most signoffs (total 238)
Gaetan Nadon 115 (48.3%)
Peter Hutterer 79 (33.2%)
Daniel Stone 9 (3.8%)
Chris Bagwell 7 (2.9%)
Alan Coopersmith 6 (2.5%)
Fernando Carrijo 3 (1.3%)
Oliver McFadden 3 (1.3%)
Bartosz Brachaczek 2 (0.8%)
Adam Jackson 2 (0.8%)
Ævar Arnfjörð Bjarmason 2 (0.8%)

Developers with the most reviews (total 52)
Rémi Cardona 14 (26.9%)
Fernando Carrijo 9 (17.3%)
Jamey Sharp 9 (17.3%)
Peter Hutterer 8 (15.4%)
Alan Coopersmith 4 (7.7%)
Dan Nicholson 3 (5.8%)
Gaetan Nadon 1 (1.9%)
Simon Thum 1 (1.9%)
Julien Cristau 1 (1.9%)
Magnus Kessler 1 (1.9%)

Developers with the most test credits (total 5)
Peter Hutterer 2 (40.0%)
Bartek Iwaniec 2 (40.0%)
Magnus Kessler 1 (20.0%)

Developers who gave the most tested-by credits (total 5)
Bartosz Brachaczek 2 (40.0%)
Peter Hutterer 1 (20.0%)
Chris Bagwell 1 (20.0%)
Patrick Curran 1 (20.0%)

Developers with the most report credits (total 3)
Peter Hutterer 1 (33.3%)
Julien Cristau 1 (33.3%)
Gabor Z. Papp 1 (33.3%)

Developers who gave the most report credits (total 3)
Gaetan Nadon 2 (66.7%)
Gabor Z. Papp 1 (33.3%)

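Top changeset contributors by employer
(name missing) 115 (40.4%)
Red Hat 65 (22.8%)
(name missing) 45 (15.8%)
(name missing) 7 (2.5%)
(name missing) 7 (2.5%)
Oracle 5 (1.8%)
(name missing) 5 (1.8%)
(name missing) 4 (1.4%)
VMWare 4 (1.4%)
Nokia 3 (1.1%)

Top lines changed by employer
(name missing) 17683 (74.6%)
(name missing) 3304 (13.9%)
Red Hat 1273 (5.4%)
(name missing) 479 (2.0%)
(name missing) 272 (1.1%)
(name missing) 189 (0.8%)
(name missing) 181 (0.8%)
(name missing) 81 (0.3%)
VMWare 67 (0.3%)
(name missing) 46 (0.2%)

Employers with the most signoffs (total 238)
(name missing) 115 (48.3%)
Red Hat 81 (34.0%)
(name missing) 9 (3.8%)
(name missing) 7 (2.9%)
Oracle 6 (2.5%)
Nokia 3 (1.3%)
(name missing) 3 (1.3%)
(name missing) 2 (0.8%)
(name missing) 2 (0.8%)
(name missing) 2 (0.8%)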

The numbers for userspace video drivers (libdrm, mesa and all xf86-video-*):

Processed 5608 csets from 107 developers
84 employers found
A total of 528511 lines added, 1345893 removed (delta -817382)

Developers with the most changesets
Brian Paul 599 (10.7%)
Eric Anholt 597 (10.6%)
Gaetan Nadon 431 (7.7%)
Vinson Lee 426 (7.6%)
Marek Olšák 415 (7.4%)
José Fonseca 357 (6.4%)
Kenneth Graunke 326 (5.8%)
Ian Romanick 321 (5.7%)
Carl Worth 233 (4.2%)
Chris Wilson 208 (3.7%)

Developers with the most changed lines
Eric Anholt 957175 (56.3%)
Jeremy Huddleston 146459 (8.6%)
Kenneth Graunke 58744 (3.5%)
Jakob Bornecrantz 46941 (2.8%)
xgi0007 37147 (2.2%)
Brian Paul 36067 (2.1%)
Carl Worth 25201 (1.5%)
Jerome Glisse 22808 (1.3%)
Kristian Høgsberg 20469 (1.2%)
José Fonseca 18998 (1.1%)

Developers with the most lines removed
Eric Anholt 930476 (69.1%)
Jakob Bornecrantz 37952 (2.8%)
Kristian Høgsberg 6935 (0.5%)
Keith Whitwell 3829 (0.3%)
Gaetan Nadon 3113 (0.2%)
Daniel Vetter 956 (0.1%)
Chia-I Wu 451 (0.0%)
George Sapountzis 269 (0.0%)
Owain Ainsworth 58 (0.0%)
Joakim Sindholt 26 (0.0%)

Developers with the most signoffs (total 926)
Gaetan Nadon 363 (39.2%)
Chris Wilson 186 (20.1%)
Jerome Glisse 50 (5.4%)
Dave Airlie 42 (4.5%)
Daniel Vetter 37 (4.0%)
Brian Paul 27 (2.9%)
Alex Deucher 22 (2.4%)
Jeremy Huddleston 18 (1.9%)
Adam Jackson 16 (1.7%)
José Fonseca 14 (1.5%)

Developers with the most reviews (total 24)
Alan Coopersmith 6 (25.0%)
Rémi Cardona 4 (16.7%)
Ian Romanick 2 (8.3%)
Eric Anholt 2 (8.3%)
Corbin Simpson 2 (8.3%)
George Sapountzis 2 (8.3%)
Gaetan Nadon 1 (4.2%)
Chris Wilson 1 (4.2%)
Adam Jackson 1 (4.2%)
José Fonseca 1 (4.2%)

Developers with the most test credits (total 11)
Nick Bowler 2 (18.2%)
Calvin Walton 2 (18.2%)
Aaron Plattner 1 (9.1%)
Marek Olšák 1 (9.1%)
Tom Fogal 1 (9.1%)
Brian Rogers 1 (9.1%)
Arkadiusz Miśkiewicz 1 (9.1%)
Krzysztof Halasa 1 (9.1%)
Sven Arvidsson 1 (9.1%)

Developers who gave the most tested-by credits (total 11)
Daniel Vetter 5 (45.5%)
Chris Wilson 2 (18.2%)
Dan Nicholson 1 (9.1%)
Marcin Slusarz 1 (9.1%)
Francisco Jerez 1 (9.1%)
Tom Stellard 1 (9.1%)

Developers with the most report credits (total 17)
Aaron Plattner 1 (5.9%)
Brian Rogers 1 (5.9%)
Arkadiusz Miśkiewicz 1 (5.9%)
Julien Cristau 1 (5.9%)
Kenneth Graunke 1 (5.9%)
Thomas Bächler 1 (5.9%)
Niels Ole Salscheider 1 (5.9%)
Roy Spliet 1 (5.9%)
Gianluca Anzolin 1 (5.9%)
Sergey Samokhin 1 (5.9%)

Developers who gave the most report credits (total 17)
Chris Wilson 11 (64.7%)
Marek Olšák 2 (11.8%)
Julien Cristau 1 (5.9%)
Ian Romanick 1 (5.9%)
Gaetan Nadon 1 (5.9%)
Maarten Maathuis 1 (5.9%)

Top changeset contributors by employer
VMWare 1870 (33.3%)
Intel 1552 (27.7%)
(name missing) 431 (7.7%)
(name missing) 415 (7.4%)
(name missing) 326 (5.8%)
LunarG 195 (3.5%)
Red Hat 183 (3.3%)
(name missing) 127 (2.3%)
(name missing) 72 (1.3%)
AMD 64 (1.1%)

Top lines changed by employer
Intel 1057613 (62.3%)
Apple 238512 (14.0%)
VMWare 173210 (10.2%)
(name missing) 67387 (4.0%)
Red Hat 47856 (2.8%)
(name missing) 37148 (2.2%)
LunarG 29790 (1.8%)
(name missing) 15014 (0.9%)
(name missing) 9531 (0.6%)
(name missing) 6345 (0.4%)

Employers with the most signoffs (total 926)
(name missing) 363 (39.2%)
Intel 210 (22.7%)
Red Hat 110 (11.9%)
VMWare 58 (6.3%)
(name missing) 36 (3.9%)
AMD 22 (2.4%)
Apple 18 (1.9%)
Oracle 12 (1.3%)
(name missing) 11 (1.2%)
NVidia 11 (1.2%)

The pixman library (pixman) is a special one because it can be used inside X and by other components on the system, like cairo. It’s used for pixel manipulation, e.g. fast paths that take advantage of CPU features:

Processed 78 csets from 8 developers
8 employers found
A total of 3088 lines added, 1270 removed (delta 1818)

Developers with the most changesets
Søren Sandmann Pedersen 54 (69.2%)
Siarhei Siamashka 9 (11.5%)
M Joonas Pihlaja 6 (7.7%)
Jeff Muizelaar 3 (3.8%)
Andrea Canciani 2 (2.6%)
Brad Smith 1 (1.3%)
Marek Vasut 1 (1.3%)
Siddharth Agarwal 1 (1.3%)

Developers with the most changed lines
Søren Sandmann Pedersen 2207 (64.7%)
Siarhei Siamashka 462 (13.6%)
M Joonas Pihlaja 185 (5.4%)
Andrea Canciani 119 (3.5%)
Marek Vasut 69 (2.0%)
Brad Smith 24 (0.7%)
Jeff Muizelaar 20 (0.6%)
Siddharth Agarwal 2 (0.1%)

Developers with the most lines removed

Developers with the most signoffs (total 5)
Egor Starkov 1 (20.0%)
Rami Ylimaki 1 (20.0%)
Jeff Muizelaar 1 (20.0%)
Marek Vasut 1 (20.0%)
Siarhei Siamashka 1 (20.0%)

Developers with the most reviews (total 0)

Developers with the most test credits (total 0)

Developers who gave the most tested-by credits (total 0)

Developers with the most report credits (total 0)

Developers who gave the most report credits (total 0)

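Top changeset contributors by employer
Red Hat 54 (69.2%)
Nokia 9 (11.5%)
(name missing) 6 (7.7%)
(name missing) 3 (3.8%)
(name missing) 2 (2.6%)
(name missing) 1 (1.3%)
(name missing) 1 (1.3%)
(name missing) 1 (1.3%)

Top lines changed by employer
Red Hat 2320 (68.1%)
Nokia 666 (19.5%)
(name missing) 185 (5.4%)
(name missing) 123 (3.6%)
(name missing) 69 (2.0%)
(name missing) 24 (0.7%)
(name missing) 20 (0.6%)
(name missing) 2 (0.1%)

Employers with the most signoffs (total 5)
Nokia 3 (60.0%)
(name missing) 1 (20.0%)
(name missing) 1 (20.0%)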

A very important piece of work for X11 conformance testing is XTS, which was broken for a while and is now working again:

Processed 41 csets from 4 developers
4 employers found
A total of 2244 lines added, 4078 removed (delta -1834)

Developers with the most changesets
Peter Hutterer 17 (41.5%)
Aaron Plattner 12 (29.3%)
Dan Nicholson 9 (22.0%)
Jon TURNEY 2 (4.9%)

Developers with the most changed lines
Aaron Plattner 3854 (74.1%)
Peter Hutterer 245 (4.7%)
Dan Nicholson 141 (2.7%)
Jon TURNEY 5 (0.1%)

Developers with the most lines removed
Aaron Plattner 2000 (49.0%)
Dan Nicholson 1 (0.0%)

Developers with the most signoffs (total 42)
Peter Hutterer 19 (45.2%)
Aaron Plattner 12 (28.6%)
Dan Nicholson 9 (21.4%)
Jon TURNEY 2 (4.8%)

Developers with the most reviews (total 10)
Dan Nicholson 7 (70.0%)
Peter Hutterer 3 (30.0%)

Developers with the most test credits (total 0)

Developers who gave the most tested-by credits (total 0)

Developers with the most report credits (total 0)

Developers who gave the most report credits (total 0)

Top changeset contributors by employer
Red Hat 17 (41.5%)
NVidia 12 (29.3%)
(name missing) 9 (22.0%)
(name missing) 2 (4.9%)

Top lines changed by employer
NVidia 4726 (90.9%)
Red Hat 245 (4.7%)
(name missing) 225 (4.3%)
(name missing) 5 (0.1%)

Employers with the most signoffs (total 42)
Red Hat 19 (45.2%)
NVidia 12 (28.6%)
(name missing) 9 (21.4%)
(name missing) 2 (4.8%)

X documentation (doc repository):

Processed 22 csets from 6 developers
6 employers found
A total of 315 lines added, 45930 removed (delta -45615)

Developers with the most changesets
Alan Coopersmith 12 (54.5%)
Gaetan Nadon 3 (13.6%)
Thomas Hellstrom 2 (9.1%)
Julien Cristau 2 (9.1%)
Yaakov Selkowitz 2 (9.1%)
Dirk Wallenstein 1 (4.5%)

Developers with the most changed lines
Alan Coopersmith 45843 (99.3%)
Julien Cristau 52 (0.1%)
Gaetan Nadon 36 (0.1%)
Yaakov Selkowitz 23 (0.0%)
Thomas Hellstrom 4 (0.0%)
Dirk Wallenstein 2 (0.0%)

Developers with the most lines removed
Alan Coopersmith 45627 (99.3%)
Gaetan Nadon 18 (0.0%)

Developers with the most signoffs (total 23)
Alan Coopersmith 13 (56.5%)
Gaetan Nadon 3 (13.0%)
Thomas Hellstrom 2 (8.7%)
Julien Cristau 2 (8.7%)
Yaakov Selkowitz 2 (8.7%)
Dirk Wallenstein 1 (4.3%)

Developers with the most reviews (total 4)
Alan Coopersmith 2 (50.0%)
Gaetan Nadon 1 (25.0%)
Dan Nicholson 1 (25.0%)

Developers with the most test credits (total 0)

Developers who gave the most tested-by credits (total 0)

Developers with the most report credits (total 0)

Developers who gave the most report credits (total 0)

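Top changeset contributors by employer
Oracle 12 (54.5%)
(name missing) 3 (13.6%)
(name missing) 2 (9.1%)
(name missing) 2 (9.1%)
VMWare 2 (9.1%)
(name missing) 1 (4.5%)

Top lines changed by employer
Oracle 46053 (99.7%)
(name missing) 52 (0.1%)
(name missing) 36 (0.1%)
(name missing) 23 (0.0%)
VMWare 4 (0.0%)
(name missing) 2 (0.0%)

Employers with the most signoffs (total 23)
Oracle 13 (56.5%)
(name missing) 3 (13.0%)
(name missing) 2 (8.7%)
(name missing) 2 (8.7%)
VMWare 2 (8.7%)
(name missing) 1 (4.3%)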

Nothing or close to nothing was done in the old font scheme (font repo), bitmap and cursor data. Also, across the 85 traditional X applications (apps), only 180 changesets were made, mostly concerning autoconf clean-up.

Of course, lines of code and changesets are far from being a good metric of how the development actually happened. But still, they do represent something. For sure, there’s also a lot of other inaccuracy that I’m missing in all this. For instance, companies like Collabora do X development, but sometimes Nokia gets the credit for it. Is that fair? I don’t know. And I don’t want to discuss this either :)

PS: Canonical, where are you here? Hint hint hint.

Scrutinizing X memory, part 2: what’s taking all that memory?

So here go some statistics of the running Xorg process. All the information was fetched from /proc/`pidof Xorg`/{smaps, status}. I also used a script found on the Web to parse and organize this information; Mikhail Gusarov has extended this script to produce very useful output.
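
To give an idea of what such parsing involves, here is a minimal Python sketch that sums the Rss field of a smaps file per backing file. It is not the script mentioned above; the function name `rss_by_mapping` and the `[anon]` label are assumptions made for illustration.

```python
import os
import re
from collections import defaultdict

# First line of each mapping: "<start>-<end> <perms> <offset> <dev> <inode> [path]"
HEADER = re.compile(r"^[0-9a-f]+-[0-9a-f]+\s+(\S{4})\s+\S+\s+\S+\s+\S+\s*(.*)$")
RSS = re.compile(r"^Rss:\s+(\d+)\s+kB")

def rss_by_mapping(smaps_text):
    """Sum the Rss field (in kB) per backing file; mappings without a
    path are grouped under '[anon]'."""
    totals = defaultdict(int)
    name = None
    for line in smaps_text.splitlines():
        header = HEADER.match(line)
        if header:
            name = header.group(2) or "[anon]"
            continue
        rss = RSS.match(line)
        if rss and name is not None:
            totals[name] += int(rss.group(1))
    return dict(totals)

# Demo on the current process, only where /proc is available (Linux).
if os.path.exists("/proc/self/smaps"):
    with open("/proc/self/smaps") as f:
        totals = rss_by_mapping(f.read())
    for name, kb in sorted(totals.items(), key=lambda kv: -kv[1])[:5]:
        print(f"{kb:8d} kB  {name}")
```

Pointing the same function at the contents of /proc/`pidof Xorg`/smaps should give a per-library view similar to the lists in this post, although the grouping rules of the original script may differ.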

Xorg per se

Running just one standalone `Xorg -retro`. On my system this gives:
VmRSS: 5440 kB
VmSize: 13620 kB

from those 5440 kB of RSS:
3404 kB (63 %) come from code
1628 kB (30 %) come from malloc/mmap in anonymous memory (heap)
228 kB (4 %) come from other data mapped in memory
180 kB (3 %) come from rodata
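
A breakdown like the one above can be approximated from the permission flags of each mapping. The sketch below is only a guess at the rules: executable mappings count as code, mappings with no backing file as anonymous memory, writable file-backed mappings as other data and read-only ones as rodata. The names `classify` and `breakdown` are hypothetical, and the demo feeds one synthetic mapping per category, sized with the totals reported above.

```python
from collections import Counter

def classify(perms, name):
    """Guess a category from a mapping's permissions and backing file.
    These rules are an assumption, not necessarily the ones used by the
    script that produced the numbers in this post."""
    if "x" in perms:
        return "code"
    if not name or name.startswith("["):   # no path, or [heap]/[stack]
        return "anonymous (heap)"
    if "w" in perms:
        return "other data"
    return "rodata"

def breakdown(mappings):
    """mappings: iterable of (perms, name, rss_kb) tuples.
    Returns {category: (rss_kb, rounded percent of total RSS)}."""
    totals = Counter()
    for perms, name, rss_kb in mappings:
        totals[classify(perms, name)] += rss_kb
    total = sum(totals.values()) or 1      # avoid dividing by zero
    return {cat: (kb, round(100.0 * kb / total)) for cat, kb in totals.items()}

# One synthetic mapping per category, using the totals listed above.
result = breakdown([
    ("r-xp", "/usr/bin/Xorg", 3404),
    ("rw-p", "", 1628),
    ("rw-p", "/usr/bin/Xorg", 228),
    ("r--p", "/usr/bin/Xorg", 180),
])
for category, (kb, pct) in result.items():
    print(f"{kb} kB ({pct} %) come from {category}")
```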

from those same 5440 kB of RSS:
1628 kB (30 %) come from malloc/mmap in anonymous memory (heap) somewhere*
1200 kB (22 %) come from Xorg
628 kB (12 %) come from libc
316 kB (6 %) come from libcrypto
164 kB (3 %) come from libint10
136 kB (2.5%) come from libXfont
128 kB come from libxaa
120 kB come from libpixman
116 kB come from nv_drv
112 kB come from ld
102 kB come from libglx
100 kB come from swrast_dri
88 kB come from libfb
60 kB come from libpthread
48 kB come from evdev
xxx kB come from other libraries**

* just looking into /proc/, there’s no way to determine whether the allocations came from the binary itself or from some DSO. I’ll definitely analyse this carefully in the near future using another approach.

** these numbers are missing the input hotplug layer, which most systems use today. In other data I collected, I’ve seen dbus + hal taking 268 kB against, amazingly, 64 kB for libudev.

These measurements are not perfect; they are a snapshot of the memory when the server has just started. The footprint brought into memory at Xorg’s initialization time will differ a lot from that of regular usage during the rest of Xorg’s life, when it deals with clients and users interacting. For instance, libint10 is mapping 164 kB that will likely never be brought back into memory once swapped out. Likewise, the heap portion will increase when clients start to allocate pixmaps on the server.

Even so, we can see some nice facts. From the first chart, we see that almost 2/3 of the RSS is used by instructions. Is that normal behaviour for a graphics server? I don’t know. In the other chart, we see a huge footprint for libcrypto. When not counting shared mappings (e.g. used by openssl), this library is using 88 kB of RSS for private mappings alone (sigh). We can probably replace it with another SHA1 implementation (in fact, we already have others inside the server) or use our built-in one. We also have libpthread, used in GLX, which is being built even on systems that are not using it (e.g. Maemo on N900). libXfont also comes as a surprise to me, taking a considerable amount of memory. We’re probably able to tweak it a bit, though.

the code being started

Another way to analyse Xorg is to gather information per piece of code and module being started. So I first set a breakpoint in the InitOutput() function. Until InitOutput() is called:
VmRSS: 1728 kB
VmSize: 8788 kB

from 1728 kB in RSS:
1336 kB (77.3 %) come from code
132 kB (7.6 %) come from malloc/mmap in anonymous memory (heap)
144 kB (8.3 %) come from other data mapped in memory
116 kB (6.7 %) come from rodata

from 1728 kB in RSS:
436 kB (25.2 %) come from libc
328 kB (19 %) come from Xorg
316 kB (18.3 %) come from libcrypto

A breakpoint in InitOutput() captures the very first steps of Xorg initialization: command line processing, the OS layer being started and other basic routines. At this point, naturally, not much code inside Xorg has been executed yet, nor have any drivers been loaded. Therefore, almost half of the process’s memory usage (44 %) comes from the start-up of basic libraries such as libc, libcrypto, etc.

The next chart, taken with a breakpoint at InitInput(), shows the moment when the output side is mostly done, i.e., the internal loader is initialized, the configuration is parsed and the output drivers are already loaded. Until InitInput() is called:
VmRSS: 4436 kB
VmSize: 13724 kB

from 4436 kB in RSS:
3352 kB (75.6 %) come from code
676 kB (15.2 %) come from malloc/mmap in anonymous memory (heap)
228 kB (5.1 %) come from other data mapped in memory
180 kB (4 %) come from rodata

We see that the server’s RSS has jumped 2708 kB from the previous chart. In other words, 2708 kB, or 50% of the 5440 kB of RSS measured once the server is fully started, is used just for output initialization, and the remaining 1004 kB (18.4 %) will be used by the input initialization routines.
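
The jumps between the breakpoints can be recomputed from the VmRSS values listed in this post. A small Python sketch, assuming the percentages are taken against the 5440 kB measured once the server is fully started (the variable names are illustrative):

```python
# VmRSS (kB) reported above at each point of the start-up.
rss = {"InitOutput": 1728, "InitInput": 4436, "started": 5440}

# Memory that became resident between consecutive measurement points.
output_init = rss["InitInput"] - rss["InitOutput"]
input_init = rss["started"] - rss["InitInput"]

for label, kb in (("output init", output_init), ("input init", input_init)):
    pct = 100.0 * kb / rss["started"]
    print(f"{label}: {kb} kB ({pct:.1f} % of the final RSS)")
```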

Well, I’m already happy with these preliminary statistics. I guess we already have work to do just from looking at them. Now I plan to investigate a bit further how X’s heap is built up and how efficiently X clients are using pixmaps.

As always, I appreciate any corrections, suggestions and improvements.

* this text was kindly reviewed by Mikhail Gusarov.

Scrutinizing X Memory, part 1: overview

This series of documents explores how memory is used by the Xorg server. The aim is to eventually shrink the memory footprint of the server and its related components, like X clients, loaded modules and drivers. Embedded devices with constrained resources are the main focus here. All texts are mostly based on x86 and ARM architectures, under Linux 2.6.33 with Xorg from upstream.


One way to analyse aspects of the memory usage of a given program is to scrutinize its object data. Object data contains executable code and static data. Both are of little interest from the process memory management point of view, given that their layout is determined by the compiler and does not change during process execution. However, we can deduce some nice information about the object. For instance, from the Xorg object we are able to get some statistics about the code, identify its structure and point out architectural mistakes just by looking at it.

Besides the object itself, it is also important to see it in execution and how the dynamic allocations are performed on the stack and heap. So an analysis of the running object is valuable as well.

X file object

Consider the following sections of Xorg:

.text: contains the instructions executed by the CPU and all constant data – literals. While the program is being executed, pages are loaded into physical memory carrying instructions and literals.

The number of lines of X code is huge, which results in a huge .text segment. In my environment .text is 1833738 bytes (1.74 MB) when the compiler performs the third level of optimization (-O3). In a very rough view, removing code means fewer instructions to execute, and consequently less text and a smaller memory footprint. For instance, just a single inclusion of fprintf will cost ~40 bytes of text in your object. Of course it’s not straightforward to cut code all over the server, but for a given device/environment we can customize it, as already discussed.

Besides code elimination, optimizing the code with the compiler’s size optimization (-Os) also helps a lot: 260 kB of RSS saved here, just by optimizing the X server. So we might consider this and also apply the same idea to DSOs. For instance, the size of the pixman library mapped in the server shrinks by 30% when compiled with size optimization. Good job, compiler!
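
One way to compare two builds (say -O3 against -Os) is to look at the section sizes printed by binutils’ `size -A <file>`. The Python sketch below parses that SysV-style text and computes the shrinkage; the function names `section_sizes` and `shrink_percent` are hypothetical helpers, and the parser only assumes that section rows have three columns (name, size, address).

```python
def section_sizes(sysv_output):
    """Parse the text printed by `size -A <file>` into
    {section_name: size_in_bytes}."""
    sizes = {}
    for line in sysv_output.splitlines():
        fields = line.split()
        # Section rows look like: ".text  <size>  <addr>"
        if len(fields) == 3 and fields[0].startswith(".") and fields[1].isdigit():
            sizes[fields[0]] = int(fields[1])
    return sizes

def shrink_percent(before, after):
    """How much smaller `after` is than `before`, in percent."""
    return 100.0 * (before - after) / before
```

With the output of both builds at hand, `shrink_percent(o3[".text"], os_[".text"])` gives the reduction in the text segment, where `o3` and `os_` are the dictionaries returned by `section_sizes` for each build.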

.data and .bss: static or global variables allocated at program startup.

If variables allocated at compile time are not initialized, then BSS (Block Started by Symbol) grows; a larger BSS also means a larger VM (Virtual Memory) size, but not necessarily a larger RSS. The VM size is quite meaningless when measuring real memory usage. So I wouldn’t bother to analyse BSS, given that the RSS occupied by X is what I really care about.

On the other hand, the .data section grows when permanent variables are initialized. And if these variables are accessed, physical memory usage increases directly. A good habit here is to declare variables constant whenever possible, so that they go to the .text segment and the compiler might be able to perform optimizations.

X dynamic allocations (stack, heap and friends)

This is probably where there’s more room for optimization. The heap grows in response to the program’s needs: a program like “ls” will not make a lot of demands on the heap (one hopes), while the heap of a running Xorg can grow in a truly amazing way. It shouldn’t be hard to profile all allocations done inside the server. Probably valgrind’s massif with a bunch of arguments gives us this.

X clients are able to request the server to allocate pixmaps in its own memory. This feature is one of the main reasons for the growing and shrinking of the server’s memory footprint. Because of that, it’s very common to see people getting confused, thinking there’s a leak in the server while it’s actually on the client side.

Besides heap allocations there’s also the stack, used to hold automatic variables and function data. I don’t think there’s much to track in stack memory, or many ways to save overall process memory there. But a good rule to keep in mind is that allocation here is typically much faster than for dynamic storage (heap or free store), because a memory allocation on the stack involves only a pointer adjustment rather than more complex management.

The ideas above are just an overview of where we can start working. I don’t believe there’s a single, definite spot where we can go and fix X memory usage. We should analyse the code and attack on all sides.

Next, I’ll analyse in depth each of the dynamic and static allocation methods discussed in this document, starting with some statistics on where X sucks more… memory :-P I’ll appreciate any kind of corrections/suggestions on these documents.

* this text was kindly reviewed by Ander Conselvan and Mikhail Gusarov.

multiseat – roadmap

This week our laboratory at the university released the MDM utility to ease the installation and configuration of a multiseat box. The idea is that the end user should no longer need boring and hard howtos to deploy it. Just installing a distro package must be enough now. Try it, use it, report the bugs and send the patches! :)

But I would like to call attention to the fact that we’re still relying on the ugly Xephyr solution to build a multiseat on a simple PC (there are people selling this solution. Sigh). A lot of the cool stuff arriving in the Linux graphics stack is missing from this solution. So let’s try to trace the roadmap that we can follow in the short/medium term to build a good solution:

– Vga Arbiter
We should forget the Xephyr hack as fast as we can. Several instances of Xorg, one for each user session, is definitely what we want. If someone wants to use several graphics cards to deploy a multiseat, then (s)he will probably face the problem of the legacy VGA addresses. VGA arbitration is the solution.

It seems Jesse will work towards this in 2.6.28. There’s also a “minor” problem here: the X server is still not posting secondary cards (after pci-rework).

– xrandr 1.3
To deploy a multiseat with one video card and multiple CRTCs, the randr extension is enough to cover the hotplug of output devices. For a multi-card configuration, xrandr must be GPU aware. So we must do some work here as well to get automagic configuration of output devices.

– input hotplug
So far MDM is not using the latest input features of X to plug or re-plug a device into the machine. It uses its own ugly method to poll for devices. Some work is needed here.

– ConsoleKit integration
Device ownership (e.g. audio, pen drives, cameras, USB toys, output and input devices) can be a mess when multiple users share the same machine. Moreover, the security problems this implies could be harmful. ConsoleKit seems to solve all this. It is able to keep track of all the users currently logged in.

Honestly, I have never taken a closer look at ConsoleKit. I’m not sure it’s ready to support multiseat. So we need to take care of this as well, eventually putting some hooks inside MDM to work with it.

– DRI + modesetting
Supporting DRI in multiple X servers in parallel is not ready yet. Some redesign must be done to achieve this.

– tools for auto configuration
After all this work, some easy and very user-friendly tools to set up the multiseat on the fly from the desktop environment would be awesome.

Improving input latency

GSoC summary #1 – July 29

The current implementation of the X Window System relies on a signal scheme to manage the input events coming from hardware devices. This scheme frequently gets blocked when a lot of IO is occurring (for instance, when the process is swapping in/out). Getting blocked means, for instance, a jumping cursor on the screen, and in a GUI it is always desirable to prioritize system responsiveness for end users. The human/computer interface should be smooth, and this is the most user-visible aspect of a system.

Besides the need for better system responsiveness, the current design of the event stream has some oddities, probably due to historical reasons, such as the cursor update being done in user space, or the huge path it takes to draw the cursor instead of just connecting the mouse hardware directly with the cursor position update in the kernel. Moreover, there is no fundamental reason for input drivers to depend on the DDX part of the X server. Therefore the design of the input subsystem must be carefully redone to address these issues.

Our project tries to solve all these problems. In summary, the goal is to get a path from the hardware input event to client delivery that cannot be blocked by rendering or IO operations, meaning we always have very low latency on input events. Moreover, a redesign of this event stream could improve the overall X graphics stack, which must be considered as well.

So far three strategies were explored to achieve the goal:

1. put X input generation stage in a separate thread

2. put X input generation and processing stages in other threads

3. shortcut the kernel input layer with drm to decrease the cursor update latency

Basically, 1. and 2. try to solve the issue of blocked signals, and 3. would be a complete redesign of the input infrastructure. Strategy 3. would have an impact on 1. and 2., but those could be implemented in parallel with it. The following sections detail each strategy.

== strategy #1 ==

Strategy 1 no longer uses a signal handler to wake up the event generation code. It simply polls the device’s socket, and given that this code runs in a separate thread, this is a win for the CPUs.
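To make the idea concrete, here is a minimal sketch in Python (not the X server code, which is C). A pipe stands in for the device’s descriptor, and the names input_thread, run_demo and the stop pipe are made up for this illustration. The point is only that the thread sleeps in poll() and wakes up when the device has data, with no signal handler involved.

```python
import os
import queue
import select
import threading


def input_thread(device_fd, stop_fd, events):
    """Sleep in poll() on the device descriptor and queue whatever arrives."""
    poller = select.poll()
    poller.register(device_fd, select.POLLIN)
    poller.register(stop_fd, select.POLLIN)
    while True:
        # poll() blocks until a descriptor is readable; no signals are used.
        for fd, _mask in poller.poll():
            if fd == stop_fd:
                return
            data = os.read(device_fd, 64)
            if not data:          # device went away
                return
            events.put(data)      # hand the raw event to the rest of the server


def run_demo():
    dev_r, dev_w = os.pipe()      # hypothetical stand-in for the device socket
    stop_r, stop_w = os.pipe()    # only used to shut the thread down
    events = queue.Queue()
    worker = threading.Thread(target=input_thread, args=(dev_r, stop_r, events))
    worker.start()
    os.write(dev_w, b"motion dx=3 dy=-1")
    first = events.get(timeout=2)
    os.write(stop_w, b"x")
    worker.join(timeout=2)
    for fd in (dev_r, dev_w, stop_r, stop_w):
        os.close(fd)
    return first
```

Calling run_demo() pushes one fake motion event through the pipe and returns what the input thread queued.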

With the separate thread taking care only of the input code, the cursor footprint was expected to always stay in resident memory while the mouse is still moving. Unfortunately this was not true. For some reason it still gets swapped out to disk. Maybe some scheduler adjustments would help here. A memory-locking scheme was tried to keep the cursor footprint always in physical memory, without success.

This strategy is basically what we did in the first GSoC. It is pretty much implemented. It would not take much trouble to push it to the upstream X server. The code is here:

== strategy #2 ==

This strategy can be thought of as an improvement on #1. It can be split into two implementation models:

Model one:

thread #1 deals with
– injection and processing of input events
thread #2 deals with
– requests from known clients
– new client that tries to connect

It would be very, very nice to make both threads totally independent. But we cannot. Event delivery depends on the window structure, and the first thread must always wake up the second. Also, sometimes the processing of events takes a while, and the injection of events gets stuck in this model. So we came up with another one:

Model two:

thread #1 deals with
– injection of input events from devices
thread #2 deals with
– processing of input events to clients
thread #3 deals with
– requests from known clients
– new client that tries to connect

With this model the first and second threads are less tightly coupled, and given that we’re using non-blocking fds to wake up each thread (through a pipe), the CPU “enjoys” the effect of threads. For instance, under heavy drawing primitives only thread #3 would wake up.
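A rough sketch of model two, again in Python rather than the server’s C and with everything invented for illustration: the pipes stand in for the device and client connections, RECORD is a hypothetical wire format, and the function names are mine. Thread #1 wakes thread #2 by writing a byte to a pipe, and the only lock involved is the one inside the event queue.

```python
import os
import queue
import select
import struct
import threading

RECORD = struct.Struct("hh")  # hypothetical device record: relative (dx, dy)


def injector(device_r, raw_events, wake_w):
    """Thread #1: read records from the device and queue them."""
    while True:
        data = os.read(device_r, RECORD.size)
        if len(data) < RECORD.size:       # EOF: device closed
            raw_events.put(None)
            os.write(wake_w, b"!")
            return
        raw_events.put(RECORD.unpack(data))
        os.write(wake_w, b"!")            # wake thread #2 through the pipe


def processor(wake_r, raw_events, delivered):
    """Thread #2: turn queued records into events for clients."""
    os.set_blocking(wake_r, False)
    while True:
        select.select([wake_r], [], [])   # sleep until thread #1 wakes us
        os.read(wake_r, 4096)             # drain pending wake-up bytes
        while True:
            try:
                item = raw_events.get_nowait()
            except queue.Empty:
                break
            if item is None:
                return
            delivered.append(("motion",) + item)


def request_handler(client_r, handled):
    """Thread #3: serve client requests; input traffic never wakes it."""
    while True:
        data = os.read(client_r, 1)
        if not data:
            return
        handled.append(data)


def run_demo():
    device_r, device_w = os.pipe()
    client_r, client_w = os.pipe()
    wake_r, wake_w = os.pipe()
    raw_events, delivered, handled = queue.Queue(), [], []
    threads = [
        threading.Thread(target=injector, args=(device_r, raw_events, wake_w)),
        threading.Thread(target=processor, args=(wake_r, raw_events, delivered)),
        threading.Thread(target=request_handler, args=(client_r, handled)),
    ]
    for t in threads:
        t.start()
    os.write(device_w, RECORD.pack(3, -1))
    os.write(device_w, RECORD.pack(0, 2))
    os.write(client_w, b"D")              # a fake drawing request
    os.close(device_w)
    os.close(client_w)
    for t in threads:
        t.join(timeout=5)
    for fd in (device_r, client_r, wake_r, wake_w):
        os.close(fd)
    return delivered, handled
```

In this toy version the threads share nothing but the queue, which is exactly the part that does not hold in the real server, where event delivery needs the window structure.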

We had a proof of concept of this last model and it works-ish (we occasionally see segfaults, probably due to some critical regions we forgot to lock; for now the only mutex that exists is inside the server’s event queue).

It’s hard to imagine other threaded models, mainly because the way X deals with clients is tightly tied into every piece of the server, and it would require a lot of mutexes.

== strategy #3 ==

For sure this strategy is the most shocking one :) The idea is to connect the mouse hardware directly to the cursor position update function, all inside the kernel. We’d then rewrite the event stream from the pointer device to an absolute position. Transforming relative mouse motion into an absolute screen position does not seem that complicated, but this strategy would involve acceleration and cursor limits inside the kernel as well (the current implementation of acceleration deals with floats, so we would have to adapt it to live in the kernel).
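Just to show the arithmetic involved, here is a small Python sketch of turning a relative motion into an absolute, clamped position using integers only. The 8-bit fixed-point acceleration factor is my assumption about how the float-based acceleration could be adapted, not what the proof of concept does, and move_cursor and FIXED_SHIFT are made-up names.

```python
FIXED_SHIFT = 8  # assumed fixed-point format: 256 represents an acceleration of 1.0


def clamp(value, low, high):
    """Keep value inside [low, high]."""
    return max(low, min(value, high))


def move_cursor(pos, delta, screen, accel_fixed=1 << FIXED_SHIFT):
    """Apply a relative (dx, dy) to an absolute (x, y), staying on screen.

    Only integer operations are used. Note that >> rounds toward negative
    infinity, so fractional results round down for negative deltas too.
    """
    x, y = pos
    dx, dy = delta
    width, height = screen
    ax = (dx * accel_fixed) >> FIXED_SHIFT
    ay = (dy * accel_fixed) >> FIXED_SHIFT
    return (clamp(x + ax, 0, width - 1), clamp(y + ay, 0, height - 1))
```

For example, move_cursor((798, 0), (10, -10), (800, 480)) stops at the screen edge instead of running off it. A real version would also have to carry the sub-pixel remainder between events.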

It is a _very_ _large_ amount of coding. It would require changes to the X server, the DDX driver and its corresponding kernel DRM drivers, the drm library and the kernel input drivers. A mini input driver *inside* drm is also needed. We would add the complexity of the connection between input device and output device to the kernel (in my proof-of-concept implementation evdev depends on drm. Yeah, really weird world). Moreover, we would somehow have to avoid having two different sets of the exact same code in different contexts in the case of sw cursors (think MPX). It’s a complete redesign. Things would have to go incrementally.

But why this strategy? Well, it would solve all the current issues with input latency. For instance, with the current design of kernel modesetting, which seems to be the future, the cursor jumps a lot, much more than with the current implementation. Try running an xrandr instance and moving the mouse with kernel modesetting. xrandr will do DDC communication, which blocks X in the kernel. With the handling and update of the cursor inside the kernel, all would work fine (and my proof of concept already showed this).

Moreover, I believe the current implementation has remained until now for historical reasons. Ultrix systems placed the entire input subsystem in the kernel. What is the problem with doing this in Linux (and others) as well (besides the massive amount of coding)?

And non-DRI drivers? Should we forget them?


Google Summer of Code 2008

I’m very happy to say that I was selected again to work on Google Summer of Code with the X.Org Foundation. Daniel will be my mentor again. Thanks Google. Thanks X.Org!

Last year we did some nice work separating the input event generation code of the X server into a different thread. We saw some performance improvement there, especially because the implementation no longer uses signals to wake up the server when a device emits an event. The reason is that when a process is in uninterruptible sleep (D state), signals are delayed and the mouse cursor lags.

The idea now is to continue the work and put the event processing stage in a separate thread as well. This will require a lot of locks on structures and will be very challenging. I’ll be posting all my progress here.