Posts Tagged ‘n900’

Scrutinizing X memory, part 2: what’s taking all that memory?

March 25, 2010

So here goes some statistics of the Xorg process running. All the informations were fetch from /proc/`pidof Xorg`/{smaps, status}. I used also a script found on the Web to parse and organize these informations; Mikhail Gusarov has extended this script to show a very useful output.

Xorg per se

Running just one standalone `Xorg -retro`. In my system it represents:
VmRSS: 5440 kB
VmSize: 13620 kB

from those 5440 kB of RSS:
3404 kB (63 %) come from code
1628 kB (30 %) come from malloc/mmap in anonymous memory (heap)
228 kB (4 %) come from other data mapped in memory
180 kB (3 %) come from rodata

from those same 5440 kB of RSS:
1628 kB (30 %) come from malloc/mmap in anonymous memory (heap) somewhere*
1200 kB (22 %) come from Xorg
628 kB (12 %) come from libc
316 kB (6 %) come from libcrypto
164 kB (3 %) come from libint10
136 kB (2.5%) come from libXfont
128 kB come from libxaa
120 kB come from libpixman
116 kB come from nv_drv
112 kB come from ld
102 kB come from libglx
100 kB come from swrast_dri
88 kB come from libfb
60 kB come from libpthread
48 kB come from evdev
xxx kB come from other libraries**

* just looking into /proc/, there’s no way to determine if the allocations came either from the binary itself or some DSO. I’ll definitely analyse carefully this in a near future using another approach.

** it’s missing from these numbers the input hotplug layer, which mostly systems are using today. In another data collected, I’ve seen dbus + hal taking 268 kB against amazingly 64 kB from libudev.

These measurements are not perfect; they are a snapshot of the memory when the server just started. The same footprint brought to memory at Xorg’s initialization time will differs a lot from the regular usage of the rest of Xorg’s life, which would deals with clients and users interacting. For instance, libint10 is mapping 164 kB and it’s likely that will never be swapped back to the memory again. Likewise, the heap portion will increase when clients starts to allocate pixmaps on the server.

Even though, we can see some nice facts. From the first chart, we see that almost 2/3 in RSS is used by instructions. Is it a normal behaviour of a graphics server? I don’t know. In the other chart, we see a huge footprint of libcrypto. In such library, when not counting shared mappings (e.g. used by openssl), it’s using 88 kB of RSS for private mappings only – sigh. We probably can replace it by other SHA1 implementation (in fact, we have already others inside the server) or use our built-in. We have also libpthread, used in GLX, which is being built even on systems that are not using it (e.g. Maemo on N900). libXfont shows up as a surprise to me either, taking a considerable amount of memory. We’re probably able to tweak it a bit though.

the code being started

Another way to analyse Xorg, is getting informations per code and modules being started. So I first set a breakpoint in InitOutput() function. Until InitOutput() be called:
VmRSS: 1728 kB
VmSize: 8788 kB

from 1728 kB in RSS:
1336 kB (77.3 %) come from code
132 kB (7.6 %) come from malloc/mmap in anonymous memory (heap)
144 kB (8.3 %) come from other data mapped in memory
116 kB (6.7 %) come from rodata

from 1728 kB in RSS:
436 kB (25.2 %) come from libc
328 kB (19 %) come from Xorg
316 kB (18.3 %) come from libcrypto

A breakpoint in InitOutput() means the very first steps of Xorg initialization: command line processing, OS layer being started and other basic routines. At this point, naturally it wasn’t executed much code inside Xorg yet, neither any drivers were loaded. Therefore, almost half memory usage of the process (44 %) came from basic libraries start up such as libc, libcrypto, etc.

The next chart, when setting a break point at InitInput(), shows the moment that the output is mostly done. I.e., internal loader initialized, configuration and its parsing done and output drivers already loaded. Until InitInput() be called:
VmRSS: 4436 kB
VmSize: 13724 kB

from 4436 kB in RSS:
3352 kB (75.6 %) come from code
676 kB (15.2 %) come from malloc/mmap in anonymous memory (heap)
228 kB (5.1 %) come from other data mapped in memory
180 kB (4 %) come from rodata

We see the the server’s RSS has jumped 2708 kB from the previous chart. In other words, it represents 2708 kB, or 50%, just being used to output’s initialization, and that 1004 kB (18.4 %) will be used for input initialization routines.

Well, I’m already happy with these preliminary statistics. I guess we have already work to do just looking into. Now, I plan to investigate a bit further X’s heap creation and how efficiently X clients are using pixmaps.

As always, I appreciate any corrections, suggestions and improvements.

* this text was kindly reviewed by Mikhail Gusarov.

Scrutinizing X Memory, part 1: overview

March 25, 2010

This series of documents explore how the memory is used by the Xorg server. They aim to eventually shrinks the memory footprint of the server and its related components, like X clients, modules being loaded and drivers. Embedded devices with constrained resources are the main focus here. All texts are mostly based on x86 and ARM architectures, under Linux 2.6.33 with Xorg from upstream.

Overview

One way to analyse aspects of memory usage of a given program is to scrutinize its object data. Object data contains executable code and static data. Both are of little interest from the process memory management point of view given their layout is determined by the compiler and does not change during process execution. However, we can deduce some nice informations about the object. For instance, from Xorg object we are able to get some statistics about the code, identify its structure and point out architectural mistakes just looking into.

Besides the object itself, also important is to see it in execution and how the dynamic allocations are performed on the stack and heap. So an analysis of the file object running is valuable as well.

X file object

Consider the following sections of Xorg:

.text: contains the instructions executed by the CPU and all constant data – literals. While the program is being executed, pages are loaded into physical memory carrying instructions and literals.

The number of lines in X code is huge, which in some way impacts in a huge .text segment size. In my environment .text is 1833738 bytes (1.74 MB) when the compiler is performing third degree of optimization (-O3). In a very gross view, removal of code means less instructions to execute, consequently less text and less memory footprint. For instance, just a single inclusion of fprintf will cost ~40 bytes of text in your object. Of course it’s not straightforward to cut off code all over the server, but for a given device/environment we can customize it, as already discussed.

Besides code elimination, optimize the code using compiler’s size optimization (-Os) helps a lot either: 260 kB of RSS saved here, only optimizing X server. So we might considered this and also apply the same idea in DSOs. For instance, the size of pixman library mapped on the server shrinks 30% when compiled with size optimization. Good job, compiler!

.data and .bss: static or global variables allocated at program startup.

If the variables allocated in compilation time are not initialized, then BSS (Block Started by Symbol) increases; increase BSS means also increase VM (Virtual Memory), but not necessarily RSS. The VM size is quite meaningless when measuring real memory usage. So I wouldn’t bother to analyse BSS, given the RSS occupied by X is what I really care.

On the other hand, .data section increases when some data object is initialized for permanent variables. And if these variables is being accessed, it increases directly the physical memory. A good habit here is to declare constant variables whenever is possible, so then they go to .text segment and the compiler might be able to perform optimizations.

X dynamic allocations (stack, heap and friends)

Probably this is where there’s more room for optimizations. The heap grows in response the program needs: a program like “ls” will not make a lot of demands on the heap (one hopes), while the heap of a running Xorg can grow in a truly amazing way. It shouldn’t be hard to profile all allocations done inside the server. Probably valgrind’s massif with a bunch of arguments give this for us.

X clients are able to request the server to allocate pixmaps in its own memory. Such feature is one of the main reasons of the growing-shrinking in the server’s memory footprint. Because of that, it’s very usual to see people getting confused thinking there’s a leak on the server while actually it’s on client side.

Besides heap allocations there’s also the stack, used to hold automatic variables and functions data. I don’t think there’s much to track in stack memory or ways to save overall process memory. But a good rule to follow is that typically allocation here is much faster than for dynamic storage (heap or free store), because a memory allocation in the stack involves only pointer increment rather than more complex management.

The ideas above were just an overview where we can start to work on. I don’t believe there’s an unique and certain point that we can go and fix X memory usage. We should analyse the code and attack all sides.

Next, I’ll analyse in depth each of these dynamic and static allocation ways discussed in this document, starting doing some statistics where X sucks more… memory :-P I’ll appreciate any kind of corrections/suggestions on these documents.

* this text was kindly reviewed by Ander Conselvan and Mikhail Gusarov.

Customization and true modularization of Xorg

January 23, 2010

For the first time in life, Xorg is being used in a single platform and for a given device only (other devices have used an X11 implementation but using other non-canonical servers, such kdrive’s based – Tiny-X).

Previously Xorg was being packed to run in a huge amount of OSes – mostly Linux and Unix-like distributions – with the characteristic of be architecture portable and able to run on a huge set of video and input devices. In terms of software, this means an extensive amount of code able to cover all of this mentioned. But this is far from the needs of a small and single platform device.

Some days ago, at #xorg-devel, Alan mentioned the following:


06:59 < alanc> vignatti: the whole point of Xorg is to drive video output - where else would you possibly sanely put that code?
07:00 < alanc> I think you're going overboard in the drive to remove all code from Xorg

Alan was referring about my previous comment to remove some code of video memory mapping from server… I understand (and respect a lot) his concerns but lemme put this right here: it’s not about removal of code; I don’t even care if the code is in xserver or not; what I do care is about the customization – or more fancy, the true modularization [0] – of Xorg.

As discussed on the last X conference, we’re aiming to optionalize [1] lot of components inside Xorg: distros would build all components, satisfying all supported devices and drivers, whereas constrained environments (such as maemo + n900) would use a restrict set only.

So recently I’ve been confusing people’s mind trying to in fact optionalize several components of the server. There are some straightforward modifications on the code like turn off libdrm, vgahw or libxdmcp, but there’s also other more challenger like all the old-school mechanism to initialise cards, to remove cursor support or even to choose if we want or not all bus subsystem. Sometimes we’ll have to be careful to not run out of the protocol. But the truth is: the currently the code is _very_ tied all over the server. It’s not trivial to “get there”.

IMHO the plan traced at XDC looks perfectly clear and while other display systems seems not suitable enough for us yet, I’ll be keep digging on this direction.

[0] the modularization that happened in the version 1.0 was related with drivers going outside the server.

[1] what would be a good word here?

my dear diary… (some not-so-old memories)

November 8, 2009

Portland, OR

I was so proud for the forth time be part of the X crew, this time in Portland. Really amazing the city. I had the opportunity to play tourist and visit some nice places.

My X conferences

BTW, I just collect the photos of all four X conferences that I’ve attended. It’s not much photos but there’s some funny things :)

X on embedded

The cool thing of the work that I’m doing at Nokia is that I can publish and discuss mostly all X fun with the open source communities. “Mostly” because, for the last device – the N-shiny-and-awesome-900 – for instance, the video driver is closed (provided by Imagination), so we cannot comment much regarding such driver stack :( C’est la vie.

So we’ve been using Xorg for our small devices (FWIW the development of kdrive based servers is dead! Period). Therefore, myself and Oliver brought to XDC some discussions that we had about X server for tiny systems. Xorg DDX is still too fat. There’s a lot of code that we can cut off and that was the whole idea of our talk, more concerned about Xorg’s memory footprint.

Life is beautiful

6 months living in Helsinki: hosted 4 visits of friends, 5 new European countries visited and an average of 2 pints per day :) Lot of fun and no time at all for opensource pet projects. Even so, I’m enjoying! Besides, I bought an skate board and I’m speaking the first Finnish words already. I’m happy also that I’m (finally) finding time again to appreciate a good music (but didn’t get a guitar yet for me here). Let’s see what will be my next musical project… Nähdään!