Posts Tagged ‘vignatti’

late Christmas gift: freedesktop’s Xorg for MeeGo

December 29, 2010

Moikka.

If you follow here [0] you will find a set of Debian packages that the Graphics team at Nokia is continuously working on to deploy an X11 implementation for MeeGo-Harmattan, where we target embedded systems only. Feel free to use it!


At this point, we are very proud of ourselves because 99% of the content in these repositories is based on the mainstream of X development. IOW, we are directly fetching the X code base from freedesktop.org. More importantly, we are shaping the freedesktop implementation for embedded devices. So at this moment we are pretty much aligned with X version 1.9.2, plus a few other commits from the master branch; the rest of the components, like the client-side libraries, are mostly what we have on freedesktop master as well.

We are quite happy because this follows exactly what we planned some time ago, when we strategically decided to contribute to X in the freedesktop community, centralizing the development there rather than ignoring it. So: no-no for kdrive, no-no for massive code drops on top of freedesktop’s, no-no for a proprietary X implementation, no-no for major hacks or anything like that. Aside from the video driver stack, everything was dumped at freedesktop.

Next, the plan pretty much fits with MeeGo-MeeGo, because we just need to take the work we’ve been doing straight from freedesktop, avoiding any cross-fetching between downstream development trees. Hyvää!

[0] yeah, the name of the repository is not trivial at all, and we will probably change it in the near future. So poke me if you need this later on.

Priorities and scheduling hints for X server threads

August 7, 2008

Input events routed through another thread/process can have bad effects on latency because we can’t guarantee that it will get scheduled at the right moment. Although this is hard to observe with the current threaded X server implementation, we must design something to avoid it. One way to improve responsiveness is to give a high priority to the input thread and also adjust the CPU scheduling policy. (Note that this will not avoid problems related to page faults, which usually happen in the X input flow.)

Linux uses a 1:1 thread model and the scheduler handles every thread as a process. For now I don’t care about other systems. Both the input generation and the input processing threads were designed to sleep after a relatively short CPU run, so we can give priority to them as tasks that are trusted not to hog the CPU. And given that they are special time-critical tasks, I have no doubt about which policy to use: I set both input threads to the real-time FIFO policy with the maximum priority (sched_get_priority_max()).
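To make the policy choice concrete, here is a minimal sketch of how a thread could request the real-time FIFO policy at the maximum priority on Linux. The function name input_thread_set_realtime() is made up for illustration and does not come from the X server tree. The post only names sched_get_priority_max(); applying it per thread through pthread_setschedparam() is an assumption.

```c
#include <errno.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical helper: move the calling thread to SCHED_FIFO at the
 * highest priority the system allows.
 * Returns 0 on success or an errno value (typically EPERM when the
 * process is not allowed to use real-time policies). */
int input_thread_set_realtime(void)
{
    struct sched_param param;
    int max = sched_get_priority_max(SCHED_FIFO);

    if (max == -1)
        return errno;

    param.sched_priority = max;
    /* pthread_setschedparam() returns the error number directly. */
    return pthread_setschedparam(pthread_self(), SCHED_FIFO, &param);
}
```

Each input thread would call this once when it starts. An unprivileged server usually gets EPERM back, so the caller should be ready to carry on with the default policy.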

I’m sure that someone will complain that this slows down the main thread a bit when it runs together with both input threads. But in a GUI we’re talking about a better user experience. Latency variability must be avoided whenever possible in interactive situations. What the user sees is what matters. For non-interactive processes (server-style workloads) the situation is totally different.

Xorg’s philosophy is to be portable, so we have to take care when setting these kinds of parameters. It is a complex issue and different systems do it in wildly different ways. I was using my Linux box (2.6.24) to design it all.

keep it going…

August 7, 2008

Given that GSoC ’08 is getting close to the end, strategy number 2 turned out to be the more feasible way to proceed with my work. Strategy #3 would be a lot of fun but would also imply a hell of a lot of coding (and is a little out of our scope). Unfortunately it is a no-no for now.

Improving input latency

July 29, 2008

GSoC summary #1 – July 29

The current implementation of the X Window System relies on a signal scheme to manage the input events coming from hardware devices. This scheme frequently gets blocked when a lot of IO is occurring (for instance, when the process is swapping in/out). Getting blocked means, for instance, a jumping cursor on the screen, and in a GUI it is always desirable to prioritize system responsiveness for end users. The human/computer interface should be smooth, and this is the most user-visible aspect of a system.

Besides the need for better system responsiveness, the current design of the event stream has some oddities, probably for historical reasons, such as the cursor update being done in user space, or the long path it takes to draw the cursor instead of just connecting the mouse hardware directly to the cursor position update in the kernel. Moreover, there is no fundamental reason for input drivers to depend on the DDX part of the X server. Therefore the input subsystem must be carefully redesigned to address these issues.

Our project tries to solve all these problems. In summary, the goal is to get a path from hardware input event to client delivery that cannot be blocked by rendering or IO operations, meaning we always have very low latency on input events. Moreover, a redesign of this event stream could improve the overall X graphics stack, which must be considered as well.

So far three strategies were explored to achieve the goal:

1. put X input generation stage in a separate thread

2. put X input generation and processing stages in other threads

3. shortcut the kernel input layer with drm to decrease the cursor update latency

Basically, 1. and 2. try to solve the issue of blocked signals, and 3. would be a complete redesign of the input infrastructure. Strategy 3 would have an impact on 1. and 2., but these could be implemented in parallel with it. The following sections detail each strategy.

== strategy #1 ==

Strategy 1 no longer uses a signal handler to wake up the event generation code. It simply polls the device’s socket, and given that this code runs in a separate thread, this is a win for the CPUs.
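The shape of such a polling thread can be sketched as follows. This is only an illustration, not the code from the branch: struct input_source and input_thread_main() are hypothetical names, and a real server would turn the bytes into events instead of counting them.

```c
#include <errno.h>
#include <poll.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical per-device state; not the X server's real structures. */
struct input_source {
    int fd;           /* device (or pipe/socket) file descriptor */
    long bytes_seen;  /* bytes read so far, kept only for illustration */
};

/* Thread body: sleep in poll() until the device has data, then read it.
 * No signal handler is involved. The loop ends on end-of-file or error. */
static void *input_thread_main(void *arg)
{
    struct input_source *src = arg;
    struct pollfd pfd = { .fd = src->fd, .events = POLLIN };
    char buf[256];

    for (;;) {
        if (poll(&pfd, 1, -1) < 0) {
            if (errno == EINTR)
                continue;       /* interrupted, just poll again */
            break;
        }
        ssize_t n = read(src->fd, buf, sizeof(buf));
        if (n <= 0)
            break;
        src->bytes_seen += n;   /* real code would enqueue events here */
    }
    return NULL;
}
```

The thread is started with pthread_create(&tid, NULL, input_thread_main, &src) and costs nothing while the device is idle, since it stays asleep inside poll().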

With the separate thread taking care of only the input code, it was expected that the cursor footprint would always stay in resident memory while the mouse is still moving. Unfortunately this was not true. For some reason it gets swapped out to disk. Maybe some scheduler adjustments would help here. A memory lock scheme was tried to keep the cursor footprint always in physical memory, without success.
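For readers who have not used memory locking, a minimal sketch of the idea with mlock() follows. The post does not say which call was actually tried, so both the choice of mlock() and the helper name pin_in_memory() are assumptions made for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: ask the kernel to keep [addr, addr + len) resident.
 * Returns 0 on success or an errno value. ENOMEM or EPERM usually mean
 * the RLIMIT_MEMLOCK limit of the process is too small. */
int pin_in_memory(const void *addr, size_t len)
{
    return mlock(addr, len) == 0 ? 0 : errno;
}
```

A caller would pass the buffer holding the cursor image, for example pin_in_memory(cursor_bits, sizeof(cursor_bits)). Note that this only pins the data pages handed to it; the code and stack touched on the way to drawing the cursor can still be paged out, and mlockall(MCL_CURRENT | MCL_FUTURE) is the heavier alternative that covers the whole process.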

This strategy is basically what we did in the first GSoC. It is pretty much implemented. It would not take much trouble to push it upstream to the X server. The code is here:
http://cgit.freedesktop.org/~vignatti/xserver/

== strategy #2 ==

This strategy can be thought of as an improvement on #1. It can be split into two implementation models:

Model one:

thread #1 deals with
– injection and processing of input events
thread #2 deals with
– requests from known clients
– new clients that try to connect

It would be very, very nice to make both threads totally independent. But we cannot. Event delivery depends on the window structure, and the first thread must always wake up the second. Also, sometimes the processing of events takes a while and the injection of events gets stuck in this model. So we came up with another one:

Model two:

thread #1 deals with
– injection of input events from devices
thread #2 deals with
– processing of input events to clients
thread #3 deals with
– requests from known clients
– new clients that try to connect

With this model the first and second threads are not so tightly coupled, and given that we’re using non-blocking fds to wake up each thread (through a pipe), the CPU “enjoys” the effect of threads. For instance, under heavy drawing primitives only thread #3 would wake up.
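The pipe-based wake-up between two stages can be sketched like this. It is a simplified illustration under my own naming (struct wake_pipe and the wake_pipe_* functions are hypothetical), not the proof-of-concept code itself.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical wake-up channel between two threads, built on a pipe
 * whose two ends are both non-blocking. */
struct wake_pipe {
    int rd;   /* polled by the thread that sleeps */
    int wr;   /* written by the thread that wakes it up */
};

int wake_pipe_init(struct wake_pipe *w)
{
    int fds[2];

    if (pipe(fds) != 0)
        return -1;
    for (int i = 0; i < 2; i++) {
        int fl = fcntl(fds[i], F_GETFL);
        if (fl == -1 || fcntl(fds[i], F_SETFL, fl | O_NONBLOCK) == -1) {
            close(fds[0]);
            close(fds[1]);
            return -1;
        }
    }
    w->rd = fds[0];
    w->wr = fds[1];
    return 0;
}

/* Called by the upstream thread after it queued work. If the pipe is
 * full (EAGAIN) a wake-up is already pending, so the error is ignored. */
void wake_pipe_notify(struct wake_pipe *w)
{
    char c = 1;
    ssize_t n;

    do {
        n = write(w->wr, &c, 1);
    } while (n < 0 && errno == EINTR);
}

/* Called by the downstream thread: wait up to timeout_ms, then drain.
 * Returns 1 if woken up, 0 on timeout, -1 on error. */
int wake_pipe_wait(struct wake_pipe *w, int timeout_ms)
{
    struct pollfd pfd = { .fd = w->rd, .events = POLLIN };
    char buf[64];
    int rc = poll(&pfd, 1, timeout_ms);

    if (rc <= 0)
        return rc;
    while (read(w->rd, buf, sizeof(buf)) > 0)
        ;   /* coalesce several notifications into one wake-up */
    return 1;
}
```

In model two there would be one such channel from thread #1 to thread #2. Because the write end never blocks, the injection thread cannot get stuck behind a slow processing thread, which is the problem model one had.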

We have a proof-of-concept of this last model and it works-ish (we occasionally see some segfaults, probably due to some critical regions we forgot to lock; right now the only mutex that exists is inside the server’s event queue).

It’s hard to imagine other threaded models, mainly because the way X deals with clients is tied into every piece of the server, and it would require a lot of mutexes.

== strategy #3 ==

For sure this strategy is the most shocking one :) The idea is to connect the mouse hardware directly to the cursor position update function, all inside the kernel. We’d then rewrite the event stream from the pointer device as an absolute position. Transforming relative mouse motion into an absolute screen position seems not that complicated, but this strategy would involve acceleration and cursor limits inside the kernel as well (the current implementation of acceleration deals with floats, so we would have to adapt it to live in the kernel).
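As a rough illustration of that transformation, here is a user-space sketch that turns relative motion into a clamped absolute position using integer arithmetic only. The struct and function names are hypothetical, and the constant ratio accel_num/accel_den is a deliberately naive stand-in for the real pointer acceleration code, which is far more elaborate.

```c
/* Hypothetical cursor state; names are illustrative only. */
struct cursor_state {
    int x, y;            /* absolute position in pixels */
    int width, height;   /* screen size in pixels */
};

static int clamp_int(int v, int lo, int hi)
{
    if (v < lo)
        return lo;
    if (v > hi)
        return hi;
    return v;
}

/* Apply one relative motion event. The acceleration factor is the
 * integer ratio accel_num / accel_den (accel_den must not be zero),
 * so no floating point is needed. The result is clamped to the screen,
 * which plays the role of the cursor limits. */
void cursor_apply_motion(struct cursor_state *c, int dx, int dy,
                         int accel_num, int accel_den)
{
    c->x = clamp_int(c->x + dx * accel_num / accel_den, 0, c->width - 1);
    c->y = clamp_int(c->y + dy * accel_num / accel_den, 0, c->height - 1);
}
```

With a 100x50 screen and the cursor at (10, 10), a motion of (5, -3) with a factor of 2/1 moves it to (20, 4), and any larger motion simply stops at the screen edge.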

It is a _very_ _large_ amount of coding. It would require changes to the X server, the DDX driver and its corresponding kernel DRM driver, the drm library and the kernel input drivers. A mini input driver *inside* drm is also needed. We would add the complexity of the connection between input device and output device to the kernel (in my proof-of-concept implementation evdev depends on drm. Yeah, really weird world). Moreover, we would somehow have to avoid having two different copies of the exact same code in different contexts in the case of software cursors (think MPX). It’s a complete redesign. Things would have to go incrementally.

But why this strategy? Well, this would solve all the current issues with input latency. For instance, with the current design of kernel modesetting, which seems to be the future, the cursor jumps a lot, much more than with the current implementation. Try running an xrandr instance and moving the mouse with kernel modesetting. xrandr will do DDC communication, which blocks X in the kernel. With the handling and update of the cursor inside the kernel, all would work fine (and my proof-of-concept already showed this).

Moreover, I believe the current implementation has remained until now for historical reasons. Ultrix systems placed the entire input subsystem in the kernel. What is the problem with doing this in Linux (and others) as well (besides the massive coding)?

And non-DRI drivers? Should we forget them?

EOF

fakemouse — a driver that emulates a mouse

May 27, 2008

For my SoC project I need some mechanism to evaluate the improvement brought by the input thread inside X. So I wrote a simple kernel driver that emulates a mouse device moving and emitting the bits of a simple pattern. I don’t know if something like this already exists or if there are other ways to do it, but the fact is that the solution took me only a few hours, from the moment I imagined it, through collecting some ideas on the Web, to implementing it.

Why emulate a device? I need to stress the X server always with the same routines, and things like XWarpPointer() and XTestFake*MotionEvent() are not close to real user usage because they do not pass through all the paths of the event creation stage inside X. So now I can run the fakemouse module together with some x11perf test and collect the results, comparing X with and without the input thread. Cool :)

Those who are interested in the driver can do the following:
# wget http://web.inf.ufpr.br/vignatti/code/fakemouse-0.01.tar.gz
# tar xzvf fakemouse-0.01.tar.gz
# cd fakemouse-0.01
# make
# insmod fakemouse.ko
# echo 1 > /sys/module/fakemouse/parameters/mouseon

and be happy seeing what happens in the event node created by fakemouse (/dev/input/event*).
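One way to watch that node from a program is to read struct input_event records from it. The sketch below assumes the driver reports relative motion as EV_REL events with REL_X/REL_Y codes, which is what a mouse normally does but is not stated in the post; sum_relative_motion() is a made-up helper name.

```c
#include <linux/input.h>
#include <unistd.h>

/* Hypothetical helper: read up to max_events events from an evdev file
 * descriptor and sum the relative X/Y motion they carry.
 * Returns the number of events read; stops early on EOF or error. */
int sum_relative_motion(int fd, int max_events, long *dx, long *dy)
{
    struct input_event ev;
    int count = 0;

    *dx = 0;
    *dy = 0;
    while (count < max_events &&
           read(fd, &ev, sizeof(ev)) == (ssize_t)sizeof(ev)) {
        if (ev.type == EV_REL && ev.code == REL_X)
            *dx += ev.value;
        else if (ev.type == EV_REL && ev.code == REL_Y)
            *dy += ev.value;
        count++;    /* other events (e.g. EV_SYN) are counted but ignored */
    }
    return count;
}
```

The descriptor would come from open("/dev/input/eventN", O_RDONLY) on whichever node fakemouse created. Running it twice over the same number of events should give the same totals if the emitted pattern is really deterministic.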

Google Summer of Code 2008

April 29, 2008

I’m very happy to say that I was selected again to work on Google Summer of Code with the X.Org Foundation. Daniel will be my mentor again. Thanks Google. Thanks X.Org!

Last year we did some nice work separating the input event generation code of the X server into a different thread. We saw some performance improvement there, especially because the implementation no longer uses signals to wake up the server when some device emits an event. The reason is that when a process is in uninterruptible sleep (D state), signals are delayed and the mouse cursor lags.

The idea now is to continue the work and put the event processing stage in a separate thread as well. This will require locking a lot of structures and will be very challenging. I’ll be posting all my progress here.