Archive for the ‘SoC2008’ Category

Parallel events (panic) with X

August 18, 2008

Unfortunately, the model I described a few weeks ago for putting the input event delivery of the X server in a separate thread wouldn’t be an advantage after all. I was too hasty in thinking it could be feasible. Sorry :(

I started to implement all this, but it turned out to be a very tedious task to track down all the global variables that both threads touch and to lock them. So I decided to stop going down this path. It’s hard to program thinking in parallel, and it’s even harder to debug a program with several flows of execution. Worse, the tools don’t help you (if you’re lucky, gdb will work).

But the main reason I can give for stopping with this model is that the “main” flow of execution (i.e. basically all the functions in {Swapped,}ProcVector) and the input delivery flow (ProcessInputEvents()) are very, very tightly coupled. Both deal a lot with clients, and we’d need to lock several globals, thus spending a lot of time on thread management. It’s easy to see this in action: just put a breakpoint in TryClientEvents(). Every single request to deliver a given event to a given client goes through this function, and both the input and the main event flows call TryClientEvents(), so you will see it being called a zillion times. The contention between the eventual processing and main threads would be even greater if the client chose to receive MotionNotify events.
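
To make the problem concrete, here is a minimal sketch (made-up names, not actual server code) of the kind of serialization a shared lock around delivery would impose: both the request flow and the input flow end up queueing on the same mutex.

#include <pthread.h>

/* hypothetical sketch: shared client state forces both flows onto one lock */
static pthread_mutex_t client_lock = PTHREAD_MUTEX_INITIALIZER;

/* stand-in for the per-client delivery that TryClientEvents() does */
static void deliver_event_to_client(int client, int event)
{
    pthread_mutex_lock(&client_lock);   /* main and input threads serialize here */
    /* ... touch the client's event mask, window structs, grabs ... */
    pthread_mutex_unlock(&client_lock);
}

/* the request flow ({Swapped,}ProcVector handlers) ends up here ... */
static void *request_thread(void *arg) { deliver_event_to_client(1, 42); return NULL; }
/* ... and so does the input flow (ProcessInputEvents()) */
static void *input_thread(void *arg)   { deliver_event_to_client(1, 7);  return NULL; }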

So yeah, it’s far from clear how to put the processing of input events inside another thread.

== Next ==

In the next few days I’ll be traveling to CESol, in Fortaleza, here in Brazil. I was invited to talk about my work in X land. Latin America has a lot of promising countries when it comes to FOSS development, yet for some reason almost no one actively participates in and contributes to X development (why?). I’ll try to motivate people there somehow :)

Next week I’ll get the generation thread into good enough shape to eventually push it upstream. I’ll also try to write a good summary of all my work, given that GSoC is coming to an end.

keep it going…

August 7, 2008

Given that GSoC ’08 is getting close to the end, strategy #2 turned out to be the most feasible way to proceed with my work. Strategy #3 would be a lot of fun but would also imply a massive amount of coding (and is a little out of our scope). Unfortunately, it’s a no-no for now.

Improving input latency

July 29, 2008

GSoC summary #1 – July 29

The current implementation of the X Window System relies on a signal scheme to manage the input events coming from hardware devices. This scheme frequently gets blocked when a lot of IO is occurring (for instance, when the process is swapping in/out). Getting blocked means, for instance, a jumping cursor on the screen, and in a GUI it is always desirable to prioritize system responsiveness for end users. The human/computer interface should be smooth, and it is the most user-visible aspect of a system.
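
For reference, the signal scheme boils down to something like this (a simplified sketch of the idea, not the actual server code): each device fd is set up to raise SIGIO, and the handler just notes that there is input to read.

#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t input_pending;

static void sigio_handler(int sig) { input_pending = 1; }

/* roughly what is done per input device fd */
static void arm_sigio(int fd)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = sigio_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGIO, &sa, NULL);

    fcntl(fd, F_SETOWN, getpid());                     /* deliver SIGIO to this process */
    fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | O_ASYNC);  /* raise SIGIO when data arrives */
}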

Besides the need to improve system responsiveness, the current design of the event stream has some oddities, probably due to historical reasons, such as the cursor update being done in user space, or the huge path taken to draw the cursor instead of just connecting the mouse hardware directly to the cursor position update in-kernel. Moreover, there is no fundamental reason for the input drivers to depend on the DDX part of the X server. Therefore the design of the input subsystem must be carefully redone to address such issues.

Our project tries to solve all these problems. In summary, the goal is to get a path from hardware input event to client delivery that cannot be blocked by rendering or IO operations, meaning we always have very low latency on input events. Moreover, a redesign of this event stream could improve the overall X graphics stack, which must be considered as well.

So far three strategies were explored to achieve the goal:

1. put X input generation stage in a separate thread

2. put the X input generation and processing stages in other threads

3. shortcut between the kernel input layer and drm to decrease the cursor update latency

Basically, #1 and #2 try to solve the issue of blocked signals, while #3 would be a complete redesign of the input infrastructure. The third strategy would have an impact on #1 and #2, but those could still be implemented in parallel with it. The following sections detail each strategy.

== strategy #1 ==

Strategy #1 no longer uses a signal handler to wake up the event generation code. It simply polls the devices’ fds, and given that this code runs in a separate thread, this is a win for the CPUs.
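
In rough terms, the generation thread just blocks on the device fds (a minimal sketch of the idea, with made-up names; the real code lives in the tree linked below):

#include <poll.h>

/* hypothetical sketch of the generation thread: block on the device fds
 * with poll() instead of waiting for SIGIO */
struct input_fds { struct pollfd *fds; int nfds; };

static void *input_generation_thread(void *arg)
{
    struct input_fds *in = arg;

    for (;;) {
        if (poll(in->fds, in->nfds, -1) <= 0)
            continue;
        for (int i = 0; i < in->nfds; i++)
            if (in->fds[i].revents & POLLIN)
                ;   /* read the device and enqueue the events */
    }
    return NULL;
}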

With the separate thread taking care of only the input code, it was expected that the cursor footprint would always live in resident memory while the mouse keeps moving. Unfortunately this was not true. For some reason it gets swapped back to disk. Maybe some scheduler adjustments would help here. A memory locking scheme was tried to keep the cursor footprint always in physical memory, without success.
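
The locking attempt was along these lines (a sketch only, not the actual patch): pin the process pages so the input path cannot be swapped out.

#include <stdio.h>
#include <sys/mman.h>

/* pin all current and future pages of the process in physical memory */
static int pin_everything(void)
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE) < 0) {
        perror("mlockall");
        return -1;
    }
    return 0;
}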

This strategy is basically what was done in the first GSoC, and it is pretty much implemented. It would not take much trouble to push it into the upstream X server. The code is here:
http://cgit.freedesktop.org/~vignatti/xserver/

== strategy #2 ==

This strategy can be thought of as an improvement over #1. It can be separated into two models of implementation:

Model one:

thread #1 deals with
- injection and processing of input events
thread #2 deals with
- requests from known clients
- new client that tries to connect

It would be very, very nice to make both threads totally independent. But we cannot: the event delivery depends on the window structure, and the first thread must always wake up the second. Also, sometimes the processing of events takes a while and the injection of events gets stuck in this model. So we came up with this other one:

Model two:

thread #1 deals with
- injection of input events from devices
thread #2 deals with
- processing of input events to clients
thread #3 deals with
- requests from known clients
- new client that tries to connect

With this model the first and second threads become less tightly coupled, and given that we’re using non-blocking fds to wake up each thread (through a pipe), the CPU “enjoys” the effect of threads. For instance, under heavy drawing primitives only thread #3 would wake up.
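
The wake-up mechanism is basically a pipe used as a doorbell (a minimal sketch with made-up names; error handling left out): the injection thread writes one byte after enqueueing, and the processing thread polls the read end together with its other fds.

#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

static int wakeup_pipe[2];   /* [0] read end (processing), [1] write end (injection) */

static void wakeup_init(void)
{
    pipe(wakeup_pipe);
    fcntl(wakeup_pipe[0], F_SETFL, O_NONBLOCK);
    fcntl(wakeup_pipe[1], F_SETFL, O_NONBLOCK);
}

/* injection thread: after enqueueing an event, ring the doorbell */
static void wakeup_processing_thread(void)
{
    char c = 'x';
    (void)write(wakeup_pipe[1], &c, 1);   /* non-blocking; fine if the pipe is already full */
}

/* processing thread: sleep until there is something to do */
static void wait_for_events(void)
{
    struct pollfd pfd = { .fd = wakeup_pipe[0], .events = POLLIN };
    char buf[64];

    poll(&pfd, 1, -1);
    while (read(wakeup_pipe[0], buf, sizeof(buf)) > 0)
        ;                                 /* drain the doorbell */
}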

We have a proof of concept of this last model and it workish (we occasionally see some segfaults, probably due to some critical regions we forgot to lock; right now the only mutex is the one inside the server queue of events).

It’s hard to imagine other threaded models, mainly because the way X deals with clients is deeply tied into every piece of the server, and anything else would require a lot of mutexes.

== strategy #3 ==

This strategy is for sure the most shocking one :) The idea is to connect the mouse hardware directly to the cursor position update function, all inside the kernel. We’d then rewrite the event stream from the pointer device into an absolute position. Transforming the relative mouse motion into an absolute screen position doesn’t seem that complicated, but this strategy would also involve handling acceleration and cursor limits inside the kernel (the current acceleration implementation deals with floats, so we would have to adapt it to live in the kernel).
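
The transformation itself is the easy part (a sketch, leaving out acceleration, which is exactly the piece that needs the float-to-fixed-point adaptation):

/* sketch: turn relative mouse motion into a clamped absolute screen position */
struct cursor_state {
    int x, y;                /* current absolute position */
    int max_x, max_y;        /* screen limits (exclusive) */
};

static void cursor_move_rel(struct cursor_state *c, int dx, int dy)
{
    c->x += dx;
    c->y += dy;

    if (c->x < 0)         c->x = 0;
    if (c->y < 0)         c->y = 0;
    if (c->x >= c->max_x) c->x = c->max_x - 1;
    if (c->y >= c->max_y) c->y = c->max_y - 1;

    /* here DRM would write the new position to the cursor registers */
}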

It is a _very_ _large_ amount of coding. It would require changes to the X server, the DDX driver and its corresponding kernel DRM driver, the drm library and the kernel input drivers. A mini input driver *inside* drm is also needed. We would be adding the complexity of the connection between input device and output device to the kernel (in my proof-of-concept implementation evdev depends on drm. Yeah, really weird world). Moreover, we would somehow have to avoid two different copies of the exact same code living in different contexts in the case of sw cursors (think MPX). It’s a complete redesign. Things would have to go incrementally.

But why this strategy? Well, it would solve all the current issues with input latency. For instance, with the current design of kernel modesetting (which seems to be the future) the cursor jumps a lot, much more than with the current implementation. Try running xrandr and moving the mouse with kernel modesetting: xrandr will do DDC communication, which blocks X in the kernel. With the handling and updating of the cursor inside the kernel, everything would work fine (and my proof of concept already showed this).

Moreover, I believe the current implementation has survived until now for historical reasons. Ultrix systems placed the entire input subsystem in the kernel. What is the problem with doing this in Linux (and others) as well, besides the massive amount of coding?

And non-DRI drivers? Should we forget about them?

EOF

cursor handling and updates inside DRM

July 10, 2008

The current DRM kernel modesetting tree already takes care of updating the cursor registers and painting the cursor on the screen. Very cool [0].

What I’ve done today is a shortcut between the kernel input layer and DRM to update the cursor directly on screen without the X server always being notified. Of course, a lot of issues came up along the way. So let’s try to delegate the tasks again:

userspace app (X server):
- starts the whole mechanism, telling which device is responsible for the cursor (input ddx drv)
- responsible for loading new cursor images and pushing them to the DRM (video ddx drv)

kernel input layer (evdev driver):
- notifies DRM and sends it the device’s relative coordinate events

DRM:
- transforms relative motion into absolute
- takes care of the cursor limits
- responsible for the acceleration computation
- responsible for the input transformation as well?
- touches the gfx registers.

It seems that a reasonable amount of code in the ddx input drv (mainly ReadInput) and in dix (mainly GetPointerEvents) would be “swallowed” by the DRM. The “event generation stage” of the server would deal with the event itself + xkb + Xi things (which could eventually be done in a dedicated thread) and would leave to DRM the responsibility of painting the cursor on the screen.

The communication from the kernel input drv can be direct, calling a DRM function; DRM and userspace can communicate basically using ioctls. Complaints?
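
To make the split concrete, the userspace half could look something like this. To be clear: these structs and ioctl numbers are entirely made up for illustration; nothing like them exists in the real drm library.

#include <sys/ioctl.h>

/* purely hypothetical ioctl interface, just to sketch the idea */
struct fake_cursor_bind  { int input_dev; };               /* which evdev node feeds the cursor */
struct fake_cursor_image { unsigned int handle, w, h; };   /* buffer with the cursor pixels */

#define FAKE_IOCTL_CURSOR_BIND   _IOW('d', 0xa0, struct fake_cursor_bind)
#define FAKE_IOCTL_CURSOR_IMAGE  _IOW('d', 0xa1, struct fake_cursor_image)

/* the X server would then do something along these lines: */
static int bind_cursor_device(int drm_fd, int evdev_index)
{
    struct fake_cursor_bind bind = { .input_dev = evdev_index };
    return ioctl(drm_fd, FAKE_IOCTL_CURSOR_BIND, &bind);
}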

This would only work on OSes that support DRM. What about the others?

[0] Not so much, actually. It seems this method of updating the cursor sends _a lot of_ ioctls and sometimes causes cursor jumps. But I have to double-check whether the problem really is context switches.

to thread the X server (?)

June 17, 2008

I really don’t like to read large blog posts. Anyway…

What I’ve done so far is a separate thread that takes care of only the injection stage into the X server event queue. If you’re interested in the results, please read some past posts on my blog. It is currently in very good shape (synced with the post-MPX merge, all input devices are inside the thread, etc.). The implementation looks like this:

thread #1 deals with
- injection of input events from devices
thread #2 deals with
- processing of input events to clients
- requests from known clients (rendering things)
- new client that tries to connect (pretty easy to do)

Now I am pondering the following:
Model one:

thread #1 deals with
- injection and processing of input events
thread #2 deals with
- requests from known clients
- new client that tries to connect

It would be very, very nice to make both threads totally independent. But we cannot: the event delivery depends on the window structure, and the first thread must always wake up the second. Also, sometimes the processing of events takes a while and the injection of events gets stuck in this model. So I came up with this other one:

Model two:

thread #1 deals with
- injection of input events from devices
thread #2 deals with
- processing of input events to clients
thread #3 deals with
- requests from known clients
- new client that tries to connect

This way the first and second threads become less tightly coupled, and given that I’m using non-blocking fds to wake up each thread (through a pipe), the server “enjoys” the effect of threads. For instance, under heavy drawing primitives only thread #3 would wake up.

Well, I have implemented both models here and they workish (I occasionally see some segfaults, probably due to some critical regions I forgot to lock; right now the only mutex is the one inside the server queue of events).
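
Conceptually, that one mutex is just this (a simplified sketch, not the actual server code): the injection thread pushes into a shared ring, the processing thread pops, and the lock protects the indices.

#include <pthread.h>

#define QUEUE_LEN 512

static struct {
    int events[QUEUE_LEN];            /* stand-in for the real event records */
    int head, tail;
    pthread_mutex_t lock;
} event_queue = { .lock = PTHREAD_MUTEX_INITIALIZER };

/* thread #1: injection */
static void enqueue_event(int ev)
{
    pthread_mutex_lock(&event_queue.lock);
    event_queue.events[event_queue.tail] = ev;
    event_queue.tail = (event_queue.tail + 1) % QUEUE_LEN;   /* overflow ignored in this sketch */
    pthread_mutex_unlock(&event_queue.lock);
}

/* thread #2: processing */
static int dequeue_event(int *ev)
{
    int got = 0;

    pthread_mutex_lock(&event_queue.lock);
    if (event_queue.head != event_queue.tail) {
        *ev = event_queue.events[event_queue.head];
        event_queue.head = (event_queue.head + 1) % QUEUE_LEN;
        got = 1;
    }
    pthread_mutex_unlock(&event_queue.lock);
    return got;
}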

It’s hard to imagine other models, mainly because the way X deals with clients is deeply tied into every piece of the server, and anything else would require a lot of mutexes. But I’d love to hear from anyone disagreeing with this and proposing something here!

=== What makes things so hard ===

Debugging a multi-threaded program is really tedious, and consequently so is working in this kind of environment. And that could be a great argument for not accepting a multi-threaded X server upstream some day.

The thing that pisses me off the most is how badly gdb sucks for debugging the server. I’m using the latest version of gdb (6.8, and from today’s snapshot) and it often hard-locks my machine (I’m sure I had to reboot at least 20 times today… sigh). Sometimes it consumes 99% of the CPU and sometimes it completely hangs the machine. Yes, this is really tiresome. Does anyone know some magic parameters for gdb to avoid this hard lockup? Anyway, the long-term goal is to provide a compile-time option to not use the input thread.

Comments?

fakemouse — a driver that emulates a mouse

May 27, 2008

For my SoC project I need some mechanism to evaluate the improvement of the input thread inside X. So I wrote a simple kernel driver that emulates a mouse device moving and emitting events in a simple pattern. I don’t know if something like this already exists or if there are other ways to do it, but the fact is that the solution I came up with took me only a few hours from the moment I imagined it, collected some ideas on the Web and implemented it.

Why emulate a device? I need to stress the X server with exactly the same routines every time, and things like XWarpPointer() and XTestFake*MotionEvent() are not close to real user usage because they do not pass through all the paths of the event creation stage inside X. So now I can run the fakemouse module together with some x11perf tests and collect results comparing X with and without the input thread. Cool :)
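
The driver is conceptually tiny. A rough sketch of how such a module can be written against the kernel input API (this is an illustration of the idea, not the actual fakemouse source):

#include <linux/module.h>
#include <linux/input.h>
#include <linux/kthread.h>
#include <linux/delay.h>

static int mouseon;                       /* /sys/module/fakemouse/parameters/mouseon */
module_param(mouseon, int, 0644);

static struct input_dev *fake_dev;
static struct task_struct *fake_task;

/* emit a fixed pattern of relative motion, like a real mouse would */
static int fakemouse_thread(void *data)
{
    int dx = 1;

    while (!kthread_should_stop()) {
        if (mouseon) {
            input_report_rel(fake_dev, REL_X, dx);
            input_report_rel(fake_dev, REL_Y, -dx);
            input_sync(fake_dev);
            dx = -dx;                     /* zig-zag pattern */
        }
        msleep(10);                       /* ~100 events per second */
    }
    return 0;
}

static int __init fakemouse_init(void)
{
    fake_dev = input_allocate_device();
    if (!fake_dev)
        return -ENOMEM;

    fake_dev->name = "fakemouse";
    fake_dev->evbit[0] = BIT_MASK(EV_REL) | BIT_MASK(EV_KEY);
    fake_dev->relbit[0] = BIT_MASK(REL_X) | BIT_MASK(REL_Y);
    fake_dev->keybit[BIT_WORD(BTN_LEFT)] = BIT_MASK(BTN_LEFT);

    if (input_register_device(fake_dev)) {
        input_free_device(fake_dev);
        return -EINVAL;
    }

    fake_task = kthread_run(fakemouse_thread, NULL, "fakemoused");
    return 0;
}

static void __exit fakemouse_exit(void)
{
    kthread_stop(fake_task);
    input_unregister_device(fake_dev);
}

module_init(fakemouse_init);
module_exit(fakemouse_exit);
MODULE_LICENSE("GPL");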

Those who are interested in the driver can do the following:
# wget http://web.inf.ufpr.br/vignatti/code/fakemouse-0.01.tar.gz
# tar xzvf fakemouse-0.01.tar.gz
# cd fakemouse-0.01
# make
# insmod fakemouse.ko
# echo 1 > /sys/module/fakemouse/parameters/mouseon

and be happy watching what happens on the event node created by fakemouse (/dev/input/event*).
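
If you prefer a few lines of C over staring at the raw node, something like this will print the fake motion (a quick sketch; adjust the device path to wherever udev put the node):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <linux/input.h>

int main(void)
{
    struct input_event ev;
    int fd = open("/dev/input/event5", O_RDONLY);   /* adjust to the node fakemouse got */

    if (fd < 0) { perror("open"); return 1; }
    while (read(fd, &ev, sizeof(ev)) == sizeof(ev))
        if (ev.type == EV_REL)
            printf("rel code %d value %d\n", ev.code, ev.value);
    return 0;
}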

Google Summer of Code 2008

April 29, 2008

I’m very happy to say that I was selected again to work on Google Summer of Code with the X.Org Foundation. Daniel will be my mentor again. Thanks Google. Thanks X.Org!

Last year we did some nice work separating the input event generation code of the X server into a different thread. We saw some performance improvement there, especially because the implementation no longer uses signals to wake up the server when a device emits an event. The reason is that when a process is in uninterruptible sleep (D state) signals are delayed and the mouse cursor lags.

The idea now is to continue the work and put the event processing stage in a separate thread as well. This will require locking a lot of structures and will be very challenging. I’ll be posting all my progress here.

