Replacing KiFastSystemCall

I want to get information about a particular application when it is calling KiFastSystemCall. To overwrite IA32_SYSENTER_EIP, we must be in ring 0, so we can write a kernel-mode driver for that. The following code worked fine.

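#include <wdm.h>

/* Original SYSENTER target, saved so the hook can chain to it. */
ULONG d_origKiFastCallEntry;

/* Note: the MSR is not restored here, so after unload it still points
   at MyKiFastCallEntry. */
VOID OnUnload(IN PDRIVER_OBJECT pDriverObject)
{
    DbgPrint("Driver Unload Called\n");
}

/* Naked: no compiler-generated prologue or epilogue touches the stack. */
__declspec(naked) VOID MyKiFastCallEntry(VOID)
{
    __asm
    {
        jmp [d_origKiFastCallEntry]
    }
}

NTSTATUS DriverEntry(PDRIVER_OBJECT pDriverObject, PUNICODE_STRING registryPath)
{
    pDriverObject->DriverUnload = OnUnload;
    DbgPrint("Driver Loaded\n");
    __asm
    {
        mov ecx, 0x176                  ; IA32_SYSENTER_EIP
        rdmsr                           ; EDX:EAX = current MSR value
        mov d_origKiFastCallEntry, eax
        mov eax, MyKiFastCallEntry
        wrmsr                           ; writes the MSR of the current CPU only
    }
    return STATUS_SUCCESS;
}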

I tried to modify the code by adding a call to GetCurrentProcess() in ‘MyKiFastCallEntry’, before jumping to the original KiFastCallEntry. But it gave me compilation errors, like DWORD undeclared identifier, GetCurrentProcess undefined…

Is it because GetCurrentProcess() is a Win32 function which needs a main()? If so, how can we include Win32 function calls in ‘MyKiFastCallEntry’? I think we can’t use functions like PsGetCurrentProcess(), because it resulted in a blue screen when I used it, showing an error about IRQL. Can we use the concept of ‘extern’ functions? I am new to the area of drivers and a bit scared of blue screens. So I thought I would try this after getting a reply. Thanks in advance.

This is not going to work in SO MANY WAYS, and if you get the hack in, there
is a strong likelihood that it will fail on the next system you encounter.
Once you are in the kernel, you cannot use Win32 calls, period. Also, at
this point any changes to the stack will destroy you.

Tell us what the heck you think you want to do with this. There may be an
approved method of doing it; worst case, Anton and some of the people who
maintain hooking examples can give you some help with a possible way to do it.
What you are attempting now is guaranteed to fail.


Don Burn (MVP, Windows DDK)
Windows 2k/XP/2k3 Filesystem and Driver Consulting
Website: http://www.windrvr.com
Blog: http://msmvps.com/blogs/WinDrvr
Remove StopSpam to reply


Thanks Don. I will tell you clearly what I am up to. I am working on an academic project whose main aim is to see if we can make Terminal Services more bandwidth efficient. For that, we are trying to hook the system calls before they reach the kernel, and our aim is to send the GDI calls to the remote system. What to do once they reach the client, we still have to find out. But before that, we have to trap them on the server. So we thought: all calls pass through KiFastSystemCall anyway, and there is a method to hook it, so why can’t we use it? Hooking the SSDT means we have to write a hook for each and every call, and we also have to modify the access rights… So is there a way by which we could trap the calls in user mode itself? I think I have made it pretty clear. Thanks

> I think we cant use functions like PsGetCurrentProcess()

Correct. All thread-specific info is pointed to by the FS register, and the value of this register is different for kernel-mode and user-mode code, i.e. the corresponding GDT entries that point to thread-specific info are different for kernel-mode code and user-mode code. Furthermore, the binary layout of the structure that describes thread-specific info is different for kernel-mode and user-mode code as well. The value of FS is changed by the system service dispatcher upon the system call.

Once you hook the system service dispatcher, at the moment your code starts executing, FS is still the same as it is for user-mode code. However, if you call something like PsGetCurrentProcessId() from your hooking code, the callee does not know it - it expects FS to point to the structure that describes thread-specific info in *kernel* mode. Therefore, it misinterprets the binary structure, so any pointer that it obtains from this structure is bound to be invalid. BANG!!! In order to deal with this problem, you have to provide your own implementation of functions like GetCurrentProcessId(), i.e. copy-paste it into your code - you just cannot link a kernel-mode driver against user-mode libraries.

In any case, it does not really matter in this context - when it comes to GetCurrentProcess(), things are quite different. Don’t forget that GetCurrentProcess() returns a pseudohandle, which is just (HANDLE)-1 (this holds true for both kernel-mode and user-mode code). This is all that you need here…

Another thing that is worth mentioning is that every CPU on an MP system has its own set of MSRs. However, you modify IA32_SYSENTER_EIP only on the CPU where your DriverEntry() happens to be running at the moment the modification is applied. Therefore, if you want your code to be at least consistent, you have to modify IA32_SYSENTER_EIP on every CPU on the target machine.
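
The per-CPU point can be illustrated with a small user-mode simulation. This is only a sketch: the array `sim_sysenter_eip` and the helpers `sim_swap_msr_on_cpu` and `hook_all_cpus` are hypothetical names that model one MSR copy per CPU. A real driver would presumably obtain the mask from something like KeQueryActiveProcessors() and pin itself to each CPU in turn (for example with KeSetSystemAffinityThread()) before executing rdmsr/wrmsr there; none of that is shown or tested here.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CPUS 32

/* Simulated per-CPU copies of IA32_SYSENTER_EIP (MSR 0x176).
   In a real driver each slot would be the MSR of a physical CPU. */
static uint32_t sim_sysenter_eip[MAX_CPUS];

/* Stand-in for "run on `cpu`, read the MSR, then write a new value". */
static uint32_t sim_swap_msr_on_cpu(unsigned cpu, uint32_t new_eip)
{
    assert(cpu < MAX_CPUS);
    uint32_t old = sim_sysenter_eip[cpu];
    sim_sysenter_eip[cpu] = new_eip;
    return old;
}

/* Visit every CPU whose bit is set in active_mask, save its old value in
   saved[cpu], and install new_eip. Returns the number of CPUs patched. */
static unsigned hook_all_cpus(uint32_t active_mask, uint32_t new_eip,
                              uint32_t saved[MAX_CPUS])
{
    unsigned patched = 0;
    for (unsigned cpu = 0; cpu < MAX_CPUS; cpu++) {
        if (active_mask & (UINT32_C(1) << cpu)) {
            saved[cpu] = sim_swap_msr_on_cpu(cpu, new_eip);
            patched++;
        }
    }
    return patched;
}
```

Saving the old value per CPU, rather than in one global, also makes it possible to restore each CPU to exactly what it had before.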

To be honest, I would not advise you to hook - in order to do hooking properly you need to know SIGNIFICANTLY more than your questions suggest…

Anton Bassov

> not know it - it expects FS to point to the structure that describes
> thread-specific info in the *kernel* mode.

In the kernel, FS segment references the PCR - Processor Control Region, which
has a field of the “current thread”.


Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
xxxxx@storagecraft.com
http://www.storagecraft.com

xxxxx@yahoo.com wrote:

> Thanks Don. I will tell you clearly what I am up to. I am working on an academic project whose main aim is to see if we can make Terminal Services more bandwidth efficient. For that, we are trying to hook the system calls before they reach the kernel, and our aim is to send the GDI calls to the remote system.

That is essentially what Terminal Services does. The RDP traffic
consists of encapsulated GDI calls.

> What to do once they reach the client, we still have to find out. But before that, we have to trap them on the server. So we thought: all calls pass through KiFastSystemCall anyway, and there is a method to hook it, so why can’t we use it? Hooking the SSDT means we have to write a hook for each and every call, and we also have to modify the access rights… So is there a way by which we could trap the calls in user mode itself?

You could hook all of the interesting gdi32.dll entry points in each of
the running processes, I suppose.
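
The general shape of hooking an entry point, which is also what the jmp to the saved address in the original post does, can be sketched in portable C. Everything here is hypothetical: `line_to_slot` stands in for an import-table entry, and `real_line_to` for the library function behind it. It says nothing about how gdi32.dll is actually laid out or patched.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*line_to_fn)(int x, int y);

/* Stand-in for the real library routine. */
static int real_line_to(int x, int y)
{
    return x >= 0 && y >= 0;
}

/* Hypothetical import-table slot through which callers reach the routine. */
static line_to_fn line_to_slot = real_line_to;

static line_to_fn saved_line_to = NULL;  /* original target, kept for chaining */
static unsigned logged_calls = 0;        /* what the hook "captures" */

/* The hook records the call, then forwards to the original unchanged. */
static int hooked_line_to(int x, int y)
{
    logged_calls++;
    return saved_line_to(x, y);
}

static void install_hook(void)
{
    assert(saved_line_to == NULL);
    saved_line_to = line_to_slot;
    line_to_slot = hooked_line_to;
}

static void remove_hook(void)
{
    assert(saved_line_to != NULL);
    line_to_slot = saved_line_to;
    saved_line_to = NULL;
}
```

Callers keep calling through `line_to_slot` and see the same results; only the side channel (`logged_calls`) changes while the hook is installed.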

This is an interesting goal. Remember, however, that bandwidth was “THE”
primary performance issue for Citrix when they were developing what
eventually became Terminal Services, even more so than the computational
load. It was designed to operate adequately over a dial-up telephone
line, and it actually does a remarkable job of doing so. There is some
clever bit-packing in the data fields, and on-the-fly compression of
bitmaps.

There are many things about Terminal Services not to like, but I’m
extremely dubious that you’ll be able to make any significant inroads
into the bandwidth.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

I wanted to clarify some information about Terminal Services.

Terminal Services, which uses the RDP protocol, can also be accessed using open source Linux clients like rdesktop. We looked into the code of rdesktop, and what it does is, after receiving the RDP packets, make some X rendering calls. Also, on the server side, there is a virtual display driver, rdpdd.dll, which converts the GDI calls to low-level graphics primitives. So from this, and from various sources, we concluded that RDP is sending some low-level graphics primitives to the client.

Otherwise, it would become more platform dependent. If we disassemble the GDI system services, we can see that most of them make calls to services exported by ntoskrnl.exe. So if the RDP traffic contained encapsulated GDI calls, a client like rdesktop, which runs on Linux, would definitely find it difficult. rdesktop itself is not so huge, and what it does is, once it finds a graphics packet, call some rendering primitives; we can’t see any code for locks, semaphores…

This made us think that if we can send GDI calls rather than lower-level graphics primitives, we may save some bandwidth, but of course the client will have to do a lot more work.

I wrote this in such detail because we are just starting the work; if there is some mistake in our understanding, we should find it now, before it is too late. Thanks a lot for all the suggestions!!

I don’t really know what RDP sends over the wire, but, if I understand
your reasoning correctly, then I don’t agree with it entirely. If
you’re saying that you think that RDP uses low level graphic calls to
provide platform independence, I wouldn’t entirely agree, because
although they did have to consider that not everything running a TS
client would be a Windows PC (it could be a fairly dumb terminal or a
286 in theory), and RDP is based on T.128, I don’t think that Microsoft
is particularly concerned with facilitating running Windows on anything
but Windows, and it certainly was not their first priority over
performance. I think that the focus of RDP was minimization of
bandwidth, as it was designed to run reasonably over dial-up.

What you’re suggesting (encapsulating GDI calls) is essentially a
Windows metafile. I don’t know this to be the case, but my guess would
be that rdpdd.dll is performing some sort of differential analysis of
bitmaps, mouse movement, and keyboard activity, the results of which are
compressed and sent over the wire, where they are no doubt cached.
Whatever the case, the performance of what you’re considering, as you
noted, would be unbearable, as you would be competing with the artifacts
of human perception which see extremely small delays. I think this is
one of the reasons for decomposition to primitives, which is what
happens with ‘UpdateWindow’ anyway. Personally, my guess would be
that, barring things like full screen video or audio, the majority of
RDP traffic would concern the command language around keyboard and
pointer movement, whatever that looks like, but I imagine Citrix
addressed that.

The good news, if you don’t already know it, is that I know that
Microsoft at least at one time made the RDP spec publicly available for
no fee in some circumstances, which I think an academic project would
certainly qualify for. Unfortunately, the link no longer works, so I
don’t really know what that means, other than that if you’re interested,
you’re going to have to do some looking. Microsoft links don’t work all
the time, in and of itself that doesn’t really mean anything. There is
probably someone at your University who has some familiarity with
Microsoft’s academic program.

In any event, as you probably already know, RDP is based on T.128, which
is publicly available. Have you yet checked out www.rdesktop.org? That
and more is there:

T.128 (www.rdesktop.org/t128.zip)

I just took a tour through the rdesktop source code, and if you haven’t
already, you might wish to look at ‘orders.h’. That appears to be a lot
of the protocol in a nutshell. I assume that these are what you are
referring to when you say ‘Graphics Primitives’.

enum RDP_ORDER_TYPE
{
    RDP_ORDER_DESTBLT = 0,
    RDP_ORDER_PATBLT = 1,
    RDP_ORDER_SCREENBLT = 2,
    RDP_ORDER_LINE = 9,
    RDP_ORDER_RECT = 10,
    RDP_ORDER_DESKSAVE = 11,
    RDP_ORDER_MEMBLT = 13,
    RDP_ORDER_TRIBLT = 14,
    RDP_ORDER_POLYGON = 20,
    RDP_ORDER_POLYGON2 = 21,
    RDP_ORDER_POLYLINE = 22,
    RDP_ORDER_ELLIPSE = 25,
    RDP_ORDER_ELLIPSE2 = 26,
    RDP_ORDER_TEXT2 = 27
};

enum RDP_SECONDARY_ORDER_TYPE
{
    RDP_ORDER_RAW_BMPCACHE = 0,
    RDP_ORDER_COLCACHE = 1,
    RDP_ORDER_BMPCACHE = 2,
    RDP_ORDER_FONTCACHE = 3,
    RDP_ORDER_RAW_BMPCACHE2 = 4,
    RDP_ORDER_BMPCACHE2 = 5,
    RDP_ORDER_BRUSHCACHE = 7
};

I don’t see anything to compress here, really.

I share Tim’s serious doubts about the prospects of improving the
performance of RD in this way. Citrix’s MetaFrame over dial-up was
nothing short of remarkable, and incomparably better than any other
remote networking mechanism at the time it came out. You have probably
never suffered Dialup Networking, and you should consider yourself
lucky. My personal opinion, based on your first post, given that this
is presumably a project with a hard deadline on the order of a semester,
would be to stay out of the kernel. Learning to use WinDbg productively
would take you or anyone else new to this a couple of months in and of
itself. Also, the hard failures that will present themselves given your
experience level will frustrate the hell out of you, just annihilate a
semester based schedule, and unless you really like rebooting, make the
whole thing thoroughly unenjoyable.

In any case, this really is not my thing, but I hope it is correct and
at least helps in some way. I commend your ambition, and it sounds like
you’ve got yourself a very interesting research project, which is really
all that matters in an academic endeavor, so have fun.

Good luck,

mm

Thanks a lot MM. It was really inspiring. As you said, I may not really end up with the result which I want, but anyway, I will make a try. After all, the more I understand, the more enjoyable it will be. Thanks again!!!

I’m glad I could help.

xxxxx@yahoo.com wrote:

> I wanted to clarify some information about Terminal Services.
>
> Terminal Services, which uses the RDP protocol, can also be accessed using open source Linux clients like rdesktop. We looked into the code of rdesktop, and what it does is, after receiving the RDP packets, make some X rendering calls. Also, on the server side, there is a virtual display driver, rdpdd.dll, which converts the GDI calls to low-level graphics primitives. So from this, and from various sources, we concluded that RDP is sending some low-level graphics primitives to the client.

You can see the basics in the ITU T.128 protocol,
http://rdesktop.sourceforge.net/docs/t128.zip. It’s low-level, but not
any lower than GDI.

> Otherwise, it would become more platform dependent. If we disassemble the GDI system services, we can see that most of them make calls to services exported by ntoskrnl.exe. So if the RDP traffic contained encapsulated GDI calls, a client like rdesktop, which runs on Linux, would definitely find it difficult.

Not really. GDI and X are much more alike than they are different.

> rdesktop itself is not so huge, and what it does is, once it finds a graphics packet, call some rendering primitives; we can’t see any code for locks, semaphores…

Why would there be locks and semaphores? The protocol is all serialized
by the very nature of being a network protocol.

> This made us think that if we can send GDI calls rather than lower-level graphics primitives, we may save some bandwidth, but of course the client will have to do a lot more work.

It’s not like GDI calls are all that high-level to begin with. With a
few exceptions, GDI calls map pretty much 1:1 to X calls, and both map
pretty much 1:1 to RDP packets. Lines, circles, text and rectangles
don’t take up any bandwidth. The bandwidth gets chewed up by bitmaps,
and the protocol is already trying to minimize that.
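
To put rough numbers on that, here is a small arithmetic sketch. The 16-byte size used for a drawing command in the note below is an illustrative assumption, not a figure from the RDP specification, and the helper names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes in an uncompressed bitmap, with each row rounded up to whole bytes. */
static size_t raw_bitmap_bytes(size_t width, size_t height, size_t bits_per_pixel)
{
    assert(bits_per_pixel > 0);
    size_t row_bytes = (width * bits_per_pixel + 7) / 8;
    return row_bytes * height;
}

/* How many fixed-size drawing commands fit in the space of one bitmap. */
static size_t commands_per_bitmap(size_t bitmap_bytes, size_t command_bytes)
{
    assert(command_bytes > 0);
    return bitmap_bytes / command_bytes;
}
```

For example, raw_bitmap_bytes(64, 64, 16) is 8192 bytes, so a single small uncompressed 64x64 icon at 16 bits per pixel costs as much as 512 drawing commands of an assumed 16 bytes each, and a full 1024x768 screen at the same depth is 1,572,864 bytes.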

> I wrote this in such detail because we are just starting the work; if there is some mistake in our understanding, we should find it now, before it is too late. Thanks a lot for all the suggestions!!

There is absolutely nothing wrong with starting an academic project to
focus on this kind of thing. That’s what academia is for! However, I
will be very surprised if you are able to reduce the bandwidth by more
than 5%. Further, even if you ARE able to reduce it by 10%, I’m not
sure it will matter. The typical Terminal Services session, in my
experience, is not particularly bandwidth-bound.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

I am still confused. I will try to explain what we are thinking of once again…

GDI functions mainly consist of two types:

  1. those implemented by printer and display drivers (Drvxxx)
  2. those called by display drivers (Engxxx)

Now, when a Win32 application calls a function exported by gdi32.dll/user32.dll, each of them may be implemented down in win32k.sys as one or more calls to the GDI functions. It’s just a logical thought… I may be mistaken also… So if we can send the user calls before they are split into lower-level calls, can’t we save bandwidth? It is this thought that led us to the project. Please correct me if I am wrong. Thanks a lot.

xxxxx@yahoo.com wrote:

> I am still confused. I will try to explain what we are thinking of once again…
>
> GDI functions mainly consist of two types:
>
>   1. those implemented by printer and display drivers (Drvxxx)
>   2. those called by display drivers (Engxxx)
>
> Now, when a Win32 application calls a function exported by gdi32.dll/user32.dll, each of them may be implemented down in win32k.sys as one or more calls to the GDI functions. It’s just a logical thought… I may be mistaken also… So if we can send the user calls before they are split into lower-level calls, can’t we save bandwidth?

How will that save bandwidth?

Consider the following GDI calls from a user-mode application:

SetTextColor( hdc, 0xff0000 );
MoveToEx( hdc, 100, 100, NULL );
LineTo( hdc, 200, 200 );

The first two calls do nothing except set state in the DC. There are no
calls into the display driver, and no RDP traffic. They’re both
user-mode only. The third call uses the information in the DC
(current point, current color, etc), translates the coordinates to
screen space, and calls DrvStrokePath. DrvStrokePath will be given the
color, the ROP, and a list of two points in screen coordinates. This
will result in RDP traffic, containing basically the same information.

Now, for you to signal the other end to draw lines, you need all of that
information: color, ROP, coordinates. I don’t see how you can reduce
the bandwidth by intercepting at a higher level.

Most GDI calls do not map into multiple driver calls. There is a level
of abstraction, but with the exception of the state-changing APIs, the
correlation of GDI calls to GDI driver APIs is nearly 1:1. And, for the
most part, each GDI driver API is going to result in one RDP packet.

You can’t think about the individual GDI calls. You have to think about
“what information is NECESSARY for me to perform this drawing command?”
If you can reduce the amount of information, then you can reduce the
bandwidth.
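
As a sketch of that way of thinking, here is the line example reduced to the information named above (color, ROP, two points) and packed into bytes. The `line_cmd` structure and its 12-byte layout are hypothetical and chosen only for illustration; this is not the RDP wire format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "draw a line" command: just the data the far end needs. */
typedef struct {
    int16_t  x1, y1, x2, y2;  /* end points in screen coordinates */
    uint32_t color;           /* 0x00BBGGRR, as in a GDI COLORREF */
    uint8_t  rop2;            /* binary raster op, e.g. 13 = R2_COPYPEN */
} line_cmd;

#define LINE_CMD_WIRE_BYTES 12

/* Store a 16-bit value in little-endian byte order. */
static void put_u16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)(v >> 8);
}

/* Pack the command into out[]; returns bytes written, or 0 if cap is too small. */
static size_t encode_line(const line_cmd *c, uint8_t *out, size_t cap)
{
    assert(c != NULL && out != NULL);
    if (cap < LINE_CMD_WIRE_BYTES)
        return 0;
    put_u16(out + 0, (uint16_t)c->x1);
    put_u16(out + 2, (uint16_t)c->y1);
    put_u16(out + 4, (uint16_t)c->x2);
    put_u16(out + 6, (uint16_t)c->y2);
    out[8]  = (uint8_t)(c->color & 0xFF);          /* red   */
    out[9]  = (uint8_t)((c->color >> 8) & 0xFF);   /* green */
    out[10] = (uint8_t)((c->color >> 16) & 0xFF);  /* blue  */
    out[11] = c->rop2;
    return LINE_CMD_WIRE_BYTES;
}
```

Whichever layer the call is intercepted at, these same fields have to cross the wire, so the only savings left are in how compactly they are encoded.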


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

> Windows metafile. I don’t know this to be the case, but my guess would
> be that rdpdd.dll is performing some sort of differential analysis of
> bitmaps, mouse movement, and keyboard activity, the results of which are

Minor correction: being a display driver, rdpdd.dll is hardly related to
keyboard and mouse.


Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
xxxxx@storagecraft.com
http://www.storagecraft.com

Ok… I understood. Thanks again for all of your valuable information.