You know, we looked at that 2-byte mov edi, edi magic often enough when we
were writing DriverStudio code, and we were baffled by it because we didn’t
see any reason for it. Now I understand the reasoning behind it, but I’m still
not sure it’s needed.
I haven’t looked at it in much detail recently (it’s been over two years now),
but I am going to bet that on most x86-compatible processors a jump or a
subroutine call fetches at least one cache line to start running. Chances
are that the whole preamble will be fetched in one go, especially if one
takes care to align one’s functions on cache-line boundaries to speed up the
fetching (actually, I bet an 8-byte alignment will do, since a jump only takes
5 bytes on a 32-bit machine). The key to enlightenment, I believe, is to
think about things as they happen at fetch time.
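To make the alignment argument concrete: whether a patch can be fetched in one go comes down to whether its bytes straddle a fetch block. Here is a minimal C sketch. It assumes a power-of-two block size (8 for the 8-byte alignment above, 64 for a typical cache line) and the standard 5-byte `E9 rel32` near jump. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if the bytes [addr, addr + len) all lie inside
   one naturally aligned block of block_size bytes. block_size must be a
   power of two and len must be at least 1. */
static bool fits_in_block(uintptr_t addr, size_t len, size_t block_size)
{
    uintptr_t mask  = ~(uintptr_t)(block_size - 1);
    uintptr_t first = addr & mask;             /* block holding the first byte */
    uintptr_t last  = (addr + len - 1) & mask; /* block holding the last byte */
    return first == last;
}

/* Hypothetical helper: encode the 5-byte x86 near jump "E9 rel32" located at
   address `from` and targeting `to`. The displacement is relative to the
   next instruction, i.e. from + 5. */
static void encode_jmp_rel32(uint8_t out[5], uint32_t from, uint32_t to)
{
    uint32_t rel = to - (from + 5u);  /* unsigned wraparound handles backward jumps */
    out[0] = 0xE9;
    for (int i = 0; i < 4; i++)
        out[1 + i] = (uint8_t)(rel >> (8 * i));  /* rel32 is little-endian */
}
```

For example, a jump planted at an 8-byte-aligned function start always passes `fits_in_block(addr, 5, 8)`. A jump starting at offset 4 of an 8-byte block does not.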
Furthermore, safely planting a jump in a sequence is rather easy. For
example, one can stuff an int 3 into the first byte of the function, have the
ISR handle the problematic cases, give it a little time to stabilize, and
then stuff in the jump.
But the easiest thing to do is to lock all the other processors out during
the hooking: an IPI does wonders.
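The int 3 sequence can be sketched as a fixed ordering of writes. This C sketch assumes the three-step order that, for example, the Linux kernel’s text_poke_bp() uses: trap byte first, then the tail, then the real first byte. It only rewrites an ordinary buffer. The int 3 handler is not shown, and the cross-processor synchronization (the IPI) is reduced to a hypothetical `sync` callback. All names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define INT3 0xCC

/* Hypothetical stand-in for "make every processor see the bytes written so
   far" (in a kernel: an IPI or similar cross-processor serialization). */
typedef void (*sync_fn)(void);

/* Replace the first len bytes of code with new_code in three ordered steps.
   A processor decoding from code[0] sees the old instruction, an int 3 that
   traps into the (not shown) handler, or the complete new instruction. */
static void patch_via_int3(uint8_t *code, const uint8_t *new_code, size_t len,
                           sync_fn sync)
{
    code[0] = INT3;                           /* 1: trap anyone who enters */
    sync();
    memcpy(code + 1, new_code + 1, len - 1);  /* 2: write everything but byte 0 */
    sync();
    code[0] = new_code[0];                    /* 3: publish the first byte last */
    sync();
}

/* Demo stand-in that only counts how many synchronization points were hit. */
static int sync_calls;
static void count_sync(void) { sync_calls++; }
```

A thread that was already past byte 0 when the patch started is exactly the “problematic issue” the ISR and the stabilization delay have to cover. The ordering alone does not handle it.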
Alberto.
----- Original Message -----
From: “David J. Craig”
Newsgroups: ntdev
To: “Windows System Software Devs Interest List”
Sent: Tuesday, August 28, 2007 8:30 PM
Subject: Re:[ntdev] Detours?
> The two-byte nop sequence at the beginning of a function can be guaranteed
> to fall on a 2-, 4-, 8-, or 16-byte boundary, depending upon the linker.
> Since MS writes the linker, they can force it to put that nop on whichever
> boundary they find useful. Since it is also the beginning of a function,
> and I suspect not a function that can be ‘fallen into’, there is a lot more
> safety involved in using a 16-bit locked move of some sort: maybe an
> interlocked call, a lock prefix, or a compare-and-exchange of the correct
> size. Since it is at the beginning of the function, replacing those
> 16 bits will not cause a prefetch to pick up half of the nop and half of
> the new code; it gets either all of the nop or all of the new code.
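The 16-bit replacement can be sketched on the layout used by Microsoft’s hotpatchable x86 binaries: five bytes of padding immediately before the function, and `mov edi, edi` (8B FF) as its first instruction. The long jump goes into the padding first, where no thread executes. Then the two entry bytes are swapped for `jmp short $-5` (EB F9) in one compare-and-exchange.

A few assumptions apply. The buffer only simulates a code page. `__atomic_compare_exchange_n` is the GCC/Clang builtin. The function name, offsets and addresses are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HOTPATCH_PAD 5 /* padding bytes before the entry point (x86 layout) */

/* Hypothetical: redirect the function whose entry point is at code + entry.
   entry_addr is the (simulated) address of that entry point and is used only
   to compute the rel32 displacement. code + entry must be 2-byte aligned. */
static bool hotpatch(uint8_t *code, size_t entry, uint32_t entry_addr,
                     uint32_t target_addr)
{
    if (code[entry] != 0x8B || code[entry + 1] != 0xFF)
        return false;  /* not "mov edi, edi": already patched or not hotpatchable */

    /* Step 1: "jmp target" (E9 rel32) into the padding, which no thread
       executes. The displacement is relative to the end of the jump,
       which is the entry point itself. */
    uint8_t *pad = code + entry - HOTPATCH_PAD;
    uint32_t rel = target_addr - entry_addr;
    pad[0] = 0xE9;
    for (int i = 0; i < 4; i++)
        pad[1 + i] = (uint8_t)(rel >> (8 * i));

    /* Step 2: swap "mov edi, edi" (8B FF) for "jmp short $-5" (EB F9) with a
       single 16-bit compare-and-exchange, so both bytes change in one write. */
    const uint8_t old_bytes[2] = {0x8B, 0xFF};
    const uint8_t new_bytes[2] = {0xEB, 0xF9};
    uint16_t expected, desired;
    memcpy(&expected, old_bytes, sizeof expected);
    memcpy(&desired, new_bytes, sizeof desired);
    return __atomic_compare_exchange_n((uint16_t *)(code + entry), &expected,
                                       desired, false, __ATOMIC_SEQ_CST,
                                       __ATOMIC_SEQ_CST);
}
```

EB F9 is a two-byte short jump with displacement -7. Measured from the end of the jump (entry + 2), that lands on entry - 5, the start of the padding.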
>
> Run windbg on an 8-CPU target and do a manual break. It takes a while
> before the target gets all the CPUs marshaled into limbo land and returns
> to the debugger on the host. This makes some of the tricks, such as using
> affinity and busy loops for all the CPUs except the patching one, far more
> obvious and user-unfriendly.
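The affinity-and-busy-loop trick is essentially a rendezvous. Every processor but the patching one announces that it has arrived and spins until released.

Below is a user-mode analogue using POSIX threads and C11 atomics, with threads standing in for CPUs. It leaves out pinning each thread to its own CPU (for example with the non-portable pthread_setaffinity_np) and error checking. The names are hypothetical. It also hints at why the 8-CPU break is slow: the patcher cannot proceed until the last spinner checks in.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NSPINNERS 7 /* e.g. an 8-CPU machine minus the patching CPU */

static atomic_int arrived = 0;          /* spinners parked so far */
static atomic_bool released = false;    /* set once the patch is in place */
static uint16_t patched_word = 0xFF8B;  /* stand-in for the code bytes 8B FF */

/* Each "other CPU": check in, then busy-wait until the patcher is done. */
static void *spinner(void *arg)
{
    (void)arg;
    atomic_fetch_add(&arrived, 1);
    while (!atomic_load(&released))
        ;                               /* busy loop */
    return NULL;
}

/* The patching CPU: park everybody, patch, release, clean up. */
static void rendezvous_and_patch(uint16_t new_word)
{
    pthread_t t[NSPINNERS];
    for (int i = 0; i < NSPINNERS; i++)
        pthread_create(&t[i], NULL, spinner, NULL);

    while (atomic_load(&arrived) < NSPINNERS)
        ;                               /* wait for the last straggler */

    patched_word = new_word;            /* every other thread is in its busy loop */
    atomic_store(&released, true);

    for (int i = 0; i < NSPINNERS; i++)
        pthread_join(t[i], NULL);
}
```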
>
> Good idea about becoming a hypervisor and doing the patching.
>
> –
> David J. Craig
> Engineer, Sr. Staff Software Systems
> Broadcom Corporation
>
>
> “Alberto Moreira” wrote in message news:xxxxx@ntdev…
>> It’s way more complicated than that. There’s a pipeline, right? By the
>> time an instruction is executed, it was fetched way back when. Reasoning
>> at execution time won’t do; you have to reason at fetch time.
>> Using one-byte hooks is OK, but not, for example, for a profiler such as
>> TrueTime, because it falsifies the timing. Also note that, ideally, the
>> hooking engine should act outside the bounds of the OS, so that when it
>> hooks it knows that it’s not going to be preempted. Once you go into the
>> hooking business, you may be assuming control and bumping the OS one level
>> higher!
>>
>> Alberto.
>>
>>
>> ----- Original Message -----
>> From: “Matt Miller”
>> To: “Windows System Software Devs Interest List”
>> Sent: Monday, August 27, 2007 10:16 PM
>> Subject: Re: [ntdev] Detours?
>>
>>
>>> On Mon, Aug 27, 2007 at 10:02:00PM -0400, Alberto Moreira wrote:
>>>> Oh, wow, SoftICE reinvented? We used to call it “Capt’n Hook”, the
>>>> hooking engine. BoundsChecker, TrueTime and TrueCoverage used it too.
>>>> By the way, if you want to safely intercept code, multiprocessor-proof,
>>>> you can use the CMPXCHG8B instruction.
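Here is a sketch of the CMPXCHG8B idea. Read the first eight bytes of the function. Build a new eight-byte value whose first five bytes are `E9 rel32` and whose last three are the original bytes. Store it with one 8-byte compare-and-exchange, so no processor ever reads a half-written jump.

`__atomic_compare_exchange_n` on a `uint64_t` is the GCC/Clang builtin. On 32-bit x86 targets that support it, it typically compiles to CMPXCHG8B. The buffer, addresses and function name are hypothetical. As the reply notes, this does nothing for a thread already executing inside the overwritten bytes.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: atomically overwrite the first 5 of 8 bytes at code with
   "jmp target", keeping bytes 5..7 unchanged. code must be 8-byte aligned
   (it is accessed as one uint64_t); code_addr is its simulated address. */
static bool install_jmp8(uint8_t *code, uint32_t code_addr, uint32_t target_addr)
{
    uint64_t *word = (uint64_t *)code;
    uint64_t expected = __atomic_load_n(word, __ATOMIC_SEQ_CST);

    uint8_t bytes[8];
    memcpy(bytes, &expected, sizeof bytes);   /* current bytes, in memory order */
    uint32_t rel = target_addr - (code_addr + 5u);
    bytes[0] = 0xE9;
    for (int i = 0; i < 4; i++)
        bytes[1 + i] = (uint8_t)(rel >> (8 * i));
    /* bytes[5..7] keep their original values */

    uint64_t desired;
    memcpy(&desired, bytes, sizeof desired);
    /* Fails (returns false) if someone else changed the bytes meanwhile. */
    return __atomic_compare_exchange_n(word, &expected, desired, false,
                                       __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}
```

With the common prologue `mov edi, edi; push ebp; mov ebp, esp` (8B FF 55 8B EC), the five-byte jump exactly covers three instructions. A thread preempted after the first or second of them would resume in the middle of the jump.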
>>>
>>> As Anton pointed out, the use of cmpxchg8b will not solve the problem of
>>> another thread executing within the sequence of instructions that are
>>> overwritten by the jump instruction. Alternative tricks would be needed
>>> to ensure that this does not happen. This is likely one of the reasons
>>> why Microsoft uses a two-byte no-op (mov edi, edi) rather than two
>>> one-byte no-op instructions (nop / nop) in their binaries that are
>>> compiled to support hotpatching.
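The point about threads inside the overwritten sequence can be stated as a simple check. A patch of `patch_len` bytes is only safe against already-running threads if no instruction boundary falls strictly inside the patched range. A thread preempted at such a boundary would resume in the middle of the new bytes.

This C sketch takes the instruction lengths as given; a real patcher would need a disassembler to obtain them. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: given the lengths of the first n instructions of a function,
   return true if overwriting the first patch_len bytes leaves an instruction
   boundary strictly inside (0, patch_len), i.e. a point where a preempted
   thread could resume in the middle of the new code. */
static bool patch_splits_boundary(const size_t *insn_len, size_t n,
                                  size_t patch_len)
{
    size_t offset = 0;
    for (size_t i = 0; i < n && offset < patch_len; i++) {
        offset += insn_len[i];      /* next instruction boundary */
        if (offset < patch_len)
            return true;            /* boundary inside the patched range */
    }
    return false;
}
```

`mov edi, edi` is one 2-byte instruction, so a 2-byte patch has no interior boundary. `nop; nop` has one at offset 1. The 5-byte jump over a standard prologue has two.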
>>>
>>> —
>>> NTDEV is sponsored by OSR
>>>
>>> For our schedule of WDF, WDM, debugging and other seminars visit:
>>> http://www.osr.com/seminars
>>>
>>> To unsubscribe, visit the List Server section of OSR Online at
>>> http://www.osronline.com/page.cfm?name=ListServer
>>
>>