We are writing a WDM driver for a streaming PCI device. We need a way to
copy 1284 bytes fast. Is there any function other than RtlCopyMemory
available for that?
RtlCopyMemory is heavily optimized; I doubt you can do better. What about doing DMA instead? If DMA is not possible,
how are you mapping the source and destination regions, and what kind of memory are they?
regards
– Tom Stonecypher
xxxxx@iStreamConsulting.com
www.iStreamConsulting.com
+1-803-463-6340
“Champak” wrote in message news:xxxxx@ntdev…
Not that I know of. If you look at RtlCopyMemory you will find it is a
wrapper for memcpy, which does 32-bit moves whenever possible.
–
Gary G. Little
xxxxx@broadstor.com
xxxxx@inland.net
“Champak” wrote in message news:xxxxx@ntdev…
Actually, if you know you are always going to do 1284-byte moves, then it is
probably possible to optimize a private version of RtlCopyMemory for this
case. My guess, however, is that the time saved will be nearly negligible.
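As a sketch of what such a private version could look like (user-mode C, hypothetical name; it simply exploits the fact that 1284 is a multiple of 4, so no byte-remainder pass is ever needed):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical private copy for the fixed 1284-byte case: 1284 = 321 * 4,
 * so the whole transfer moves as 32-bit words and the byte-remainder
 * pass of a generic memcpy is never needed.  Assumes both buffers are
 * at least 4-byte aligned. */
static void CopyFixed1284(void *dst, const void *src)
{
    uint32_t *d = dst;
    const uint32_t *s = src;
    for (int i = 0; i < 1284 / 4; i++)
        d[i] = s[i];
}
```

In practice a compiler will likely turn this loop into much the same "rep movsd" sequence a generic memcpy uses, which is why the savings are expected to be negligible.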
—
You are currently subscribed to ntdev as: xxxxx@stratus.com To
unsubscribe send a blank email to %%email.unsub%%
----- Original Message -----
From: “Roddy, Mark”
To: “NT Developers Interest List”
Sent: Tuesday, July 02, 2002 11:55 AM
Subject: [ntdev] Re: Fast RtlCopyMemory
> Actually, if you know you are always going to do 1284-byte moves, then it is
> probably possible to optimize a private version of RtlCopyMemory for this
> case. My guess is however that the time saved will be nearly negligible.
>
You stand to save about 5 clock cycles at most. The standard intrinsic
memcpy executes a "rep movsd" for however many 32-bit dwords there are;
then, if there are bytes left over (3 at most), it executes a "rep movsb"
to move them individually. Since your data is a multiple of four in length,
you could save those last few single-byte moves.

You may find that memory alignment of your buffers makes a much bigger
performance impact. For example, if your destination buffer were at address
xxxxxxx1 (vs. xxxxxxx0 or xxxxxxx4), each 32-bit copy would take two
cycles, because the CPU has to do two aligned fetches to retrieve a single
unaligned dword. Unless you have a packed structure that unnaturally forces
your data to that kind of offset, the compiler will align it for you.
Beyond that, you may see a slight improvement from a larger alignment
multiple, such as 64 or 128 bytes, as this can be more cache-line friendly.

Also, if the source and destination buffers have the same alignment, they
can collide with each other in the cache. For example, with a declaration
like:

int arr1[1024];
int arr2[1024];

it can take much longer to copy all of arr1 to arr2 than with

int arr1[1025];
int arr2[1025];

This is because the low-order bits of the addresses are the same for
arr1[i] and arr2[i] in the first example but different in the second.
Identical low-order bits map the two accesses to the same cache set, and
the resulting conflict misses hurt performance.
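The aliasing point can be sketched like this (a layout illustration, not a benchmark; whether the conflict is measurable depends on the cache's size and associativity, and set-associative caches tolerate it much better than direct-mapped ones):

```c
#include <assert.h>
#include <string.h>

/* With power-of-two-sized arrays the low-order address bits of arr1[i]
 * and arr2[i] tend to coincide, so each pair of accesses can compete
 * for the same cache set; padding by one element staggers the
 * addresses.  Treat the two layouts as something to profile on the
 * target CPU, not a guaranteed win. */
static int arr1[1024], arr2[1024];       /* same low-order address bits */
static int arr1p[1025], arr2p[1025];     /* padded: addresses staggered */

static void copy_same(void)
{
    for (int i = 0; i < 1024; i++)
        arr2[i] = arr1[i];
}

static void copy_padded(void)
{
    for (int i = 0; i < 1024; i++)
        arr2p[i] = arr1p[i];
}
```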
Thanks,
Jeff Bromberger
Induction Industries, Inc.
www.inductionindustries.com
Any chance that looking at the design would be better than just optimizing
one simple memory copy? How about reading directly into the final buffer?
Any data movement you can avoid has to be faster than just improving the
speed of the move.
I thought that a loop using the SIMD instructions is faster? Intel moves 16
bytes at a time. Also, depending on how your program is structured, you can
take advantage of the prefetch instruction to avoid cache misses.
Alberto.
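A minimal sketch of such a 16-bytes-at-a-time loop, assuming SSE2 hardware and 16-byte-aligned buffers (user-mode C; in a kernel driver the XMM state would also have to be saved and restored around this, e.g. with KeSaveFloatingPointState/KeRestoreFloatingPointState):

```c
#include <assert.h>
#include <emmintrin.h>   /* SSE2 intrinsics */
#include <xmmintrin.h>   /* _mm_prefetch */
#include <stdint.h>
#include <string.h>

/* 1284 = 80 * 16 + 4: eighty 16-byte vector moves plus a 4-byte tail.
 * The aligned load/store forms require 16-byte-aligned buffers.
 * Prefetch hints never fault, so running a few iterations past the
 * end of the source buffer is harmless. */
static void CopySse1284(void *dst, const void *src)
{
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;
    for (int i = 0; i < 1284 / 16; i++) {
        _mm_prefetch((const char *)&s[i + 4], _MM_HINT_T0);
        _mm_store_si128(&d[i], _mm_load_si128(&s[i]));
    }
    /* 4-byte tail at offset 1280 */
    *(uint32_t *)(d + 1284 / 16) = *(const uint32_t *)(s + 1284 / 16);
}
```

For a copy this small the prefetches may buy little, since the hardware has scant time to get ahead of the loop; they matter more on longer streams.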
The contents of this e-mail are intended for the named addressee only. It
contains information that may be confidential. Unless you are the named
addressee or an authorized designee, you may not copy or use it, or disclose
it to anyone else. If you received it in error please notify us immediately
and then destroy it.
Yeah, but then don’t you have to go off and save the FP state? At least
with the original MMX instruction set.
“Moreira, Alberto” wrote in message
news:xxxxx@ntdev…
You may have to save the particular XMM registers you’re using, but
they’re disjoint from the FP registers. It may be worth trying; if it
doesn’t help, well, pity, let’s not use it.
Alberto.
I guess it all depends on how carried away you want to get. I usually try
to keep things CPU-independent unless the performance gains are huge. The
prefetch instructions require at least a P3-class machine, and then of
course there’s the joy of inlining raw opcodes because you don’t have
compiler support, and so on. In a case like this, you may cancel out the
gains of using MMX or similar instructions with the overhead of checking
the hardware at run time for the required support. After all, we’re only
talking about copying 1284 bytes. Nevertheless, your point is well taken;
there’s always another way to do things.
Thanks,
Jeff Bromberger
Induction Industries, Inc.
www.inductionindustries.com
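A sketch of the run-time support check being discussed, using a GCC/Clang builtin as a user-mode stand-in (in a WDM driver the documented call would be ExIsProcessorFeaturePresent, e.g. with PF_XMMI64_INSTRUCTIONS_AVAILABLE for SSE2):

```c
#include <assert.h>

/* User-mode sketch of the runtime feature check.  __builtin_cpu_supports
 * is a GCC/Clang extension for x86 targets only, so non-x86 builds fall
 * back to the plain copy path. */
static int have_sse2(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    return __builtin_cpu_supports("sse2") != 0;
#else
    return 0; /* unknown target: use the plain copy */
#endif
}
```

Doing the check once at driver initialization and caching a function pointer to the chosen copy routine avoids paying the overhead on every 1284-byte move.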
MMX (at least in the Pentium MMX) was mapped on top of the FPU registers,
so that OSes didn’t need to be rewritten to support MMX in apps. To use
MMX you had to save off the FPU state, plus whatever registers you used.
I don’t know how SSE and SSE2 work, so I can’t comment on that.
(That’s why I specifically said MMX.)
“Moreira, Alberto” wrote in message
news:xxxxx@ntdev…
“Moreira, Alberto” wrote in message news:xxxxx@ntdev…
>
> I thought that a loop using the SIMD instructions is faster ? Intel moves 16
> bytes at a time. Also, depending on how your program is structured, you can
> take advantage of the prefetch instruction to avoid the cache misses.
Note that prefetches on sequential accesses tend to be more useful on a P3;
the P4 seems to do a pretty good job of predicting these.
(If you are missing in cache, then the rest of the discussion about
instruction x vs. instruction y is pointless, since the memory accesses
will bound the copy. On P3s at least, you can get big wins by concentrating
on the memory accesses.)
-DH
Well, the Intel C++ compiler does a great job with its SIMD intrinsics.
Alberto.