How to Read/Write 64-Bit Registers using 32-Bit Windows XP

I am running Windows XP (32-bit) and have multiple 64-bit PCI devices in the system. In the past I have used READ_REGISTER_ULONG to read 32 bits at a time from PCI devices. However, since these devices are 64-bit, I need to read or write the full 64 bits at a time. I see that the HAL library also has a function called READ_REGISTER_ULONG64(), but apparently it’s only available in 64-bit versions of Windows.

I know 64-bit PCI devices have been around for a LONG time, while 64-bit Windows operating systems have only been around for a handful of years. So in the past, how would a person read/write the full 64-bit data path to/from a PCI device? Maybe I’m missing something obvious since I’m new to driver development. Also, to be clear, I’m not talking about 64-bit addresses; I only use 32-bit addressing and just need to do full 64-bit data transfers.

Thanks!!
Ryan

64-bit PCI means 64-bit addressing, not 64-bit data. There is no
instruction in the standard x86 instruction set that does an atomic
64-bit read, in large part because there are no 64-bit registers in
which to put the data. If your hardware requires atomic access to
64-bit wide registers, I would call that a design flaw.

You can try rolling your own using the MMX movq instruction, but I
think you have a problem.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

and

Wow… that’s seriously confusing. No disrespect intended, but do you really understand what you’re talking about here? Or, like, are you a new driver writer and some dude handed you a board and told you to write the driver for it? I ask because you’ve left out lots of what I would consider to be the vital information.

A couple of questions:

  1. Your device has 64-bit wide REGISTERS? They’re really 64 bits? In the hardware manual? And that’s the only way they’re addressed? That’s probably not a great design. In systems where there are 64-bit wide values, the registers are usually divided into two 32-bit registers. I’ve, personally, never seen a device with a 64-bit wide register. Are you SURE this is what you have?

  2. This is a bus master DMA device we’re talking about?

  3. Why would you only use 32-bit addressing? That’s certain to be a mistake, if your device is capable of 64-bit addressing.

  4. What do you mean by “I just need to do 64-bit data transfers”? 64-bit DMA transfers? I mean… that’s part of the bus protocol… how would you know how wide your transfers are?

Help us help you,

Peter
OSR

Ryan wrote: “However, since these devices are 64-bit, I need to read or write the full 64-bits at a time.”

Are you positive of this statement by having tested it? Or is that an assumption you are making because it seems logical?

I access 64-bit registers on 64-bit devices from a 32-bit OS all the time, but I write or read the low 32 bits, then the high 32 bits, as if they were two different registers. If you haven’t tried it that way, you should.
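A minimal C sketch of the split access described above, assuming the device maps a 64-bit register as two adjacent 32-bit halves with the low DWORD at the lower address (the little-endian x86 layout). The helper names `read_reg64_split` and `write_reg64_split` are hypothetical. In a driver the pointer would be the same mapped register address you already hand to READ_REGISTER_ULONG; the plain volatile accesses here stand in for READ_REGISTER_ULONG/WRITE_REGISTER_ULONG so the sketch builds outside the WDK.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: access one 64-bit device register as two 32-bit
 * halves, low DWORD first and then high DWORD (little-endian layout).
 * Each volatile access is a separate 32-bit bus cycle, so the 64-bit
 * value is NOT read or written atomically. */
static uint64_t read_reg64_split(volatile uint32_t *reg)
{
    assert(((uintptr_t)reg & 3u) == 0);   /* register must be DWORD aligned */
    uint32_t lo = reg[0];                 /* low 32 bits  */
    uint32_t hi = reg[1];                 /* high 32 bits */
    return ((uint64_t)hi << 32) | lo;
}

static void write_reg64_split(volatile uint32_t *reg, uint64_t value)
{
    assert(((uintptr_t)reg & 3u) == 0);
    reg[0] = (uint32_t)value;             /* low 32 bits first */
    reg[1] = (uint32_t)(value >> 32);     /* then high 32 bits */
}
```

Because the halves are separate bus cycles, whether the device commits the value on the low write, the high write, or needs both is a hardware design decision this sketch cannot settle. On a host without the device, a `volatile uint32_t fake[2]` can stand in for the register.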

On the other hand, I have had hardware guys design registers that way in the past, and I had to make them go back and fix it. To a hardware guy it just seems logical to require 64-bit access on a 64-bit bus if they’ve never had to deal with it before.

Clay

All, thanks for the replies. It’s clear that I didn’t give enough information, so I’ll try to fill in the details.

  • My system is a CompactPCI chassis with a single-board computer running Windows XP (32-bit version), plus three peripheral cards connected to the single-board computer via the CompactPCI backplane.
  • My peripheral cards, single-board computer, and backplane all support a 33 MHz PCI bus at 64 bits wide, just like any standard (albeit not very common) 64-bit PCI device.
  • The peripherals only have 64 MB of memory space, so I don’t need to do 64-bit addressing (interesting fact: even on a 64-bit wide bus, PCI uses dual address cycles for 64-bit addressing to maintain backwards compatibility, so it would still cost me an extra cycle to needlessly do 64-bit addressing).
  • The reason PCI devices with 64-bit busses have such wide busses is for bandwidth - it has nothing to do with addressing. So if I’m only doing 32-bit reads or writes, I’m only utilizing half the data bus. For bandwidth reasons, I need to utilize the full 64-bit wide data path. Obviously this will be via DMA to achieve full bandwidth. However, for testing purposes (we’re designing the peripheral devices using FPGAs), I want to be able to do full width reads and writes.
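The two numeric claims in these bullets can be checked with a little arithmetic. The sketch below (hypothetical helper names) computes the theoretical peak burst bandwidth as clock rate times bytes per data phase, and tests whether a bus address would need a dual address cycle, which is only the case when bits above bit 31 are set. It assumes one data phase per clock and ignores address phases, wait states, and turnaround cycles, so real throughput will be lower.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Theoretical peak burst bandwidth in bytes per second: one data phase
 * per clock, bus_width_bits / 8 bytes per data phase. Ignores address
 * phases, wait states and turnaround, so it is an upper bound. */
static uint64_t pci_peak_bytes_per_sec(uint64_t clock_hz, unsigned bus_width_bits)
{
    assert(clock_hz > 0 && (bus_width_bits == 32 || bus_width_bits == 64));
    return clock_hz * (bus_width_bits / 8u);
}

/* A dual address cycle (DAC) is only needed when the address has
 * nonzero bits above bit 31; otherwise a single address cycle is used,
 * even on a 64-bit-wide bus. */
static bool pci_needs_dac(uint64_t bus_address)
{
    return (bus_address >> 32) != 0;
}
```

With the nominal 33 MHz clock, `pci_peak_bytes_per_sec(33000000, 32)` is 132,000,000 bytes/s and the 64-bit figure is 264,000,000 bytes/s; the exact 33.33 MHz PCI clock gives the commonly quoted 133 MB/s and 266 MB/s. A 64 MB window (0x4000000 bytes) mapped below 4 GB never needs DAC.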

So really my question boils down to two separate questions, although I really only asked the second one to begin with:

  1. When using DMA to read / write large amounts, will Windows utilize the full 64-bit data bus, or will it only utilize the lower 32 bits and double the number of transfers (and therefore the time it takes to do the DMA operation)? Is there something I need to do to set up the transfer so that it utilizes all 64 bits?

  2. Is there a way to do individual reads / writes that utilize the full 64-bit path for testing purposes? Right now I’m doing two 32-bit accesses using READ_REGISTER_ULONG() or WRITE_REGISTER_ULONG(), one for the low word then one for the high word, but this doesn’t help our FPGA guy to know if his core will properly handle 64-bit accesses.

Please let me know if there are other things I need to clarify. I did all the hardware design, so that’s my roots - I’m just getting into Windows drivers. Thank you!

I should have also mentioned - a very simplistic but helpful definition at the low level for the type of transactions I’m trying to achieve is located here:

http://en.wikipedia.org/wiki/Conventional_PCI#64-bit_PCI

Hopefully that will give some context about what I’m attempting, since I’m not very good at explaining it. :-)

Thanks for that detail. Seriously. Here I’ve been writing Busmaster DMA device drivers for PCI devices for lo these many years, and I NEVER KNEW THAT. In fact, I didn’t believe you, so I went to the PCI spec to look it up and cite the section showing you were wrong. But, of course, you’re correct. So… thanks for that.

I guess that means I have to answer your questions, then, huh? Well, I’ll at least try.

On question 1 (DMA): Windows has no role here at all. It’s really between the initiator and the target. There’s nothing that intervenes and attempts to control this. So, as long as the target asserts ACK64# and the initiator sets the write byte enables, you’re pretty much golden. No Windows involved whatsoever.
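For the “double the number of transfers” part of question 1: in the 64-bit PCI extension the initiator asserts REQ64# to ask for 64-bit data phases and the target answers with ACK64#. If the target does not assert ACK64#, the burst proceeds 32 bits per data phase on AD[31:0]. The toy C model below only counts data phases for each outcome; the function names are hypothetical, an 8-byte-aligned start is assumed, and wait states, byte enables, and disconnects are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified width negotiation: only when the initiator asserts REQ64#
 * AND the target responds with ACK64# does each data phase carry
 * 8 bytes (AD[63:0]); otherwise each data phase carries 4 bytes. */
static unsigned pci_data_width_bytes(bool req64_asserted, bool ack64_asserted)
{
    return (req64_asserted && ack64_asserted) ? 8u : 4u;
}

/* Number of data phases for a burst, rounded up to whole phases.
 * Assumes an 8-byte-aligned start address. */
static size_t pci_data_phases(size_t burst_bytes, bool req64_asserted, bool ack64_asserted)
{
    size_t width = pci_data_width_bytes(req64_asserted, ack64_asserted);
    assert(burst_bytes > 0);
    return (burst_bytes + width - 1) / width;
}
```

For a 4096-byte burst this gives 512 data phases when both REQ64# and ACK64# are asserted and 1024 when the target does not acknowledge, which is the factor of two Ryan was asking about.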

On question 2 (single 64-bit register reads/writes): not on 32-bit Windows, no.

If that doesn’t do it for you, feel free to ask more. Again.

Peter
OSR

Peter, thanks for the information - I think you fully answered my questions. It’s good to know that when I set up a DMA transfer the hardware *should* realize my peripheral is 64-bit and use the full bus width.

Bummer that I can’t read/write single 64-bit registers, but I’m not incredibly surprised. I guess I’ll just do REALLY SHORT DMA transfers for testing purposes on those. If I were to move to a 64-bit version of Windows, do you think the READ_REGISTER_ULONG64() and WRITE_REGISTER_ULONG64() functions would do a single 64-bit transfer rather than two 32-bit transfers if my hardware supports it?

Thanks!
Ryan

>achieve full bandwidth. However, for testing purposes (we’re designing the peripheral devices using FPGAs), I want to be able to do full width reads and writes.

Think about which CPU opcode could generate a 64-bit-wide PCI cycle with the CPU as the master.

You would probably have to emit that 64-bit op into your 32-bit code via inline assembly.

Probably some MMX op.

And yes, DMA is normal for PCI, nearly a must: the data should go via DMA, and only the command/status via registers.

>1. When using DMA to read / write large amounts, will Windows utilize the full 64-bit data bus,

Yes, if the device’s hardware is designed properly.


Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com

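Well, the code is:

__forceinline
ULONG64
READ_REGISTER_ULONG64 (
    __in __drv_nonConstant volatile ULONG64 *Register
    )
{
    _ReadWriteBarrier();   // compiler barrier: no memory access is moved across this point
    return *Register;      // volatile read of the device register
}

__forceinline
VOID
WRITE_REGISTER_ULONG64 (
    __in __drv_nonConstant volatile ULONG64 *Register,
    __in ULONG64 Value
    )
{
    *Register = Value;     // volatile store to the device register
    FastFence();           // hardware fence after the store
    return;
}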

(gotta LOVE those barrier/fence instructions, eh?)

So I would GUESS you get one instruction… I can’t imagine you don’t… (but I don’t have a test box handy to absolutely verify this via disassembly…)

Alternatively, you could certainly do __movsq or __stosq.

Peter
OSR

http://msdn.microsoft.com/en-us/library/ff566395(VS.85).aspx
WRITE_REGISTER_ULONG64 Macro - Available only in 64-bit versions of Windows.

So you can’t use it when building for a 32-bit OS version.

On an amd64 build, a driver of mine that uses it generated code like this:

000d3 48 89 10 mov QWORD PTR [rax], rdx
000d6 f0 83 0c 24 00 lock or DWORD PTR [rsp], 0

That was nice of you, Glen. Thanks for taking the time to post the disassembly.

Peter
OSR

xxxxx@lmco.com wrote:

So really my question boils down to two separate questions, although I really only asked the second one to begin with:

  1. When using DMA to read / write large amounts, will Windows utilize the full 64-bit data bus, or will it only utilize the lower 32-bits and double the number of transfers (and therefore the time it takes to do the DMA operation)? Is there something I need to do to setup the transfer so that it utilizes all 64 bits?

Windows doesn’t have anything to do with DMA. That’s entirely under the
control of your device.

  2. Is there a way to do individual reads / writes that utilize the full 64-bit path for testing purposes? Right now I’m doing two 32-bit accesses using READ_REGISTER_ULONG() or WRITE_REGISTER_ULONG(), one for the low word then one for the high word, but this doesn’t help our FPGA guy to know if his core will properly handle 64-bit accesses.

The only way to do that is to do a read into a 64-bit register, and that
means you can’t use a general purpose register. As I mentioned, you can
try to use the “movq” MMX instruction, then move the MMX data into
memory so you can test the contents.
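A minimal sketch of that approach in C, with one substitution: it uses the SSE2 intrinsics `_mm_loadl_epi64` and `_mm_storel_epi64` (each normally compiles to a single 8-byte `movq`) instead of MMX, which avoids having to issue EMMS afterwards. Assumptions: an SSE2-capable CPU, an 8-byte-aligned register, and the hypothetical helper names `read_reg64_movq`/`write_reg64_movq`. In a 32-bit kernel driver, any use of MMX/XMM registers must be bracketed by KeSaveFloatingPointState/KeRestoreFloatingPointState. Even then, whether the host bridge forwards one 8-byte CPU access as a single 64-bit PCI data phase is chipset-dependent; only a bus analyzer or the FPGA’s own logic can confirm it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>  /* SSE2: _mm_loadl_epi64, _mm_storel_epi64 */
#define HAVE_SSE2_MOVQ 1
#else
#define HAVE_SSE2_MOVQ 0
#endif

/* Hypothetical helper: read a 64-bit register with one 8-byte load into
 * an XMM register, then store it to ordinary memory for inspection.
 * The casts drop 'volatile', so confirm the single movq in the
 * disassembly before trusting the result on real hardware. */
static uint64_t read_reg64_movq(const volatile void *reg)
{
    uint64_t out;
    assert(((uintptr_t)reg & 7u) == 0);                  /* 8-byte aligned */
#if HAVE_SSE2_MOVQ
    __m128i v = _mm_loadl_epi64((const __m128i *)reg);   /* movq xmm, m64 */
    _mm_storel_epi64((__m128i *)&out, v);                /* xmm -> memory */
#else
    memcpy(&out, (const void *)reg, sizeof out);  /* no SSE2: access width not guaranteed */
#endif
    return out;
}

/* Hypothetical helper: write a 64-bit register with one 8-byte store. */
static void write_reg64_movq(volatile void *reg, uint64_t value)
{
    assert(((uintptr_t)reg & 7u) == 0);
#if HAVE_SSE2_MOVQ
    __m128i v = _mm_loadl_epi64((const __m128i *)&value); /* value -> xmm */
    _mm_storel_epi64((__m128i *)reg, v);                  /* movq m64, xmm */
#else
    memcpy((void *)reg, &value, sizeof value);    /* no SSE2: access width not guaranteed */
#endif
}
```

If the FPGA side still sees two 32-bit data phases for one of these calls, that most likely points at the host bridge splitting the access rather than at the driver code.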


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

xxxxx@lmco.com wrote:

Peter, thanks for the information - I think you fully answered my questions. It’s good to know that when I set up a DMA transfer the hardware *should* realize my peripheral is 64-bit and use the full bus width.

I’m confused by this sentence. You refer to “the [DMA] hardware” as if
it were somehow different from “my peripheral”. It’s not. With PCI bus
mastering, it is your peripheral that is doing ALL of the work. If your
peripheral wants to do 64-bit DMA, then your peripheral needs to
generate 64-bit cycles. You are in charge. You are the “bus master”.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Sorry, but rather than posting so many questions, could you just get an x64 WinPE “live CD” from your IT folks and run a 64-bit OS right now? It’s very handy for quick experiments. Yes, you can install a PCI driver on it, and the kernel debugger can be enabled too.

(Maybe the “dq” and “eq” WinDbg commands on x64 issue 64-bit cycles; I can’t confirm this without a PCI sniffer.)

Regards,
– pa