PCIe register caching issue

We have a PCIe device that exposes BAR0 as non-prefetchable because it contains registers that clear on read. BAR0 is mapped like this:
mRegisters = (PUINT8)MmMapIoSpace(paBar, length, MmNonCached);

We use a function, readRegister32(offset), to read from the PCIe “registers”; it boils down to:

READ_REGISTER_ULONG((PULONG)&mRegisters[offset]);

I am seeing an issue where the contents read through the virtual address mRegisters[offset] are not the same as the contents at the physical address (in this case b4000) as read using !dd. The regs at the physical address + offset are correct and reflect the hardware, but the same read through the corresponding VA is wrong - caching, right?

However, when I examine the virtual address for mRegisters, in this case,

5: kd> !pte 0xffffd000`22aeb000
                                           VA ffffd00022aeb000
PXE at FFFFF6FB7DBEDD00    PPE at FFFFF6FB7DBA0000    PDE at FFFFF6FB740008A8    PTE at FFFFF6E800115758
contains 00000000005A6863  contains 00000000005A5863  contains 0000000117876863  contains 00000000B400097B
pfn 5a6       ---DA--KWEV  pfn 5a5       ---DA--KWEV  pfn 117876    ---DA--KWEV  pfn b4000     -G-DANTKWEV

FYI…
5: kd> !devext 0xffffe000ad2cc9d0
PDO Extension, Bus 0x7, Device 0, Function 0.
DevObj 0xffffe000ad2cc880 Parent FDO DevExt 0xffffe000ad12ccf0
Device State = PciStarted
Vendor ID ;-)), Device ID 0002
Subsystem Vendor ID ;-)), Subsystem ID 0002
Header Type 0, Class Base/Sub 05/80 (Memory Controller/‘Other’)
Programming Interface: 00, Revision: 00, IntPin: 00, RawLine 00
Possible Decodes ((cmd & 7) = 7): BMI
Capabilities: Ptr=40, power msi express
Express capabilities: (BIOS controlled)
Logical Device Power State: D0
Device Wake Level: D3
WaitWakeIrp:
Requirements: Alignment Length Minimum Maximum
BAR0 Mem: 00010000 00010000 0000000000000000 00000000ffffffff
Resources: Start Length
BAR0 Mem: 00000000b4000000 00010000
Interrupt Requirement:
Message Based: Type - Msi, 0x4 messages requested
Interrupt Resource: Type - MSI, 0x4 Messages Granted

I see that the resultant physical page, our BAR0, is non-cached.
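
For anyone who wants to check the non-cached conclusion by hand, here is a minimal sketch that decodes the PTE value from the !pte output. It assumes the standard x86-64 layout of a 4 KB page-table entry (bit 3 PWT, bit 4 PCD, bit 7 PAT, bits 12..51 PFN). PteInfo, decodePte, patIndex and checkBar0Pte are hypothetical names used only for illustration, not part of any Windows API. Which memory type a given PAT index selects depends on how the OS programmed the PAT MSR, so the sketch only reports the index.

```cpp
#include <cassert>
#include <cstdint>

// Fields of an x86-64 4 KB page-table entry that matter for this thread.
struct PteInfo {
    bool valid;          // bit 0 (V in the !pte flags)
    bool writable;       // bit 1 (W)
    bool writeThrough;   // bit 3, PWT (T)
    bool cacheDisabled;  // bit 4, PCD (N)
    bool pat;            // bit 7, PAT
    bool global;         // bit 8 (G)
    uint64_t pfn;        // bits 12..51, page frame number
};

inline PteInfo decodePte(uint64_t pte) {
    PteInfo info{};
    info.valid         = (pte & (1ull << 0)) != 0;
    info.writable      = (pte & (1ull << 1)) != 0;
    info.writeThrough  = (pte & (1ull << 3)) != 0;
    info.cacheDisabled = (pte & (1ull << 4)) != 0;
    info.pat           = (pte & (1ull << 7)) != 0;
    info.global        = (pte & (1ull << 8)) != 0;
    info.pfn           = (pte >> 12) & 0xFFFFFFFFFFull;  // keep 40 PFN bits
    return info;
}

// Index into the PAT: PAT is the high bit, PCD the middle bit, PWT the low bit.
inline unsigned patIndex(const PteInfo& p) {
    return (p.pat ? 4u : 0u) | (p.cacheDisabled ? 2u : 0u) | (p.writeThrough ? 1u : 0u);
}

// Usage: the PTE value reported for the BAR0 mapping in this post.
inline void checkBar0Pte() {
    const PteInfo p = decodePte(0x00000000B400097Bull);
    assert(p.valid && p.cacheDisabled && p.writeThrough && !p.pat);
    assert(p.pfn == 0xB4000);                   // matches "pfn b4000"
    assert((p.pfn << 12) == 0xB4000000ull);     // physical base of the page
    assert(patIndex(p) == 3);
}
```

For the value above this gives PFN b4000 (physical base b4000000) with PCD and PWT both set, which matches the N and T flags in the !pte line.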

Is there another level of caching that needs to be disabled, and if so, how?

Thanks for any help!

Regards,
-wd

If it is noncached memory the read will always reach the target. The
simplest explanation is that your readRegister32 function is not always
doing what you think it is doing.

Mark Roddy

On Fri, Apr 3, 2015 at 10:40 AM, wrote:

> I see that the resultant physical page, our BAR0, is non-cached.
>
> Is there another level of caching that needs to be disabled, and if so,
> how?
>
> —
> NTDEV is sponsored by OSR
>
> Visit the list at: http://www.osronline.com/showlists.cfm?list=ntdev
>
> OSR is HIRING!! See http://www.osr.com/careers
>
> For our schedule of WDF, WDM, debugging and other seminars visit:
> http://www.osr.com/seminars
>
> To unsubscribe, visit the List Server section of OSR Online at
> http://www.osronline.com/page.cfm?name=ListServer
>

UINT32 PcieAudioDevice::readRegister32(UINT32 offset, bool bUseSpinlock)
{
    UINT32 res = kPCIeInvalidRead;
    if (bUseSpinlock)
    {
        WdfSpinLockAcquire(mPcieSpinlock);
    }
    mPcieConsecutiveWrites = 0;
    if (mDeviceRemovalDetected)
    {
        // Device already flagged as removed: skip the bus access.
        TDL_ENTRY(kTDLFlag_Warning, "readRegister32 skipped (device removed)", 0, 0, 0, 0);
    }
    else
    {
        // Single 32-bit read through the MmMapIoSpace mapping; offset is in bytes.
        res = READ_REGISTER_ULONG((PULONG)&mRegisters[offset]);
        if (res == kPCIeInvalidRead)
        {
            // A read of kPCIeInvalidRead is treated as a sign that the device is gone.
            TDL_ENTRY(kTDLFlag_Error, "readRegister32 returned 0xffffffff (device removed)", 0, 0, 0, 0);
            mDeviceRemovalDetected = true;
        }
        TDL_ENTRY(kTDLFlag_Verbose, "readRegister32(0x%04X,%d) = 0x%08X", offset, bUseSpinlock, res, 0);
    }
    if (bUseSpinlock)
    {
        WdfSpinLockRelease(mPcieSpinlock);
    }
    return res;
}

I assume this is always locked?

Mark Roddy

On Fri, Apr 3, 2015 at 11:35 AM, wrote:

> UINT32 PcieAudioDevice::readRegister32(UINT32 offset,bool bUseSpinlock)

Hi Mark. Thanks for your responses.

> I assume this is always locked?

Yes, because I get a value, just a different one from the one read at the physical address.

Silly question, but what makes you think you should get the same value when reading twice if it clears on read? A READ_REGISTER_ULONG will clear the register on read, as will a !dd.

I’m not saying I should get the same value twice…

  1. Break into the debugger.
  2. Generate the hardware event that should set the bit.
  3. Read from the physical address and note that the bit is set as it should be. Read from the physical address again and note that the bit is no longer set.
  4. Generate the hardware event that should set the bit.
  5. Read from the virtual address and note that the bit is not set.
  6. Immediately read from the physical address and note that the bit is set.
  7. Read from the physical address again and note that the bit clears as it should.

Does that make sense?
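
To make the expected behaviour in those seven steps explicit, here is a toy user-mode model of a clear-on-read register. It is only a sketch: ClearOnReadRegister, raiseEvent, expectedSequence and the bit position are hypothetical and say nothing about how the real FPGA is implemented. The model assumes that every read which reaches the device returns the latched bits and clears them, no matter which mapping the read came through.

```cpp
#include <cassert>
#include <cstdint>

// Toy model: hardware events latch bits, and ANY read that reaches the
// register returns the latched bits and clears them.
class ClearOnReadRegister {
public:
    void raiseEvent(uint32_t mask) { latched_ |= mask; }  // hardware sets bit(s)

    uint32_t read() {                                     // destructive read
        const uint32_t value = latched_;
        latched_ = 0;
        return value;
    }

private:
    uint32_t latched_ = 0;
};

// What the step sequence would show if both access paths reached the device.
inline void expectedSequence() {
    constexpr uint32_t kEventBit = 1u << 0;  // hypothetical bit position
    ClearOnReadRegister reg;

    reg.raiseEvent(kEventBit);               // step 2: generate the event
    assert(reg.read() == kEventBit);         // step 3: first read sees the bit
    assert(reg.read() == 0);                 // step 3: second read, bit is gone

    reg.raiseEvent(kEventBit);               // step 4: generate the event again
    assert(reg.read() == kEventBit);         // step 5: a read that reaches the device sees it
    assert(reg.read() == 0);                 // step 6: and leaves nothing for the next read
}
```

In this model, a step-5 read that returns 0 while step 6 still finds the bit set can only happen if the step-5 read never touched that register (or the event arrived between the two reads). That is an inference from the model, not a diagnosis of the real hardware.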

On Apr 3, 2015, at 15:56, xxxxx@hotmail.com wrote:

Silly question, but what makes you think you should get the same value when reading twice if it clears on read? Doing a READ_REGISTER_ULONG will clear on read as will a !dd

>physical address (in this case b4000)

It should be b4000000, not b4000.

Thanks, Alex, you’re correct. b4000000 + offset (<4k) IS the physical address I’ve been using with the !dd command. I just typed it incorrectly in this post.

Okay, good… next stupid question. Is this an FPGA PCIe core? Are you sure that your offset is addressing the memory address that you expect? I’ve often seen people get hung up on BYTE vs WORD vs DWORD addressing.

Hi Shane. Yes, it’s a Xilinx FPGA PCIe core. The offset I’m using is a byte offset, and I’m using the same one for both the VA and PA 32-bit reads. Other reads to adjacent regs (the serial number, for instance) work fine. All regs are 32 bits wide and are only addressed that way.
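
To illustrate the BYTE vs DWORD pitfall Shane mentions, here is a small user-mode sketch comparing the two address computations. With a byte pointer as the base, as in (PULONG)&mRegisters[offset], the offset is a byte count; if the base were a 32-bit pointer, the same number would be scaled by 4. The helper names addrFromByteOffset and addrFromDwordIndex are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Address produced by the form used in this thread: the base is a byte
// pointer, so the offset is counted in bytes.
inline uintptr_t addrFromByteOffset(const uint8_t* base, size_t byteOffset) {
    assert(byteOffset % sizeof(uint32_t) == 0);  // 32-bit registers are 4-byte aligned
    return reinterpret_cast<uintptr_t>(base) + byteOffset;
}

// Address produced when the offset is treated as an index of 32-bit
// registers: it is scaled by sizeof(uint32_t).
inline uintptr_t addrFromDwordIndex(const uint8_t* base, size_t dwordIndex) {
    return reinterpret_cast<uintptr_t>(base) + dwordIndex * sizeof(uint32_t);
}
```

Passing 0x10 to the first helper lands 0x10 bytes past the base, while passing 0x10 to the second lands 0x40 bytes past it, so mixing the two conventions reads a different register.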

AFAIK the device currently does NOT support 64-bit (it is a legacy device), so prefetching should not be allowed. I will get a new FPGA build soon that has 64bitCapable with prefetch explicitly disabled and give that a try. In the meantime, can anyone tell from the !pci output below if prefetch should be disabled for BAR0?
5: kd> !pci 1c0 7 0 0

PCI Configuration Space (Segment:0000 Bus:07 Device:00 Function:00)
Common Header:
00: VendorID ;-))
02: DeviceID 0002
04: Command 0406 MemSpaceEn BusInitiate InterruptDis
06: Status 0010 CapList
08: RevisionID 00
09: ProgIF 00
0a: SubClass 80 Other Memory Controller
0b: BaseClass 05 Memory Controller
0c: CacheLineSize 0020 BurstDisabled
0d: LatencyTimer 00
0e: HeaderType 00
0f: BIST 00
10: BAR0 b4000000
14: BAR1 00000000
18: BAR2 00000000
1c: BAR3 00000000
20: BAR4 00000000
24: BAR5 00000000
28: CBCISPtr 00000000
2c: SubSysVenID ;-))
2e: SubSysID 0002
30: ROMBAR 00000000
34: CapPtr 40
3c: IntLine 00
3d: IntPin 00
3e: MinGnt 00
3f: MaxLat 00
Device Private:
40: 7e034801 00000008 00a55805 feeff00c
50: 00000000 000049b4 00010010 00008fe2
60: 00202810 0003f411 00110000 00000000
70: 00000000 00000000 00000000 00000000
80: 00000000 00000000 00000000 00000000
90: 00000000 00000000 00000000 00000000
a0: 00000000 00000000 00000000 00000000
b0: 00000000 00000000 00000000 00000000
c0: 00000000 00000000 00000000 00000000
d0: 00000000 00000000 00000000 00000000
e0: 00000000 00000000 00000000 00000000
f0: 00000000 00000000 00000000 00000000
Capabilities:
40: CapID 01 PwrMgmt Capability
41: NextPtr 48
42: PwrMgmtCap 7e03 D1Support D2Support PMED0 PMED1 PMED2 PMED3Hot Version=3
44: PwrMgmtCtrl 0008 DataScale:0 DataSel:0 D0

48: CapID 05 MSI Capability
49: NextPtr 58
4a: MsgCtrl 64BitCapable MSIEnable MultipleMsgEnable:2 (0x4) MultipleMsgCapable:2 (0x4)
4c: MsgAddr feeff00c
50: MsgAddrHi 0
54: MsData 49b4

58: CapID 10 PCI Express Capability
59: NextPtr 00
5a: Express Caps 0001 (ver. 1) Type:Endpoint
5c: Device Caps 00008fe2
60: Device Control 2810 MRR:512 NS ap pf et MP:128 RO ur fe nf ce
62: Device Status 0020 TP ap ur fe nf ce
64: Link Caps 0003f411
68: Link Control 0000 es cc rl ld RCB:64 ASPM:None
6a: Link Status 0011 scc lt lte NLW:x1 LS:2.5

Enhanced Capabilities:
100: CapID 0003 Serial Number Capability
Version 1
NextPtr 000

xxxxx@gmail.com wrote:

AFAIK the device currently does NOT support 64-bit (it is a legacy device), so prefetching should not be allowed. I will get a new FPGA build soon that has 64bitCapable with prefetch explicitly disabled and give that a try. In the meantime, can anyone tell from the !pci output below if prefetch should be disabled for BAR0?

Bit 3 of the BAR says whether it is prefetchable or not. That bit is
clear here, indicating not prefetchable. However, no one does anything
with that bit automatically – it is up to you as the driver to Do The
Right Thing when you map it.
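
As a sketch of how the low bits of a BAR can be read, here is a small decoder applied to the BAR0 value from the !pci output (b4000000). It follows the standard PCI layout for the flag bits (bit 0 selects I/O vs memory space, bits 2:1 give the memory type, bit 3 is the prefetchable bit). BarInfo, decodeBar32 and checkBar0 are hypothetical names for illustration only.

```cpp
#include <cassert>
#include <cstdint>

struct BarInfo {
    bool isIoSpace;     // bit 0: 1 = I/O space, 0 = memory space
    bool is64Bit;       // bits 2:1 == 10b: 64-bit memory BAR (uses the next BAR too)
    bool prefetchable;  // bit 3, meaningful for memory BARs only
    uint32_t base;      // low 32 bits of the base address, flag bits masked off
};

inline BarInfo decodeBar32(uint32_t bar) {
    BarInfo info{};
    info.isIoSpace = (bar & 0x1u) != 0;
    if (info.isIoSpace) {
        info.base = bar & ~0x3u;  // I/O BAR: only bits 1:0 are flag bits
        return info;
    }
    info.is64Bit      = ((bar >> 1) & 0x3u) == 0x2u;
    info.prefetchable = (bar & 0x8u) != 0;
    info.base         = bar & ~0xFu;  // memory BAR: bits 3:0 are flag bits
    return info;
}

// Usage: BAR0 as shown at offset 10 of the configuration space dump.
inline void checkBar0() {
    const BarInfo bar0 = decodeBar32(0xB4000000u);
    assert(!bar0.isIoSpace && !bar0.is64Bit && !bar0.prefetchable);
    assert(bar0.base == 0xB4000000u);
}
```

For b4000000 all four low bits are zero, so this is a 32-bit, non-prefetchable memory BAR, consistent with the statement above.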

By the way, are you aware that you need to tell the !dd command about
the caching as well? It does its own mapping, and bad things can happen
if a page is mapped in different ways. If you want !dd to do an
uncached read, you need to say
!dd [uc] b4000000


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Hi Tim. Thanks for the info about the BARx bit3. I had read about it some
time ago and promptly forgot that the BAR contains info other than just the
base address of the corresponding space.

> Bit 3 of the BAR says whether it is prefetchable or not. That bit is
> clear here, indicating not prefetchable. However, no one does anything
> with that bit automatically – it is up to you as the driver to Do The
> Right Thing when you map it.

Isn’t:
mRegisters = (PUINT8)MmMapIoSpace(paBar, length, MmNonCached);
the right thing?

> By the way, are you aware that you need to tell the !dd command about
> the caching as well? It does its own mapping, and bad things can happen
> if a page is mapped in different ways. If you want !dd to do an
> uncached read, you need to say
> !dd [uc] b4000000

Yes, I do know about the [uc] flag; the read results are correct both
with and without it, but thank you for pointing it out.

-wade

On Fri, Apr 3, 2015 at 7:34 PM, Tim Roberts wrote:

> Bit 3 of the BAR says whether it is prefetchable or not. That bit is
> clear here, indicating not prefetchable. However, no one does anything
> with that bit automatically – it is up to you as the driver to Do The
> Right Thing when you map it.
>
> By the way, are you aware that you need to tell the !dd command about
> the caching as well? It does its own mapping, and bad things can happen
> if a page is mapped in different ways. If you want !dd to do an
> uncached read, you need to say
> !dd [uc] b4000000

Wade Dawson
DT Multimedia

wade dawson wrote:

Isn’t:
mRegisters = (PUINT8)MmMapIoSpace(paBar, length, MmNonCached);
the right thing?

Yes. Are you using the READ_REGISTER_xxxx calls for every access? If
you ever go direct via the pointer, you’ll need to make sure the pointer
is declared “volatile”.
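
A minimal sketch of what a direct, volatile-qualified access could look like, written as portable user-mode C++ so it can be tried outside a driver. readReg32 is a hypothetical helper, not a WDK routine; in the driver itself READ_REGISTER_ULONG remains the accessor used in this thread. Note that volatile only stops the compiler from caching, merging or dropping the load; it does not change the caching attributes of the mapping.

```cpp
#include <cassert>
#include <cstdint>

// One 32-bit load per call: the volatile qualifier prevents the compiler
// from reusing an earlier value or removing the access.
inline uint32_t readReg32(const volatile uint8_t* base, uint32_t byteOffset) {
    assert(byteOffset % 4 == 0);  // registers are 32 bits wide and 4-byte aligned
    return *reinterpret_cast<const volatile uint32_t*>(base + byteOffset);
}

// Usage with ordinary memory standing in for the register block.
inline void demoReadReg32() {
    uint32_t fakeRegs[2] = {0x11111111u, 0x22222222u};
    const volatile uint8_t* base = reinterpret_cast<const volatile uint8_t*>(fakeRegs);
    assert(readReg32(base, 0) == 0x11111111u);
    assert(readReg32(base, 4) == 0x22222222u);
}
```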


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.