Windows WHEA error when DMAing data to PCIe device

I am getting Bug Checks when my user application sends data to my driver. Digging through the dump that is generated points to an error being reported in the root complex, which shows a malformed TLP if I’m reading this right.

===============================================================================
Section 1 : PCI Express

Descriptor @ fffffa80362710f0
Section @ fffffa8036271208
Offset : 480
Length : 208
Flags : 0x00000001 Primary
Severity : Fatal

Port Type : Root Port
Version : 1.16
Command/Status: 0x0546/0x4010
Device Id :
VenId:DevId : 8086:2f04
Class code : 060400
Function No : 0x00
Device No : 0x02
Segment : 0x0000
Primary Bus : 0x80
Second. Bus : 0x83
Slot : 0x0007
Sec. Status : 0x6000
Bridge Ctl. : 0x0003
Express Capability Information @ fffffa803627123c
Device Caps : 00008001 Role-Based Error Reporting: 1
Device Ctl : 0024 ur FE nf ce
Dev Status : 0004 ur FE nf ce
Root Ctl : 000e FS NFS cs

AER Information @ fffffa8036271278
Uncorrectable Error Status : 00040000 ur ecrc MTLP rof uc ca cto fcp ptlp sd dlp und
Uncorrectable Error Mask : 00318000 UR ecrc mtlp rof UC CA cto fcp ptlp sd dlp und
Uncorrectable Error Severity : 00067030 ur ecrc MTLP ROF uc ca CTO FCP PTLP SD DLP und
Correctable Error Status : 00000000 adv rtto rnro dllp tlp re
Correctable Error Mask : 000031c1 ADV RTTO RNRO DLLP TLP RE
Caps & Control : 00000012 ecrcchken ecrcchkcap ecrcgenen ecrcgencap FEP
Header Log : 00000020 830010ff 799fdfc0 00000000
Root Error Command : 00000000 fen nfen cen
Root Error Status : 0000007c MSG# 00 FER NFER FUF MUR UR mcr cer
Correctable Error Source ID : 00,00,00
Correctable Error Source ID : 80,02,00
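
For anyone reading along, here is a rough sketch of how the Header Log line above could be decoded by hand. It assumes the four DWORDs are printed in the PCIe header-log layout (header byte 0 in bits 31:24 of the first DWORD) and that the logged TLP is a 3DW memory request. The struct and function names are made up for illustration and are not part of any Windows or WinDbg API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Fields of a 3DW memory-request TLP header as read from the AER Header Log.
 * Assumption: log[0] holds header bytes 0-3 with byte 0 in bits 31:24. */
typedef struct {
    unsigned fmt;          /* DW0 bits 31:29 */
    unsigned type;         /* DW0 bits 28:24 */
    unsigned length_dw;    /* DW0 bits 9:0, where 0 encodes 1024 DWORDs */
    unsigned requester_id; /* DW1 bits 31:16: bus[15:8] dev[7:3] fn[2:0] */
    unsigned tag;          /* DW1 bits 15:8 */
    unsigned last_be;      /* DW1 bits 7:4 */
    unsigned first_be;     /* DW1 bits 3:0 */
    uint32_t address;      /* DW2 with the two low (non-address) bits cleared */
} TlpHeader3dw;

/* Hypothetical helper: split the logged DWORDs into header fields. */
TlpHeader3dw decode_header_log(const uint32_t log[4])
{
    TlpHeader3dw h;
    assert(log != NULL);
    h.fmt          = (log[0] >> 29) & 0x7u;
    h.type         = (log[0] >> 24) & 0x1Fu;
    h.length_dw    = log[0] & 0x3FFu;
    if (h.length_dw == 0) {
        h.length_dw = 1024;
    }
    h.requester_id = (log[1] >> 16) & 0xFFFFu;
    h.tag          = (log[1] >> 8) & 0xFFu;
    h.last_be      = (log[1] >> 4) & 0xFu;
    h.first_be     = log[1] & 0xFu;
    h.address      = log[2] & ~0x3u;
    return h;
}

/* True if the request's first and last byte fall in different 4 KB pages. */
bool crosses_4k_boundary(const TlpHeader3dw *h)
{
    uint64_t first = h->address;
    uint64_t last  = first + (uint64_t)h->length_dw * 4u - 1u;
    return (first >> 12) != (last >> 12);
}
```

Fed the values from the dump (00000020 830010ff 799fdfc0 00000000), this reading gives a 3DW memory read from requester 83:00.0, 32 DWORDs (128 bytes) long, starting at 0x799fdfc0. A 128-byte request that starts at page offset 0xfc0 would run past the end of its 4 KB page, and a memory request that crosses a 4 KB boundary is one of the conditions a receiver is allowed to flag as a Malformed TLP. Treat that as a lead to confirm on an analyzer, not as a diagnosis.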

I am using Direct I/O for my IOCTL.

So, two questions.

  1. Am I correctly setting up my DMA transaction?

VOID MailboxSend(_In_ const PCIP_DATA cipData,
                 _In_ WDFREQUEST request)
{
    NTSTATUS status;
    WDFDMATRANSACTION dmaTransaction;
    WDF_REQUEST_PARAMETERS params;
    PMDL mdl;
    PVOID virtualAddress;
    ULONG length;

    WDF_REQUEST_PARAMETERS_INIT(&params);
    WdfRequestGetParameters(request, &params);

    status = WdfDmaTransactionCreate(cipData->wdfMailboxDmaEnabler, WDF_NO_OBJECT_ATTRIBUTES, &dmaTransaction);
    if(!NT_SUCCESS(status))
    {
        TraceEvents(TRACE_LEVEL_ERROR, DBG_WRITE, "%s ***ERROR*** WdfDmaTransactionCreate failed %x\n", LOG_TAG, status);
        WdfRequestComplete(request, status);
        return;
    }

    status = WdfRequestRetrieveInputWdmMdl(request, &mdl);
    if(!NT_SUCCESS(status))
    {
        TraceEvents(TRACE_LEVEL_ERROR, DBG_WRITE, "%s ***ERROR*** WdfRequestRetrieveInputWdmMdl failed %x\n", LOG_TAG, status);
        WdfRequestComplete(request, status);
        return;
    }

    virtualAddress = MmGetMdlVirtualAddress(mdl);
    length = MmGetMdlByteCount(mdl);

    if(length == 0)
    {
        status = STATUS_BUFFER_TOO_SMALL;
        TraceEvents(TRACE_LEVEL_ERROR, DBG_WRITE, "%s ***ERROR*** Mdl length == 0 bytes failed %x\n", LOG_TAG, status);
        WdfRequestComplete(request, status);
        return;
    }

    status = WdfDmaTransactionInitialize(dmaTransaction,
                                         CIPEvtProgramMailboxDmaFunction,
                                         WdfDmaDirectionWriteToDevice,
                                         mdl,
                                         virtualAddress,
                                         length);
    if(!NT_SUCCESS(status))
    {
        TraceEvents(TRACE_LEVEL_ERROR, DBG_WRITE, "%s ***ERROR*** WdfDmaTransactionInitialize failed %x\n", LOG_TAG, status);
        WdfObjectDelete(dmaTransaction);
        WdfRequestComplete(request, status);
        return;
    }

    status = WdfDmaTransactionExecute(dmaTransaction, dmaTransaction);
    if(!NT_SUCCESS(status))
    {
        TraceEvents(TRACE_LEVEL_ERROR, DBG_WRITE, "%s ***ERROR*** WdfDmaTransactionExecute failed %x\n", LOG_TAG, status);
        WdfObjectDelete(dmaTransaction);
        WdfRequestComplete(request, status);
        return;
    }

    WdfRequestComplete(request, status);
}

  2. I am supposed to lock down memory before passing it to my driver, right?

In my user application, I lock down a full page.

SYSTEM_INFO info;
GetSystemInfo(&info);
auto pageSize = info.dwPageSize;

auto lockedMemoryAddr = VirtualAlloc(NULL, pageSize, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
if(lockedMemoryAddr == nullptr)
{
    std::cerr << "VirtualAlloc failed. Error: " << GetLastError() << std::endl;
    return -1;
}

In turn, I pass that address to DeviceIoControl and send only 8 bytes of the full 4K that I allocated.

Is there anything I’m doing wrong? I know there are several different reasons for why an MTLP error would be generated but I’m first trying to pick at the more obvious reasons for why I’m crashing the system: me.

xxxxx@gmail.com wrote:

> I am getting Bug Checks when my user application sends data to my driver. Digging through the dump that is generated points to an error being reported in the root complex, which shows a malformed TLP if I’m reading this right.

A malformed TLP is a hardware error in your PCIExpress IP. There’s
nothing you can do from a driver to trigger this.

> 2. I am supposed to lock down memory before passing it to my driver, right?

Absolutely not. The I/O system will do this for you.

You said you are using Direct I/O, but you are fetching the input
buffer. The input buffer in Direct I/O requests is always buffered, so
the memory you get is a kernel-mode (locked) copy of the user’s buffer.
So, you’re not even using the memory you locked down.
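
A small sketch of that rule, for reference. It assumes the standard IOCTL code layout (transfer method in the two low bits) and uses locally defined stand-ins for the SDK's METHOD_* values so it compiles outside the WDK. All of the names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for METHOD_BUFFERED, METHOD_IN_DIRECT, METHOD_OUT_DIRECT
 * and METHOD_NEITHER (assumed values 0..3, as in the SDK headers). */
enum { XFER_BUFFERED = 0, XFER_IN_DIRECT = 1, XFER_OUT_DIRECT = 2, XFER_NEITHER = 3 };

/* How the I/O manager presents a DeviceIoControl buffer to the driver. */
typedef enum {
    BUF_SYSTEM_COPY,      /* copied through a kernel system buffer */
    BUF_LOCKED_MDL,       /* caller's pages probed, locked, described by an MDL */
    BUF_RAW_USER_POINTER  /* passed through untouched */
} BufferHandling;

/* Same bit layout as the CTL_CODE macro, with arguments in the same order. */
uint32_t make_ioctl(uint32_t device_type, uint32_t function,
                    uint32_t method, uint32_t access)
{
    assert(method <= 3u);
    return (device_type << 16) | (access << 14) | (function << 2) | method;
}

/* The transfer method lives in bits 1:0 of the control code. */
unsigned ioctl_method(uint32_t code)
{
    return code & 0x3u;
}

/* Handling of the buffer the application passes as lpInBuffer. */
BufferHandling in_buffer_handling(uint32_t code)
{
    return ioctl_method(code) == XFER_NEITHER ? BUF_RAW_USER_POINTER
                                              : BUF_SYSTEM_COPY;
}

/* Handling of the buffer the application passes as lpOutBuffer. */
BufferHandling out_buffer_handling(uint32_t code)
{
    switch (ioctl_method(code)) {
    case XFER_BUFFERED: return BUF_SYSTEM_COPY;
    case XFER_NEITHER:  return BUF_RAW_USER_POINTER;
    default:            return BUF_LOCKED_MDL; /* IN_DIRECT or OUT_DIRECT */
    }
}
```

So for a direct-I/O IOCTL, whatever is passed as lpInBuffer arrives as a system-buffer copy, and only the buffer passed as lpOutBuffer is locked and described by an MDL.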

> In turn, I pass that address to DeviceIoControl and send only 8 bytes of the full 4K that I allocated.
>
> Is there anything I’m doing wrong? I know there are several different reasons for why an MTLP error would be generated but I’m first trying to pick at the more obvious reasons for why I’m crashing the system: me.

Are you quite sure that your PCIExpress IP will handle transfers as
small as 8 bytes? Some IP blocks have minimum lengths and severe
alignment requirements.
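
One way a driver might guard against that is to check each transfer against the device's limits before programming the DMA engine. This is only a sketch: the DmaLimits struct, its fields, and any numbers used with it are hypothetical and would have to come from the IP block's documentation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of what a DMA engine will accept. */
typedef struct {
    size_t min_length;         /* shortest transfer the engine handles, in bytes */
    size_t address_alignment;  /* required start-address alignment (power of two) */
    size_t length_granularity; /* length must be a multiple of this (power of two) */
} DmaLimits;

static bool is_power_of_two(size_t v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/* Returns true if a transfer of `length` bytes at `bus_address` satisfies
 * every limit in `lim`. */
bool transfer_fits_limits(const DmaLimits *lim, uint64_t bus_address, size_t length)
{
    assert(lim != NULL);
    assert(is_power_of_two(lim->address_alignment));
    assert(is_power_of_two(lim->length_granularity));

    if (length < lim->min_length) {
        return false; /* too short for the engine */
    }
    if ((bus_address & (lim->address_alignment - 1)) != 0) {
        return false; /* start address not aligned */
    }
    if ((length & (lim->length_granularity - 1)) != 0) {
        return false; /* length not a whole number of units */
    }
    return true;
}
```

With made-up limits of a 64-byte minimum, 64-byte alignment, and 4-byte granularity, an 8-byte transfer would be rejected for length alone, whatever its address.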


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Thanks for clearing that up for me.

> Are you quite sure that your PCIExpress IP will handle transfers as
> small as 8 bytes?

Yeah, I see the messages I’m sending on the application that is running on
the PCIe board and this board has been used in this way in Linux and
Solaris environments.

So pretty much my only option left is to hook up an analyzer and see what
is going on, right?

On Thu, Aug 18, 2016 at 5:48 PM, Tim Roberts wrote:

> xxxxx@gmail.com wrote:
> > I am getting Bug Checks when my user application sends data to my
> > driver. Digging through the dump that is generated points to an error
> > being reported in the root complex, which shows a malformed TLP if I’m
> > reading this right.
>
> A malformed TLP is a hardware error in your PCIExpress IP. There’s
> nothing you can do from a driver to trigger this.
>
>
> > 2) I am supposed to lock down memory before passing it to my driver,
> > right?
>
> Absolutely not. The I/O system will do this for you.
>
> You said you are using Direct I/O, but you are fetching the input
> buffer. The input buffer in Direct I/O requests is always buffered, so
> the memory you get is a kernel-mode (locked) copy of the user’s buffer.
> So, you’re not even using the memory you locked down.
>
>
> > In turn, I pass that address to DeviceIoControl and send only 8 bytes of
> > the full 4K that I allocated.
> >
> > Is there anything I’m doing wrong? I know there are several different
> > reasons for why an MTLP error would be generated but I’m first trying to
> > pick at the more obvious reasons for why I’m crashing the system: me.
>
> Are you quite sure that your PCIExpress IP will handle transfers as
> small as 8 bytes? Some IP blocks have minimum lengths and severe
> alignment requirements.
>
> –
> Tim Roberts, xxxxx@probo.com
> Providenza & Boekelheide, Inc.

Daniel Kulas wrote:

> Thanks for clearing that up for me.
>
> > Are you quite sure that your PCIExpress IP will handle transfers as
> > small as 8 bytes?
>
> Yeah, I see the messages I’m sending on the application that is
> running on the PCIe board and this board has been used in this way in
> Linux and Solaris environments.
>
> So pretty much my only option left is to hook up an analyzer and see
> what is going on, right?

If you are lucky enough to have a PCIe analyzer, that’s certainly the
quickest method.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.