Problem with a WDDM miniport driver on Windows 7

There is probably nothing wrong with a non-zero BaseAddress, i.e. assigning the physical address (PA) as BaseAddress, especially since there is only one segment. We have seen working WDDM drivers doing exactly this…

PS: Why ask if “really allocated from system memory”? OP mentioned “video memory pointed to by PCI BAR”.

Marcel Ruedinger
datronicsoft

pDxgkSegDesc[i].Flags.PopulatedFromSystemMemory = 1;

Mark Roddy

On Mon, Jul 13, 2015 at 10:53 AM, wrote:

> Probably nothing wrong with non-zero BaseAddress assigning PA as
> BaseAddress (especially since it is one segment only). We have seen working
> WDDM drivers doing exactly this…
>
> PS: Why ask if “really allocated from system memory”? OP mentioned “video
> memory pointed to by PCI BAR”.
>
> Marcel Ruedinger
> datronicsoft

pDxgkSegDesc[i].Flags.PopulatedFromSystemMemory was already discussed, changed to FALSE and confirmed earlier in this thread (posts three and four).

Marcel Ruedinger
datronicsoft

Thank you very much for your reply.

> Sure, that’s why I said “hundreds of possible reasons”.
> Now it is (n * 100) - 1 possible reasons.

Oh, my god~LOL~

> This assumes that m_RamSize is large enough to hold at least two surfaces at your maximum supported resolution (x pixels * y pixels * 4 bytes per pixel). dxgkrnl wants to create at least one shadow surface and one frame buffer allocation for each active Video Present source (VidPn source == view supported by the driver).
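As a rough illustration of the sizing rule quoted above, here is a minimal sketch in plain user-mode C. The helper names (`surface_bytes`, `min_vram_bytes`, `ram_size_is_sufficient`) are hypothetical and not part of the WDK, and the sketch assumes tightly packed 32-bpp rows (no pitch alignment), which real hardware may not guarantee.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a WDK API): bytes needed for one 32-bpp surface,
 * assuming rows are tightly packed with no pitch padding. */
static uint64_t surface_bytes(uint32_t width_px, uint32_t height_px)
{
    const uint64_t bytes_per_pixel = 4; /* 32 bits per pixel */
    return (uint64_t)width_px * height_px * bytes_per_pixel;
}

/* Minimum video memory for `sources` active VidPn sources: one shadow
 * surface plus one frame buffer (primary) allocation per source. */
static uint64_t min_vram_bytes(uint32_t width_px, uint32_t height_px,
                               uint32_t sources)
{
    assert(sources > 0); /* at least one source must be active */
    return 2u * surface_bytes(width_px, height_px) * sources;
}

/* Returns 1 if ram_size can hold the required surfaces, 0 otherwise. */
static int ram_size_is_sufficient(uint64_t ram_size, uint32_t width_px,
                                  uint32_t height_px, uint32_t sources)
{
    return ram_size >= min_vram_bytes(width_px, height_px, sources);
}
```

For example, an 8 MiB region (0x800000 bytes, the segment size that appears later in this thread) holds two 1024x768 surfaces under these assumptions, but not two 1920x1080 surfaces.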

>   1. First, you need to get your “Video Present Network” code up and working correctly.
>     DxgkDdiIsSupportedVidPn
>     DxgkDdiEnumVidPnCofuncModality
>     How many “VideoPresent Sources” are supported?
>     How many “VideoPresent Targets” are supported and what are their IDs?
>     Are source modes and target modes added correctly?

*pNumberOfVideoPresentSources = 1;
And I took the “Video Present Network” code from the Win8 KMDOD sample code.

>   2. Only after the “Video Present Network” works correctly does the allocation creation named above need to be implemented for the shadow surface and the primary surface.
>     DxgkDdiGetStandardAllocationDriverData
>     DxgkDdiCreateAllocation
>     DxgkDdiOpenAllocation
>     DxgkDdiDescribeAllocation

If I set the following in DxgkDdiQueryAdapterInfo:

pDxgkDrvCap->SchedulingCaps.MultiEngineAware = 1;
pDxgkDrvCap->GpuEngineTopology.NbAsymetricProcessingNodes = 2;

the call sequence proceeds to “DxgkDdiQueryAdapterInfo/DxgkDdiCommitVidPn” and so on.
Otherwise, the call sequence hangs at “DxgkDdiIsSupportedVidPn/DxgkDdiEnumVidPnCofuncModality”.

What do “MultiEngineAware” and “NbAsymetricProcessingNodes” mean, and what is the difference between them?

>   3. Finally, a DxgkDdiCommitVidPn should be seen. It specifies the selected Video Present Network and the primary allocation.
>
> PS1: (PCI Bar != SystemMemory) => PopulatedFromSystemMemory=FALSE
> PS2: Better to turn Windows 7 AERO off (otherwise a WDDM User Mode Driver is needed). If a result is expected within a finite amount of time, this needs thorough research of your own into the topics named above and lots of your own experiments exploring WDDM behavior. People here even tend to suggest that only 3 companies on earth have the capability to write WDDM drivers (my opinion differs only slightly: I know of a very low two-digit number of companies).

OK, I have done it.

Marcel Ruedinger
datronicsoft

Thanks,
lou

Nice approach, taking VidPn code from KMDOD. Could possibly work with only minor changes.

pDxgkDrvCap->SchedulingCaps.MultiEngineAware = 0;
pDxgkDrvCap->GpuEngineTopology.NbAsymetricProcessingNodes = 0;
At this stage, you don’t want to know about these two above, nor deal with them.

Still, I am not seeing any code dealing with allocations (see paragraph 2. above).
Without this, dxgkrnl will not even consider calling DxgkDdiCommitVidPn().

Marcel Ruedinger
datronicsoft

Hi Marcel and Mark, thanks for your help.

Now the allocations for both the shadow surface and the primary surface have been created.
But I ran into another problem: the screen stays black all the time…

The driver’s call sequence is as follows:

DxgkDdiCreateDevice
DxgkDdiCreateContext
DxgkDdiGetStandardAllocationDriverData
DxgkDdiGetStandardAllocationDriverData
case: D3DKMDT_STANDARDALLOCATION_SHADOWSURFACE
DxgkDdiCreateAllocation
DxgkDdiOpenAllocation

DxgkDdiGetStandardAllocationDriverData
DxgkDdiGetStandardAllocationDriverData
case: D3DKMDT_STANDARDALLOCATION_SHAREDPRIMARYSURFACE
DxgkDdiCreateAllocation
DxgkDdiOpenAllocation
DxgkDdiBuildPagingBuffer
case: DXGK_OPERATION_FILL
DxgkDdiPatch
case: pPatch->Flags.Paging == 1
DxgkDdiSubmitCommand
case: pSubmitCommand->DmaBufferPrivateDataSubmissionEndOffset == pSubmitCommand->DmaBufferPrivateDataSubmissionStartOffset
DxgkDdiDescribeAllocation
DxgkDdiCommitVidPn

DxgkDdiSetVidPnSourceAddress
DxgkDdiSetVidPnSourceVisibility
case: pSetVidPnSourceVisibility->Visible is true
DxgkDdiDescribeAllocation
DxgkDdiPresent
case: BLT
DxgkDdiSetPointerPosition
DxgkDdiSetVidPnSourceVisibility
case: pSetVidPnSourceVisibility->Visible is false
DxgkDdiBuildPagingBuffer
case: DXGK_OPERATION_DISCARD_CONTENT
DxgkDdiCloseAllocation
DxgkDdiDestroyAllocation
DxgkDdiDestroyContext
DxgkDdiDestroyDevice

What might be the problem?
I read the VirtualBox code and found that DxgkDdiBuildPagingBuffer should be called with “DXGK_OPERATION_TRANSFER” instead of “DXGK_OPERATION_DISCARD_CONTENT”. How can I make that happen?

Thanks.

This already looks very good!

Obviously dxgkrnl is happy in the beginning.
It is even calling DxgkDdiPresent…
Then it suddenly changes its mind.
It becomes very angry and tears everything down…

After DxgkDdiPresent, there should be
DxgkDdiPatch or
DxgkDdiSubmitCommand

My guess: Interrupt handling not implemented?

Does your device have an interrupt?
If yes, where are ISR and DPC implementations?
If no, interrupts and DPCs need to be “faked”.
Fence IDs for completed DMA transfers (including DxgkDdiBuildPagingBuffer)
need to be reported in any case.

Assuming that interrupt handling is not properly implemented,
it might still try to call DxgkDdiQueryCurrentFence.
Is this DDI function at least implemented correctly?
Then it might even recover…

PS: Don’t worry about DXGK_OPERATION_DISCARD_CONTENT
At the time of this call, everything is already screwed anyway.

Marcel Ruedinger
datronicsoft

Hi Marcel, thanks for your suggestion.

The device has an interrupt.
And the following is the implementation of DxgkDdiDpcRoutine :
DxgkDdiDpcRoutine {
m_DxgkInterface.DxgkCbNotifyDpc((HANDLE)m_DxgkInterface.DeviceHandle);
}

After DxgkDdiSubmitCommand, only DxgkDdiDpcRoutine was called; DxgkDdiInterruptRoutine and DxgkDdiQueryCurrentFence were never called.

What might be the problem?

PS:
What do “shadow surface” and “primary surface” mean?
What is in the DMA buffer: commands or the framebuffer data (bitmap)?
Where can I get the source framebuffer data (bitmap)?

I am new to this… sorry.

Thanks for your help.

Zhengwei Lou

Study WDK WDDM topic “Video Memory Management and GPU Scheduling”!

The correct sequence is roughly described below:
->DxgkDdiBuildPagingBuffer or DxgkDdiPresent
->DxgkDdiPatch (optional - only if referenced allocation not paged into video memory before)
->DxgkDdiSubmitCommand (receive fence ID)
->DxgkDdiInterruptRoutine (hardware indicates that DMA command processing is finished)
<-DxgkCbNotifyInterrupt (return fence ID of finished DMA command)
<-DxgkCbQueueDpc
->DxgkDdiDpcRoutine
<-DxgkCbNotifyDpc
…then start again at the top->

The first and most obvious problem: no interrupt after DxgkDdiSubmitCommand.
This is required by WDDM GPU scheduling model.
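
The fence bookkeeping in the sequence above can be pictured with a small user-mode model. This is only a sketch: `gpu_model` and the `model_*` functions are hypothetical names, not WDK APIs, and the model assumes fence IDs increase with every submission. It shows why the scheduler never sees the work as finished if the interrupt step is missing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-mode model of the fence bookkeeping (not WDK code). */
typedef struct {
    uint32_t last_submitted_fence; /* fence ID received in the submit step   */
    uint32_t last_completed_fence; /* fence ID reported by the interrupt step */
    int      dpc_queued;           /* set by the interrupt step               */
    unsigned dpc_runs;             /* number of DPC steps that did work       */
} gpu_model;

/* Models the submit step: remember the fence ID handed to the driver. */
static void model_submit(gpu_model *g, uint32_t fence_id)
{
    assert(fence_id > g->last_submitted_fence); /* model assumption: increasing IDs */
    g->last_submitted_fence = fence_id;
}

/* Models the interrupt step: report the finished fence, then queue a DPC. */
static void model_interrupt(gpu_model *g)
{
    g->last_completed_fence = g->last_submitted_fence;
    g->dpc_queued = 1;
}

/* Models the DPC step: runs only if the interrupt step queued it. */
static void model_dpc(gpu_model *g)
{
    if (g->dpc_queued) {
        g->dpc_queued = 0;
        g->dpc_runs++;
    }
}

/* Returns 1 when every submitted fence has been reported as completed. */
static int model_is_idle(const gpu_model *g)
{
    return g->last_completed_fence == g->last_submitted_fence;
}
```

In this model, a submit without a following interrupt step leaves `model_is_idle` at 0 forever, which corresponds to the stalled state described above.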

Shadow surface: Windows is drawing to this surface.
After that, Windows will issue a DxgkDdiPresent BLT DMA command to transfer content to the primary surface. This needs to be implemented by the WDDM driver and hardware DMA support (this DMA can be “faked” by memcpy which typically results in low performance).
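
A “faked” BLT of the kind mentioned above could look roughly like the following user-mode sketch. The `surface32` type and `blt_copy` function are hypothetical (not WDK types), the sketch assumes both surfaces are 32 bpp with identical dimensions, and it assumes the driver already holds CPU-accessible pointers to both surfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical description of a 32-bpp surface (not a WDK type). */
typedef struct {
    uint8_t *bits;      /* CPU-accessible address of the first pixel */
    uint32_t width_px;
    uint32_t height_px;
    uint32_t pitch;     /* bytes per row, at least width_px * 4 */
} surface32;

/* Copies src to dst row by row, honoring each surface's own pitch.
 * Returns 0 on success, -1 if the surfaces are incompatible. */
static int blt_copy(surface32 *dst, const surface32 *src)
{
    assert(dst != NULL && src != NULL);
    if (dst->width_px != src->width_px || dst->height_px != src->height_px)
        return -1;

    size_t row_bytes = (size_t)src->width_px * 4;
    if (src->pitch < row_bytes || dst->pitch < row_bytes)
        return -1;

    for (uint32_t y = 0; y < src->height_px; ++y) {
        memcpy(dst->bits + (size_t)y * dst->pitch,
               src->bits + (size_t)y * src->pitch,
               row_bytes);
    }
    return 0;
}
```

Copying row by row rather than in one large memcpy matters whenever the shadow surface and the primary surface use different pitches.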

Primary surface: Allocation containing the framebuffer bitmap.

The DMA buffer contains only vendor-specific DMA commands, which are submitted to the GPU upon DxgkDdiSubmitCommand (no screen bitmap data at all).

DxgkDdiSetVidPnSourceAddress informs about the physical address where the primary allocation (framebuffer bitmap) is paged into video memory (within the PCI BAR range in the above case).

Good luck!

Marcel Ruedinger
datronicsoft

Hi Marcel, Thanks for your reply.

> Study WDK WDDM topic “Video Memory Management and GPU Scheduling”!

Yes, I need to study it first. I am studying it now.

> The correct sequence is roughly described below:
> ->DxgkDdiBuildPagingBuffer or DxgkDdiPresent
> ->DxgkDdiPatch (optional - only if referenced allocation not paged into video memory before)
> ->DxgkDdiSubmitCommand (receive fence ID)
> ->DxgkDdiInterruptRoutine (hardware indicates that DMA command processing is finished)
> <-DxgkCbNotifyInterrupt (return fence ID of finished DMA command)
> <-DxgkCbQueueDpc
> ->DxgkDdiDpcRoutine
> <-DxgkCbNotifyDpc
> …then start again at the top->
>
> The first and most obvious problem: no interrupt after DxgkDdiSubmitCommand.
> This is required by WDDM GPU scheduling model.

Sorry, I did not explain it clearly. My detailed sequence is below:

->DxgkDdiBuildPagingBuffer
case: DXGK_OPERATION_FILL
->DxgkDdiPatch
case: pPatch->Flags.Paging == 1
->DxgkDdiSubmitCommand
case: pSubmitCommand->DmaBufferPrivateDataSubmissionEndOffset == pSubmitCommand->DmaBufferPrivateDataSubmissionStartOffset
<-DxgkCbNotifyInterrupt
<-DxgkCbQueueDpc
->DxgkDdiDpcRoutine
<-DxgkCbSynchronizeExecution
<-DxgkCbNotifyDpc

But the function DxgkDdiInterruptRoutine is never called.
Is it OK?

> Shadow surface: Windows is drawing to this surface.
> After that, Windows will issue a DxgkDdiPresent BLT DMA command to transfer content to the primary surface. This needs to be implemented by the WDDM driver and hardware DMA support (this DMA can be “faked” by memcpy which typically results in low performance).

I need to think for a while…

> Primary surface: Allocation containing the framebuffer bitmap.
>
> The DMA buffer contains only vendor-specific DMA commands, which are submitted to the GPU upon DxgkDdiSubmitCommand (no screen bitmap data at all).
>
> DxgkDdiSetVidPnSourceAddress informs about the physical address where the primary allocation (framebuffer bitmap) is paged into video memory (within the PCI BAR range in the above case).

I see now. Your answers have been helping me so much.

Thank you very much.

Zhengwei Lou

Official answer: A missing interrupt service routine is NOT OK.
dxgkrnl expects an interrupt after DxgkDdiSubmitCommand.
dxgkrnl calls DxgkDdiControlInterrupt to tell the WDDM driver
when to activate and deactivate hardware interrupts.

Unofficial answer: An interrupt can also be faked by calling DxgkCbNotifyInterrupt
and DxgkCbQueueDpc at the appropriate elevated interrupt level.

Marcel Ruedinger
datronicsoft

Hi Marcel, thanks for your help.

  1. I have implemented DxgkDdiSubmitCommand as below:
    {


    NotifyInt.InterruptType = DXGK_INTERRUPT_DMA_COMPLETED;
    NotifyInt.DmaCompleted.SubmissionFenceId = pSubmitCommand->SubmissionFenceId;
    NotifyInt.DmaCompleted.NodeOrdinal = pD3DContext->NodeOrdinal;
    pExt->DxgkInterface.DxgkCbNotifyInterrupt(pExt->hDxgkHandle, &NotifyInt);
    pExt->DxgkInterface.DxgkCbQueueDpc(pExt->hDxgkHandle);
    return STATUS_SUCCESS;
    }

But the problem is still the same…

PS: I found something that might be incorrect:

  1. DxgkDdiPresent in the miniport driver:
    pPresent->pAllocationList[DXGK_PRESENT_SOURCE_INDEX].SegmentId is always equal to 0;
    pPresent->pAllocationList[DXGK_PRESENT_DESTINATION_INDEX].SegmentId is always equal to 1;
    What do ‘1 and 0’ represent, are they OK?

  2. As I understand it, DxgkDdiPresent needs to copy the source bitmap data (in system memory) to the framebuffer (video memory within the PCI BAR range). But what is the address of the source bitmap? Is it hidden in some argument? I cannot find it…

Any help would be wonderful.

Thanks,
Zhengwei lou

  1. I wrote above “…calling DxgkCbNotifyInterrupt and DxgkCbQueueDpc AT THE APPROPRIATE ELEVATED INTERRUPT LEVEL”, which is DIRQL. DxgkDdiSubmitCommand only runs at IRQL DISPATCH_LEVEL. The VirtualBox WDDM driver source code clearly shows how to do this using DxgkCbSynchronizeExecution.

2.a. There is nothing wrong with 0 == pPresent->pAllocationList[DXGK_PRESENT_SOURCE_INDEX].SegmentId. This is a standard situation. This means that the source allocation is not available yet. It still has to be paged in by video memory manager before it can be used. It will become available later upon DxgkDdiPatch.

2.b. There is one more thing which I already considered saying at the beginning: I do not remember having ever seen any working WDDM driver defining only one video memory segment as yours does. I am not sure if WDDM allows this…

3.a. WDDM assumes DMA (Direct Memory Access) and NOT copying. Copying the data instead of using DMA probably violates WDDM. Having said this, copying can still be done. Theoretically, the appropriate place for copying is DxgkDdiSubmitCommand. The problem: copying screen content at IRQL DISPATCH_LEVEL can make the whole system unresponsive. Thus copying can only be done in DxgkDdiPresent or DxgkDdiPatch (in case the allocation is not available at DxgkDdiPresent time; see 2.a. above).

3.b. The source bitmap data is NOT located in system memory. It is in an allocation within one of the previously defined video memory segments.

3.c. Since WDDM assumes DMA, there is only a PHYSICAL address of the source bitmap within its segment. DMA hardware typically does NOT understand VIRTUAL addresses. Thus WDDM does NOT give any VIRTUAL address which could be accessed by a pointer. It is up to the driver developer to circumvent the WDDM DMA requirement. Windows kernel mode memory management mechanisms together with segment definitions (e.g. Aperture Segment) can be used to create a virtual mapping for that physical address. This virtual mapping can then be accessed by a pointer.
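
To make point 3.c. more concrete, here is a minimal user-mode sketch of the address arithmetic only. The `mapped_segment` type and `segment_pa_to_va` function are hypothetical names, not WDK APIs. The sketch assumes the driver has already created a CPU mapping for the whole segment (in a real driver this might come from a kernel routine such as MmMapIoSpace) and that the physical address handed to the driver lies inside that segment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bookkeeping for one CPU-mapped segment (not a WDK type). */
typedef struct {
    uint64_t base_pa;   /* physical start address of the segment        */
    uint64_t size;      /* segment size in bytes                         */
    uint8_t *mapped_va; /* virtual address the driver mapped for base_pa */
} mapped_segment;

/* Translates the physical range [pa, pa + len) into a pointer inside the
 * mapping. Returns NULL if the range is not fully inside the segment. */
static uint8_t *segment_pa_to_va(const mapped_segment *seg,
                                 uint64_t pa, uint64_t len)
{
    assert(seg != NULL);
    if (seg->mapped_va == NULL || pa < seg->base_pa)
        return NULL;

    uint64_t offset = pa - seg->base_pa;
    if (offset > seg->size || len > seg->size - offset)
        return NULL;

    return seg->mapped_va + offset;
}
```

The bounds check is written as `len > size - offset` rather than `offset + len > size` so that it cannot overflow for large inputs.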

Good luck!

Marcel Ruedinger
datronicsoft

Hi Marcel, Thanks for your help.

>   1. I wrote above “…calling DxgkCbNotifyInterrupt and DxgkCbQueueDpc AT THE APPROPRIATE ELEVATED INTERRUPT LEVEL”, which is DIRQL. DxgkDdiSubmitCommand only runs at IRQL DISPATCH_LEVEL. The VirtualBox WDDM driver source code clearly shows how to do this using DxgkCbSynchronizeExecution.

OK, I have modified it. Thanks. I didn’t know about INTERRUPT LEVEL until you told me…HOHO~

> 2.a. There is nothing wrong with 0 == pPresent->pAllocationList[DXGK_PRESENT_SOURCE_INDEX].SegmentId. This is a standard situation. This means that the source allocation is not available yet. It still has to be paged in by video memory manager before it can be used. It will become available later upon DxgkDdiPatch.

How do I page in the source allocation?

> 2.b. There is one more thing which I already considered saying at the beginning: I do not remember having ever seen any working WDDM driver defining only one video memory segment as yours does. I am not sure if WDDM allows this…

I read the VirtualBox WDDM driver source code and found that it has two segments.

I have modified the code to the following:
if (pDxgkQuerySegOut->pSegmentDescriptor == NULL) {
    pDxgkQuerySegOut->NbSegment = 2;
}

pDxgkSegDesc[0].BaseAddress=m_pHWDevice->m_RamPA;
pDxgkSegDesc[0].CpuTranslatedAddress=m_pHWDevice->m_RamPA;
pDxgkSegDesc[0].Size=0x800000;
pDxgkSegDesc[0].NbOfBnks=0;
pDxgkSegDesc[0].pBankRangeTable=0;
pDxgkSegDesc[0].CommitLimit=pDxgkSegDesc[0].Size;
pDxgkSegDesc[0].Flags.Value=0;
pDxgkSegDesc[0].Flags.CpuVisible=1;

pDxgkSegDesc[1].BaseAddress.QuadPart=0;
pDxgkSegDesc[1].CpuTranslatedAddress.QuadPart=0;
pDxgkSegDesc[1].Size=0x800000;
pDxgkSegDesc[1].NbOfBnks=0;
pDxgkSegDesc[1].pBankRangeTable=0;
pDxgkSegDesc[1].CommitLimit=pDxgkSegDesc[1].Size;
pDxgkSegDesc[1].Flags.Value=0;
pDxgkSegDesc[1].Flags.CpuVisible=0;

One for source allocation, the other for destination allocation, right?

> 3.a. WDDM assumes DMA (Direct Memory Access) and NOT copying. Copying the data instead of using DMA probably violates WDDM. Having said this, copying can still be done. Theoretically, the appropriate place for copying is DxgkDdiSubmitCommand. The problem: copying screen content at IRQL DISPATCH_LEVEL can make the whole system unresponsive. Thus copying can only be done in DxgkDdiPresent or DxgkDdiPatch (in case the allocation is not available at DxgkDdiPresent time; see 2.a. above).
>
> 3.b. The source bitmap data is NOT located in system memory. It is in an allocation within one of the previously defined video memory segments.

Is the source bitmap data located at the address in the member “pPatch->pAllocationList[DXGK_PRESENT_SOURCE_INDEX].PhysicalAddress” of struct DXGKARG_PATCH in DxgkDdiPatch?
But after calling DxgkDdiPresent, neither DxgkDdiPatch nor DxgkDdiSubmitCommand is called. Instead, the following calls occur:
DxgkDdiSetPointerPosition
DxgkDdiSetVidPnSourceVisibility
case: pSetVidPnSourceVisibility->Visible is false
DxgkDdiBuildPagingBuffer
case: DXGK_OPERATION_DISCARD_CONTENT
DxgkDdiCloseAllocation
DxgkDdiDestroyAllocation
DxgkDdiDestroyContext
DxgkDdiDestroyDevice

What is wrong here?

Thank you again. Good luck!

Zhengwei lou

Since it is still stopping after DxgkDdiPresent, I assume that its implementation is wrong.
Just a few possible guesses:

  1. Is pDmaBuffer member of DXGKARG_PRESENT used and incremented correctly?

  2. Are Patch Locations correctly managed for AllocationIndex DXGK_PRESENT_SOURCE_INDEX and DXGK_PRESENT_DESTINATION_INDEX?

  3. Is pPatchLocationListOut member of DXGKARG_PRESENT used and incremented correctly?

Marcel Ruedinger
datronicsoft

Hi Marcel, Thank you for your patience and help all the time.

The code of DxgkDdiPresent is as follows:
{
    if (pPresent->Flags.Blt)
    {
        // ------> up to now, it always is in this case.
        pPresent->pDmaBufferPrivateData = (uint8_t*)pPresent->pDmaBufferPrivateData + 128;
        pPresent->pDmaBuffer = ((uint8_t*)pPresent->pDmaBuffer) + 128;

        RtlZeroMemory(pPresent->pPatchLocationListOut, 2*sizeof (D3DDDI_PATCHLOCATIONLIST));
        pPresent->pPatchLocationListOut[0].AllocationIndex = DXGK_PRESENT_SOURCE_INDEX;
        pPresent->pPatchLocationListOut[0].PatchOffset = 0;
        pPresent->pPatchLocationListOut[1].AllocationIndex = DXGK_PRESENT_DESTINATION_INDEX;
        pPresent->pPatchLocationListOut[1].PatchOffset = 128;
        pPresent->pPatchLocationListOut += 2;
    }
    return STATUS_SUCCESS;
}

Is it OK?
So far, I have no idea how to design and use pPresent->pDmaBufferPrivateData and pPresent->pDmaBuffer.

Another question:
I want the display driver to support D3D. Do I have to implement the user-mode driver?

Thank you very much. Good Luck!

Zhengwei lou

I can only suggest again to study WDK WDDM topic “Video Memory Management and GPU Scheduling”. This is a fundamental prerequisite for understanding DMA buffers and Patch Location Lists.

Since this graphics adapter hardware does not seem to support DMA, two things can be ignored for now:

  • Content of DMA buffers
  • Content of Patch Location List

Still, the above have to be initialized, managed and advanced properly:
Incrementing pPresent->pDmaBuffer by a hardcoded value is definitely wrong.
Incrementing pPresent->pDmaBuffer needs a check against pPresent->DmaSize.
Incrementing pPresent->pDmaBufferPrivateData is not necessary.
If pPresent->pDmaBufferPrivateData is incremented, it must also be checked against pPresent->DmaBufferPrivateDataSize.
A hardcoded value of 128 for PatchOffset is also wrong.
In this case, PatchOffset can remain 0 for both the source and the destination allocation.
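
The bounds check described above can be sketched in plain user-mode C as follows. `dma_cursor` and `dma_emit` are hypothetical names, not WDK APIs. The sketch assumes the cursor would be initialized from pPresent->pDmaBuffer and pPresent->DmaSize at the start of DxgkDdiPresent, and that the advanced pointer would be written back to pPresent->pDmaBuffer before returning.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical write cursor over a DMA buffer (not a WDK type). */
typedef struct {
    uint8_t *next; /* next free byte; would start at pPresent->pDmaBuffer */
    uint8_t *end;  /* one past the last usable byte: start + DmaSize       */
} dma_cursor;

/* Appends cmd_size bytes and advances the cursor by exactly that amount.
 * Returns 0 on success, -1 if the remaining space is too small. */
static int dma_emit(dma_cursor *c, const void *cmd, size_t cmd_size)
{
    assert(c != NULL && c->next <= c->end);
    if ((size_t)(c->end - c->next) < cmd_size)
        return -1; /* leave the cursor untouched on failure */

    memcpy(c->next, cmd, cmd_size);
    c->next += cmd_size;
    return 0;
}
```

The point of the sketch is that the pointer advances by the number of bytes actually written, never by a fixed constant. In a real driver the failure path would presumably map to a status such as STATUS_GRAPHICS_INSUFFICIENT_DMA_BUFFER rather than -1.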

D3D support (which is also a prerequisite for the Windows 7 AERO Glass transparency look) does indeed need a WDDM User Mode Driver. A WDDM User Mode Driver implementation needs D3D-compatible hardware or a D3D software renderer.

PS: I can only suggest thoroughly studying IRQLs, too. A full understanding of IRQLs is a prerequisite for comprehensive knowledge of Windows kernel mode synchronization, which is a fundamental prerequisite for writing a stable WDDM driver.

Marcel Ruedinger
datronicsoft

Hi Marcel, thanks for your help and suggestions all the time.

I have good news to tell you: just now, the present flow went well, and I could see the contents…

I followed your suggestions, that is:

  1. Study the topic “Video Memory Management and GPU Scheduling”;
  2. Define two video memory segments, make both of them visible;
  3. Put the action of “copy screen content” in the DxgkDdiPresent;

Then it goes well. Thank you very much.

Now, I want to continue to implement the user-mode display driver.

Thanks, Good luck.

Zhengwei lou


Hi Marcel,

I found this discussion via Google, and now I have the same question: where and how can I get the bitmap data?

I have studied the topic “Video Memory Management and GPU Scheduling” at least 20 times, but my knowledge is still not enough to finish my job.

My video card has no DMA; it is a virtual device. It already works correctly in DOD mode, and I need it to work correctly in full-function mode. I modified my code following the VirtualBox WDDM display driver. Now my full-function driver appears to work correctly, but there is no display image (I don’t know how to get the bitmap data). Both DXGK_OPERATION_FILL and DXGK_OPERATION_TRANSFER occur in my driver. Should I get the bitmap data in DXGK_OPERATION_FILL or in DXGK_OPERATION_TRANSFER?

BR,
John

Hi Marcel,

I have got the bitmap data. I don’t know if you will see this, but I must thank you:
without you, I could not have solved my problem. Thank you very much.

I come from China, you are my hero!

BR,
John