Implement DMA in AVstream minidriver for a PCIE device

tthrx Member Posts: 6

Hi, guys
I'm a newbie at Windows driver development, and I've found it very hard going even after reading the docs and debugging the driver samples for more than a month. I'm working on Win10 with VS2019, SDK 10 and WDK 10; the driver is x64, and DebugView is the only debugging tool I know how to use.
The driver I'm developing is for a PCIe x4 video/audio capture device, and I started from the Windows AVSHwS sample because I need to do DMA.
Configuring the DMA engine (a soft IP provided by Xilinx: XDMA) from an AVStream minidriver is the biggest problem blocking me right now. To the best of my understanding I have tried three ways, and unfortunately none of them works. I really hope someone can help me.
Here is what I added to AVSHws:
1. Starting from the KSPIN_DESCRIPTOR_EX, I added KSPIN_FLAG_GENERATE_MAPPINGS to the flags, which are now KSPIN_FLAG_PROCESS_IN_RUN_STATE_ONLY | KSPIN_FLAG_GENERATE_MAPPINGS.
2. I changed the PnpStart(TranslatedResourceList) routine (called by DispatchPnpStart(Irp, TranslatedResourceList)) to PnpStart(Irp), because the Count of the TranslatedResourceList is always 1 and I can only get part of the resources [Q1]. Using the Irp instead, I get the translated resource list, and by parsing it I get the BAR and interrupt information successfully; I then create a bus-master DmaAdapter with IoGetDmaAdapter() and register it with the AVStream class driver using KsDeviceRegisterAdapterObject(). But I can't connect the interrupt object with IoConnectInterrupt() [Q2]; every time I do, I get a blue screen, so I've left that part commented out.
3. Then comes the DMA part. I tried three different ways to do this.
3.0. Before I describe them, I'd like to show you my understanding of the frame data flow:
3.0.1 In the pin-creation routine, the frame buffers are either allocated locally or passed in as a pointer from the AVStream class driver [Q3], using the VideoInfo configured by DispatchSetFormat(); the frame info is then set.
3.0.2 In the pin-processing routine, the leading edge is captured from the created pin. I get a valid return when using 1280x720 YUV2 (1280x720x2 bytes), but when I change the video size to 1280x720 RGB24 (1280x720x3 bytes) or bigger, I get a null pointer for the leading edge, even though the pin is created successfully (per the return of the pin-creation routine) [Q4]. If I remove the KSPIN_FLAG_GENERATE_MAPPINGS flag, both sizes are OK, even at 1920x1080 RGB24. The leading-edge stream pointer is then cloned, and the clone is passed to ProgramScatterGatherMappings() (a subroutine in device.cpp and hwsim.cpp), where the clone is locked and added to an SG list. The return value is the number of mappings added to the list, and processing then advances the leading edge by that number of mappings.
3.0.3 On the other hand, a simulated hardware object is created in device.cpp, and an interrupt DPC is initialized in its creation routine in hwsim.cpp; in the start routine of hwsim.cpp, a timer is started to trigger the interrupt DPC. In the DPC, a FakeHardware() routine generates one frame of video data, and in FillScatterGatherBuffers() a clone buffer is checked out of the SG list built by ProgramScatterGatherMappings(); the generated video data is copied into that cloned stream buffer. A fake hardware interrupt is then raised, whose routine lives in the hardware sink. In the interrupt, the total number of completed mappings is tracked by calling CompleteMappings(), which serves as the capture sink (the pin) and lives in the pin class. In that routine the newly filled clone is timestamped and released by calling KsStreamPointerDelete(). ("If the frame to which StreamPointer points has no other references on it after deletion, it is completed. When the last frame in a given IRP is completed, the IRP is completed." -- docs.microsoft.com) [Q5]
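The flow in 3.0.2/3.0.3 can be modeled outside the kernel as a simple FIFO of cloned frame buffers. This is only a sketch of the queueing logic: CloneBuffer, ProgramClone and FillNextClone below are hypothetical stand-ins for the KS stream-pointer clone, ProgramScatterGatherMappings() and FillScatterGatherBuffers(), not real KS types or calls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <queue>
#include <vector>

// Hypothetical stand-in for a cloned stream pointer's output buffer.
struct CloneBuffer {
    std::vector<unsigned char> data;   // frame storage
    std::size_t filled = 0;            // bytes written by the "DMA"
};

// Models ProgramScatterGatherMappings(): queue an empty clone for the hardware.
void ProgramClone(std::queue<CloneBuffer*>& sgList, CloneBuffer* clone) {
    sgList.push(clone);
}

// Models FillScatterGatherBuffers(): the (simulated) hardware fills the
// oldest queued clone with one frame of data.
bool FillNextClone(std::queue<CloneBuffer*>& sgList,
                   const unsigned char* frame, std::size_t frameSize) {
    if (sgList.empty())
        return false;                  // hardware ran out of buffers
    CloneBuffer* clone = sgList.front();
    sgList.pop();
    std::memcpy(clone->data.data(), frame, frameSize);
    clone->filled = frameSize;         // CompleteMappings() would now
    return true;                       // timestamp and delete the clone
}
```

If FillNextClone() finds the queue empty, the "hardware" has run out of buffers, which is exactly the underrun case the interrupt/DPC path has to handle.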

3.1 Following this understanding, I added the DMA operation code in FillScatterGatherBuffers(), and this is where I got stuck [Q6]. I want to use the operation functions of the DmaAdapter created by IoGetDmaAdapter() earlier in PnpStart(), such as DmaAdapter->DmaOperations->GetScatterGatherList(), but I can't find the Irp, and there is no corresponding Irp->MdlAddress or related DriverObject, which are all needed for the DmaOperations [Q7]. As far as I know, doing DMA this way is common in WDM drivers, provided there is an IRP_MJ_READ or IRP_MJ_WRITE, but in the AVStream framework things may not work like this, so I tried another way.
3.2 Remember the mappings in the leading-edge stream pointer! With the KSPIN_FLAG_GENERATE_MAPPINGS flag on and the video image size set to 1280x720x2 (= 450 * PAGE_SIZE), I can get a valid leading edge and clone it, but the value of OffsetOut.Remaining in the leading-edge stream pointer (or its clone) is very small (like 5), while I'm expecting 450 mappings of one PAGE_SIZE each [Q8]. Besides that, how can I get the logical addresses that the bus-master DMA device can use? The mappings structure contains only physical addresses, there is no operation in DmaOperations that takes the mappings from a stream pointer's OffsetOut, and the DMA bus-master's registers need to be filled with a logical-address list (an SG list) plus a start flag. I think this should be done in a callback routine written by me; GetScatterGatherList() has such an interface, but I can't use it, so again I'm stuck.
3.3 With the problems and questions from 3.2, I came to my third attempt: common-buffer DMA. This time the KSPIN_FLAG_GENERATE_MAPPINGS flag is off, and I don't need to worry about the video image size; indeed the original simulated video works fine this way (I can display it in GraphEdit at 720p or 1080p). All I need to do is point the generated image's source buffer at the common buffer. In PnpStart(), right after registering the DmaAdapter with the AVStream class, I call DmaVAddr = DmaAdapter->DmaOperations->AllocateCommonBuffer(DmaAdapter, imageSize, &DmaLAddr, FALSE);, but no luck: the driver crashed and I got a blue screen. [Q9]
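For the device-side view in 3.2, building an SG list from per-page physical addresses could be sketched like this. The SgEntry layout follows the fields named in [Q8] below (LogicAddr.High, LogicAddr.Low, isTheLastMapping); it is an illustration only, not the real XDMA descriptor format, which is defined in the Xilinx documentation.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical SG entry, modeled on the fields described in the post
// (LogicAddr.High; LogicAddr.Low; isTheLastMapping). NOT the real XDMA
// descriptor layout.
struct SgEntry {
    uint32_t addrHigh;   // upper 32 bits of the page's bus address
    uint32_t addrLow;    // lower 32 bits
    uint32_t length;     // bytes covered by this entry
    bool     last;       // marks the final entry in the list
};

// Build one entry per page from the physical addresses that the leading
// edge's OffsetOut mappings already contain.
std::vector<SgEntry> BuildSgList(const std::vector<uint64_t>& physPages,
                                 uint32_t pageSize) {
    std::vector<SgEntry> list;
    for (std::size_t i = 0; i < physPages.size(); ++i) {
        SgEntry e;
        e.addrHigh = static_cast<uint32_t>(physPages[i] >> 32);
        e.addrLow  = static_cast<uint32_t>(physPages[i] & 0xFFFFFFFFu);
        e.length   = pageSize;
        e.last     = (i + 1 == physPages.size());
        list.push_back(e);
    }
    return list;
}
```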

My questions are:
[Q1]: Why is the Count of the TranslatedResources passed in directly by the dispatch routine one, while the count from the Irp (also passed in by the dispatch routine) is not?
[Q2]: Why can't I connect the interrupt object successfully? Are the parameters I got from the partial translated resources incorrect, or did I miss something? (In fact, I've checked the BAR parameters from the partial translated resources, and they are correct.)
[Q3]: I really don't know who is responsible for allocating the buffers fed into the pin. I read some posts on OSR which said they can be allocated locally by the minidriver or given by a renderer (I don't know if I understood that correctly).
[Q4]: The Microsoft docs say a NULL pointer is returned when there are not enough resources to map a large image, but why doesn't that happen when the KSPIN_FLAG_GENERATE_MAPPINGS flag is off?
[Q5]: Here is where the IRP comes in, but it appears only in the Microsoft docs, not actually in the CompleteMappings() routine; the AVStream class driver must have taken care of it. I can't use it anyway, can I?
[Q6]: Should I add the DMA operation code in FillScatterGatherBuffers(), and if not, where?
[Q7]: So with no IRP available, I can't do DMA the way it's done in a WDM driver: no MdlAddress, no AllocateAdapterChannel(), no MapTransfer(), and no GetScatterGatherList(). How can I do it?
[Q8]: The mappings in the leading edge's OffsetOut are 4 KB each, aren't they? What is the relationship between these mappings and the adapter returned by IoGetDmaAdapter()? Supposing these mappings are meant to feed the DmaAdapter operations, how can I use them, since the mappings contain only physical addresses? In fact, from the DMA device's point of view, I only need to feed it an SG list (like LogicAddr.High; LogicAddr.Low; isTheLastMapping; pNextSGEntry) and write a start flag into the SG_configReg, which is I/O-mapped in PnpStart() using MmMapIoSpace(); then the DMA starts to work. I really don't need a DmaAdapter.
[Q9]: If I do DMA using a common buffer, where is the right place to allocate it and do the rest of the processing to accomplish the DMA?

Thank you for your patience. Since I'm not from an English-speaking country, I apologize if some of my wording is odd.
Any words are appreciated. Thank you in advance.

Comments

  • Tim_Roberts Member Posts: 13,496

    Q1: The untranslated resources include a number of internal things that don't matter to you. Your TranslatedResources should include your memory BARs and your interrupt, if your hardware requested one. Are you seeing the memory BAR but not the interrupt? Is it supposed to be an MSI interrupt? Have you added the INF registry entries to advertise MSI support? Are you sure the Xilinx IP is properly configured for interrupts?

    Q2: What error do you get from IoConnectInterrupt?

    Q3: Are you trying to DMA the video data directly into the app's buffers? That's the model that AVSHWS tries to use. The memory is allocated by the DirectShow filter graph in user mode, and passed down to you as the "leading edge". When you generate mappings and send those to your DMA engine, you are DMAing directly into those user-mode buffers.

    Q4: Don't understand.

    Q5: Why do you think you need the IRP? AVStream transfers are done using a couple of ill-defined KS ioctls that you shouldn't need to understand. The AVStream framework massages those buffers into the leading edge / trailing edge concept. When you advance and unlock the leading edge frame, behind the scenes, the AVStream framework will complete the IRP that sent the buffer or buffers.

    Q6: The model here is that capture is continuous. You'll program and trigger the DMA cycle during Start. You'll add new empty pages to your hardware in ProgramScatterGatherMappings. FillScatterGatherBuffers is a fake routine that simulates what the hardware is doing through DMA. You do have to worry about what happens if you get behind and the hardware runs out of buffers. This is why I sometimes fall back to using common buffer DMA, and then copy the data into the leading edge, without using the mappings.

    Q7: No, AVStream is doing all of that for you. In the leading edge structure, OffsetOut.Mappings points to the scatter/gather mappings that it has already generated. You don't need your own DmaAdapter.

    Q8: There is one physical address per 4K page, although it's possible AVStream might be combining adjacent pages. The Xilinx DMA I've used requires a 64-byte scatter/gather descriptor for each contiguous chunk of memory, which of course doesn't match the list you're getting. You will need to create a descriptor table in memory, point the hardware to that table, and then manipulate the table entries as you receive more buffers.
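    The coalescing mentioned here (adjacent pages combined, then one descriptor per contiguous chunk rather than per page) can be sketched as follows. Chunk and CoalescePages are illustrative names, not part of any real API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One contiguous run of physical memory, which a single (hypothetical)
// 64-byte descriptor could cover.
struct Chunk {
    uint64_t base;
    uint64_t length;
};

// Coalesce physically adjacent 4K pages into contiguous chunks, so one
// descriptor covers each chunk instead of each page.
std::vector<Chunk> CoalescePages(const std::vector<uint64_t>& pages,
                                 uint64_t pageSize) {
    std::vector<Chunk> chunks;
    for (uint64_t p : pages) {
        if (!chunks.empty() &&
            chunks.back().base + chunks.back().length == p) {
            chunks.back().length += pageSize;   // extends previous chunk
        } else {
            chunks.push_back({p, pageSize});    // starts a new chunk
        }
    }
    return chunks;
}
```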

    Q9: Usually, you need to allocate your common buffer very early on. Physical memory gets fragmented quite rapidly. If you wait until the KS_STATE changes to "acquire", which would be the sensible place to do it, there's a chance the system won't find enough contiguous physical pages. That's only a problem if your buffers are quite large.

    Tim Roberts, [email protected]
    Providenza & Boekelheide, Inc.

  • tthrx Member Posts: 6

    Hi Tim, thank you. I'm very excited that you are answering my questions; I've read a lot of your replies in other posts on AVStream topics, and they are really helpful. You must be an expert. My gratitude.

    @Tim_Roberts said:
    Q1: The untranslated resources include a number of internal things that don't matter to you. Your TranslatedResources should include your memory BARs and your interrupt, if your hardware requested one. Are you seeing the memory BAR but not the interrupt? Is it supposed to be an MSI interrupt? Have you added the INF registry entries to advertise MSI support? Are you sure the Xilinx IP is properly configured for interrupts?

    The XDMA IP is configured like this: two BARs are used; BAR0 is for a 1 MB on-chip BRAM, and BAR1 is for the DMA. The MSI and legacy interrupts are both enabled.
    I'm not sure the interrupts are correctly configured, but with the same configuration this device works fine using the XDMA_driver (a WDF driver) provided by Xilinx, so I think they are. Actually, I'm seeing only BAR0 when I use the "TranslatedResources" argument passed in directly, while I can't get all the resources (BAR0, BAR1 and an interrupt) from the IRP, which is the other passed-in argument; there are also two more entries (CmResourceTypeDevicePrivate) in the IRP.

    No registry entries were added to the INF file; I just don't know how to add them.

    Q2: What error do you get from IoConnectInterrupt?

    The only information I can get from the blue screen is "HAL_INITIALIZATION_FAILED". Maybe it's time to find another PC, so I can get more detail with WinDbg.

    Q3: Are you trying to DMA the video data directly into the app's buffers? That's the model that AVSHWS tries to use. The memory is allocated by the DirectShow filter graph in user mode, and passed down to you as the "leading edge". When you generate mappings and send those to your DMA engine, you are DMAing directly into those user-mode buffers.

    Yes, I meant to do this at first, but I can only get a PhysicalAddress (PAddr) from the mappings provided by the "leading edge" or its clone. Do you mean it is the PAddr that I should send to the XDMA engine, instead of a corresponding LogicalAddress (LAddr)? As far as I know, a device needs the LAddr to address memory. If so, I can use the PAddrs directly to generate the dedicated SG list that the XDMA engine requires; if not, which kernel-mode API should I use to translate the PAddr (or some corresponding VirtualAddress (VAddr)) into the LAddr?

    Q4: Don't understand.

    Sorry for the misspelling: "happy" should be "happen", and the multiplication signs disappeared from the video-size parameters (1280x720x2 YUV2, 1280x720x3 RGB24).
    The question is: when the KSPIN_FLAG_GENERATE_MAPPINGS flag is on and the video size is set to 1280x720x3 RGB24, I get a NULL return from the GetLeadingEdge routine; when the video size is 1280x720x2 YUV2, the return is a valid stream pointer, but the mapping count in OffsetOut.Remaining is a small number like 5. According to the Microsoft docs, this means not enough resources are available.
    If I switch the flag off, both video sizes work; even at 1920x1080x3 RGB24 I get a valid pointer to the leading edge.
    So my understanding is that the mapping resources are different from the buffer resources.

    Q5: Why do you think you need the IRP? AVStream transfers are done using a couple of ill-defined KS ioctls that you shouldn't need to understand. The AVStream framework massages those buffers into the leading edge / trailing edge concept. When you advance and unlock the leading edge frame, behind the scene, the AVStream framework will complete the IRP that sent the buffer or buffers.

    I thought I could do the DMA as in a WDM driver (I have a reference WDM driver that AVNET provided), but now I know I don't need the IRP.

    Q6: The model here is that capture is continuous. You'll program and trigger the DMA cycle during Start. You'll add new empty pages to your hardware in ProgramScatterGatherMappings. The FillScatterGatherBuffers is a fake routine that simulates what the hardware is doing through DMA. You do have to worry about what happens if you get behind and the hardware runs out of buffers. This is why I sometimes fallback to using common buffer DMA, and then copying the data into the leading edge, without using the mappings.

    Do you mean I should drive the DMA from ProgramScatterGatherMappings(), using the mappings in the clone of the leading edge, and get rid of the routines in hwsim.cpp?

    Q7: No, AVStream is doing all of that for you. In the leading edge structure, OffsetOut.Mappings points to the scatter/gather mappings that it has already generated. You don't need your own DmaAdapter.

    The idea of using a DmaAdapter comes from the Microsoft docs on how to do DMA in the AVStream framework. The DmaAdapter, obtained by calling IoGetDmaAdapter(), is registered with the AVStream class driver by calling KsDeviceRegisterAdapterObject().
    Do you mean that as long as I have a clone of the leading-edge structure, I can get the PAddrs from OffsetOut.Mappings, use them to construct the SG list that the XDMA engine requires, and make the DMA run, so the DmaAdapter is useless here?

    Q8: There is one physical address per 4K page, although it's possible AVStream might be combining adjacent pages. The Xilinx DMA I've used requires a 64-byte scatter/gather descriptor for each contiguous chunk of memory, which of course doesn't match the list you're getting. You will need to create a descriptor table in memory, point the hardware to that table, and then manipulate the table entries as you receive more buffers.

    Yes, thank you for pointing that out. I think I should do it like this, and the XDMA_driver, written in WDF, is a good reference.

    Q9: Usually, you need to allocate your common buffer very early on. Physical memory gets fragmented quite rapidly. If you wait until the KS_STATE changes to "acquire", which would be the sensible place to do it, there's a chance the system won't find enough contiguous physical pages. That's only a problem if your buffers are quite large.

    I tried to allocate the common buffer in PnpStart(), right after parsing the translated resources and setting up the DmaAdapter. I tried to allocate just one PAGE_SIZE of memory, and it still failed with a "KMODE_EXCEPTION_NOT_HANDLED" blue screen. I'm doing this by calling DmaAdapter->DmaOperations->AllocateCommonBuffer(DmaAdapter, PAGE_SIZE, &LAddr, FALSE);.

    Since, as you suggest, the DmaAdapter is not necessary, I will try to allocate memory using MmAllocatePagesForMdl() or MmAllocateNodePagesForMdl() for non-contiguous physical memory, or MmAllocateContiguousMemory() for contiguous physical memory, somehow get the LAddr to generate the SG list, let the DMA fill that memory, and then copy the data into the clone buffers on the driver side.
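    The fallback plan above (let the device DMA into a driver-owned buffer, then copy each completed frame into the clone) can be modeled as a small ring of frame slots. FrameRing is an illustrative user-mode model only; in the driver the copy would be RtlCopyMemory from the common buffer into the clone's system-address buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Model of the common-buffer fallback: the common buffer is split into a
// small ring of frame slots; the device writes one slot per interrupt and
// the DPC copies the just-completed slot into the current clone buffer.
struct FrameRing {
    std::vector<unsigned char> mem;   // stands in for the contiguous common buffer
    std::size_t frameSize;
    std::size_t slots;
    std::size_t writeIdx = 0;

    FrameRing(std::size_t frameSz, std::size_t n)
        : mem(frameSz * n), frameSize(frameSz), slots(n) {}

    // The slot the "device" will fill next.
    unsigned char* SlotForDevice() {
        return mem.data() + writeIdx * frameSize;
    }

    // Copy the just-completed slot into the clone, then advance the ring.
    void CopyCompletedSlot(unsigned char* cloneBuf) {
        std::memcpy(cloneBuf, SlotForDevice(), frameSize);
        writeIdx = (writeIdx + 1) % slots;
    }
};
```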

  • tthrx Member Posts: 6

    Sorry, in the first quote, "...while I can't get all resources BAR0, BAR1 and an interrupt from IRP...", it should be "I can get all resources BAR0, BAR1...".

  • tthrx Member Posts: 6

    I found this in the XDMA_driver; I think I need to add it to my INF file:
    ; 24 Interrupt resources are required - 16 for user events and 8 for dma channels
    [xdma.EnableMSI]
    HKR,"Interrupt Management",,0x00000010
    HKR,"Interrupt Management\MessageSignaledInterruptProperties",,0x00000010
    HKR,"Interrupt Management\MessageSignaledInterruptProperties",MSISupported,0x00010001,1
    HKR,"Interrupt Management\MessageSignaledInterruptProperties",MessageNumberLimit,0x00010001,32

  • Tim_Roberts Member Posts: 13,496

    ...but I can only get PhysicalAddress(PAddr) from the mappings provided by the “leading edge” or the clone of it.
    Do you mean it is the PAddr that I should send to the XDMA engine, instead of a corresponding LogicAddress(LAddr),

    The terms physical and logical get mixed loosely, and that causes much confusion. The numbers you get in PAddr are the numbers you need to give to the hardware.

    The question is, when the flag KSPIN_FLAG_GENERATE_MAPPINGS is on and the video size is set to be 1280x720x3RGB24,
    I will get a NULL return from the getLeadingEdge routine,

    That kind of thing often means that you have filled out your data ranges incorrectly. Those are large and somewhat confusing structures. Take a closer look.

    Do you mean I should manipulating the DMA in ProgramScatterGatherMappings, using mappings in the clone of a leading edge,
    and get rid of routines in hwsim.cpp.

    Maybe. Remember that avshws is just a sample. It's one possible implementation. The hwsim module is trying to simulate DMA-based capture hardware. For real hardware, much of that code becomes unnecessary.

    Do you mean that as long as I have a clone of the leading edge structure, I can get the PAddr from OffseOut.Mappings. Using the
    PAddr to construct a s/g list that XDMA engine requires, I can make the DMA run. So the DmaAdapter is useless here.

    Right. AVStream already has a DmaAdapter and is doing the calls for you.

    I find this in the XDMA_driver, I think i need to add it into my INF file

    Yes, exactly, although you probably don't need 24 events. Their IP supports up to 8 DMA engines, but few designs really use that many.

    Tim Roberts, [email protected]
    Providenza & Boekelheide, Inc.

  • tthrx Member Posts: 6
    Thank you, Tim. I will try as you suggested.