0xc0000005 (Access Violation) in MmProbeAndLockPages

Hi, nice to meet you all.

I’m having a problem.

I created a 256 MB user-mode buffer with the VirtualAlloc function,
and I stored its virtual address in my device extension structure.

Then I let my PCI Express interface card transfer data into my DMA buffer.
When the transfer is done, the card generates an interrupt to notify the device driver.

In the ISR I queue a DPC, and the DPC sets an event that wakes up a system thread.

When the system thread routine receives this event, it allocates an MDL with IoAllocateMdl

and then tries to lock the memory with MmProbeAndLockPages.
At that point I get an exception with code 0xc0000005.

I want to know what is causing this problem.
I would also like to know whether there is any way to transfer streaming data from a PCI Express card to a user buffer.

I think I have to allocate an MDL on each interrupt in this system thread, because I have to transfer more than 2 GB of data and I can’t allocate an MDL and a physical DMA buffer that large.

It sounds like you are trying to lock down user memory, which belongs to the
address space of your application, in the context of the System process.
That does not work. As a solution, you could lock the memory in the context
of your application, in response to an IOCTL.

//Daniel

Thank you, Daniel.
Now I understand that I can’t use a user-context buffer in the system process context.
Thank you very much.

I tried to lock the user buffer down this way because the required buffer is huge and I have to transfer it through a DMA buffer of limited size.

Is there any way to transfer streaming data directly into a user buffer of at least 512 MB?

See below…

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of
xxxxx@gmail.com
Sent: Tuesday, July 12, 2011 6:34 AM
To: Windows System Software Devs Interest List
Subject: [ntdev] 0xc0000005 (Access Violation) in MmProbeAndLockPages

Hi, nice to meet you all.

I’m having a problem.

I created a 256 MB user-mode buffer with the VirtualAlloc function,
and I stored its virtual address in my device extension structure.

****
That is wrong. Either you use Direct I/O, in which case the MdlAddress field
of the IRP holds the MDL that describes this user buffer, or you use “mode
neither” and use the user-mode address to create an MDL, which you must
allocate in your top-level dispatch routine. After that, you build partial
MDLs for transferring segments.

The user buffer address is totally useless to you in your DPC or a system
thread. It is valid *only* in the top-level dispatch routine.
****

Then I let my PCI Express interface card transfer data into my DMA buffer.
When the transfer is done, the card generates an interrupt to notify the
device driver.

****
This cannot work if the address you give it is the user address. You have
not said whether you are trying to use the user-mode address for your DMA or
the internal address of a buffer you are allocating, and you do not say how
you are allocating it, so we don’t know if it is a physically contiguous
buffer or just a “buffer”, which can be scattered over many discontiguous
pages, which means you have to deal with scatter/gather DMA.
****

In the ISR I queue a DPC, and the DPC sets an event that wakes up a system
thread.

When the system thread routine receives this event, it allocates an MDL with
IoAllocateMdl

****
What good is this going to do? You are “years” (in computer time) too late
to do anything with a user-mode address at this point. It has no meaning
whatsoever in a system thread.
****

and then tries to lock the memory with MmProbeAndLockPages.
At that point I get an exception with code 0xc0000005.

****
First, if you call MmProbeAndLockPages you need to do it inside an exception
frame to capture any exceptions so you don’t get a BSOD. Note that if you
get an exception, it means your locking failed, and you cannot complete the
I/O. Note also that MmProbeAndLockPages is probably going to fail because
you are not using the MDL in the IRP (if direct I/O) or the MDL you
allocated in your top-level dispatch routine (if using “mode neither”), but
one you have “allocated” (and you do not tell us how you got the parameters
for allocating it!) and tried to initialize by using some nonsense random
number (aka “the user-mode address”) in a context in which this address is
pretty much guaranteed to be complete nonsense.
****

I want to know what is causing this problem.

****
That you are not using Direct I/O would be a good first guess. Allocating
your own MDL in a context in which the user address is a meaningless random
number is a likely contributor. Storing a user-mode address is the most
fatal aspect I can identify. Doing an MmProbeAndLockPages without an
exception frame is the direct cause.

You must allocate the MDL in your top-level entry routine (e.g., the handler
for IRP_MJ_READ or IRP_MJ_DEVICE_CONTROL, depending on how you are reading
the data) so that it has the correct information in it. Then you will use
“mode neither” so the I/O manager doesn’t try to lock all the pages down.
You can then do DMA into pieces of the MDL using a “partial MDL”, which you
can create from the original MDL.
****
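
To make the "pieces of the MDL" idea concrete, here is a minimal user-mode sketch of the arithmetic only. It splits a buffer into fixed-size segments, which is the byte range each partial MDL (built in a driver with IoBuildPartialMdl) would need to cover. The names segment_t, segment_count, and segment_at are hypothetical, the segment size is an assumption picked for illustration, and none of this is kernel code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one transfer segment: a byte offset into the
   locked user buffer and a length. In a driver this is the range one partial
   MDL would cover; here it is plain arithmetic. */
typedef struct {
    uint64_t offset;
    uint64_t length;
} segment_t;

/* Number of segments needed to cover total_bytes in pieces of at most
   chunk_bytes. Returns 0 if chunk_bytes is 0. */
uint64_t segment_count(uint64_t total_bytes, uint64_t chunk_bytes)
{
    if (chunk_bytes == 0) {
        return 0;
    }
    return (total_bytes + chunk_bytes - 1) / chunk_bytes;
}

/* Fill *out with the index-th segment. Returns 1 on success, 0 if out is
   NULL, chunk_bytes is 0, or index is past the end of the buffer. */
int segment_at(uint64_t total_bytes, uint64_t chunk_bytes,
               uint64_t index, segment_t *out)
{
    uint64_t remaining;

    if (out == NULL || chunk_bytes == 0) {
        return 0;
    }
    if (index >= segment_count(total_bytes, chunk_bytes)) {
        return 0;
    }
    out->offset = index * chunk_bytes;
    remaining = total_bytes - out->offset;
    /* The last segment may be shorter than chunk_bytes. */
    out->length = remaining < chunk_bytes ? remaining : chunk_bytes;
    return 1;
}
```

For a 256 MB buffer and an assumed 2 MB segment size, segment_count returns 128, and segment_at with index 127 yields an offset of 254 MB and a length of 2 MB.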

I would also like to know whether there is any way to transfer streaming
data from a PCI Express card to a user buffer.

****
Handle your MDL in the top-level dispatch routine.
****

I think I have to allocate an MDL on each interrupt in this system thread,
because I have to transfer more than 2 GB of data and I can’t allocate an
MDL and a physical DMA buffer that large.

****
It is far more complex than you think. You just said that the user
allocates 256MB of memory, so how did you suddenly get this 2GB number? You
don’t need to transfer, by your specification, more than 256MB of memory.
You seem to feel a need to transfer it in pieces, which suggests the card
has a seriously bad design (no scatter/gather capability), but there are
ways of dealing with that.

How did you jump from 256MB to 2GB?
****


NTDEV is sponsored by OSR

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at
http://www.osronline.com/page.cfm?name=ListServer

Thank you for your reply, Mr. Joseph.
Your reply gave me some hints about how I can make this work.
(Describing the user buffer for scatter/gather DMA using partial MDLs?)

First of all, let me explain my situation.

I designed a PCI Express interface card three years ago using an FPGA.
The board doesn’t have any memory (which is why I’m facing this problem).

I also wrote a WDM driver, and I transferred data using common-buffer DMA
(because the target system had a large amount of memory).
In that system I initialized the DMA buffers in the StartDevice handler of
the PnP routine (using AllocateCommonBuffer).
(As far as I remember, there were 2 buffers of 1 MB for transfer data and
1 buffer of 4 KB for the command interface.)
I can also obtain both the physical memory address and the virtual address,
so that I can copy data between the FPGA and the driver.

At that time I copied from the common buffer to user memory in my dispatch
routine
(IoAllocateMdl -> MmProbeAndLockPages -> MmGetSystemAddressForMdlSafe ->
memcpy).
I thought I could use the same method in a system thread, because I had
used it in a dispatch routine before.

Now, let’s talk about my current system.
I initialized 4 transfer data buffers of 2 MB each and 1 buffer of 4 KB
using common-buffer DMA.
Because the new system doesn’t have any memory, the target system just
pushes all line-scan camera images to my interface card as streaming data.
I also have to push all the data onward, because my board doesn’t have any
memory.

That’s why I decided to use a PCI Express interrupt to notify the driver
that a buffer has been filled,
so that it can transfer each common buffer to the user buffer.
(I think this is the fastest approach for me to implement without modifying
my current FPGA design very much.)

Because this design is a line-scan camera grabber, the transfer length is
determined by the size of the LCD glass
(that’s why the transfer size can grow beyond 2 GB).
It’s midnight here,
so I’ll try to get the scatter/gather list tomorrow morning.
I think that once I have a system address for this user buffer, I can copy
the camera data (in the common buffer) to the system address of the user
buffer in the system thread.

Is there anything wrong with this?
Any comments would be helpful.
Thank you for reading.

Just to let you know I’m not ignoring you: it’s nearly noon here, and I have
to leave for an appointment that will probably keep me out all day. I may
have time to reply to this tonight, or more likely tomorrow, unless other
crises in my life (of which I have one) get in the way.
joe


Since you are using such tiny buffers (2MB), the ideal solution would have
been to use direct mode I/O and have your device transfer everything
directly into user space. But that does require having scatter/gather
capability (hint: when designing real devices to run in real operating
systems, having “unlimited scatter gather” as part of your basic design is a
Really Good Idea). But given you don’t have that, the common buffer
approach with the horrendous copy is probably the best you are going to be
able to do. So what you seem to have is a set of N buffers, and you cycle
through them, if I’ve understood this correctly. Then, it sounds like a
single image can be 2GB, which means you are going to have to transfer 2GB
in 2MB chunks, that is, 1024 transfers per frame time. So one of my
concerns is a standard back-of-the-envelope computation to show if the data
rate can be sustained, no matter how “efficient” PCI Express might be. If we
assume the buffers are DWORD-aligned, transferring a 2MB buffer requires
512K DWORD fetches and 512K DWORD stores (memcpy does a MOVSD for the main
body), and that’s starting to push the performance envelope. But you have
to do 1024 of these per frame time, which I think exceeds what most systems
could hope to accomplish.
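
The arithmetic in the paragraph above can be checked with a small user-mode C sketch. The function names are hypothetical, and the factor of three in memory_traffic_per_frame is an assumption (one DMA write into the common buffer, then one read and one write by the copy); it is not a measured figure.

```c
#include <stdint.h>

#define DWORD_BYTES 4u

/* Number of common-buffer transfers needed to move one frame.
   Returns 0 if buffer_bytes is 0. */
uint64_t transfers_per_frame(uint64_t frame_bytes, uint64_t buffer_bytes)
{
    if (buffer_bytes == 0) {
        return 0;
    }
    return (frame_bytes + buffer_bytes - 1) / buffer_bytes;
}

/* DWORD loads (and, equally, DWORD stores) performed when copying
   buffer_bytes, assuming DWORD-aligned buffers copied 4 bytes at a time. */
uint64_t dword_moves(uint64_t buffer_bytes)
{
    return buffer_bytes / DWORD_BYTES;
}

/* Assumed total bytes crossing the memory bus per frame: each byte is
   written once by DMA, then read once and written once by the copy. */
uint64_t memory_traffic_per_frame(uint64_t frame_bytes)
{
    return 3 * frame_bytes;
}
```

With a 2GB frame and 2MB buffers (taking MB and GB as powers of two), transfers_per_frame gives 1024 and dword_moves gives 524288, i.e. 512K loads and 512K stores per buffer.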

With four 2MB buffers, you can read 8MB before you have to reuse one of the
common buffers, so you need to make sure that the copy can complete before
you have to recycle the first buffer of the set. If you are doing this copy
in a passive thread, you are at the mercy of the thread scheduler. This
means there is *substantial* latency between the time you determine the
thread should run and when it actually *does* run; if this latency is more
than three buffer-fills, you’re dead. You can’t assume zero latency to
activate a thread (I’ve seen people in user-level apps make this assumption,
and wonder why their projects fail to work as defined). You can’t assume
that thread runs 100% of the time, because interrupts and DPCs will preempt
it, so you can’t assume that even after it starts, it completes its action
in a predictable amount of time.
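
As a rough sketch of the latency budget described above: once a buffer has filled, the device moves on to the remaining buffers, so the copy of the filled buffer has (buffer count minus one) fill times before that buffer is needed again. The function names are hypothetical, and a steady data rate is an assumption; real traffic may be bursty.

```c
#include <stdint.h>

/* Time, in microseconds, for the device to fill one buffer at a steady
   rate. Returns 0 if bytes_per_second is 0. */
uint64_t fill_time_us(uint64_t buffer_bytes, uint64_t bytes_per_second)
{
    if (bytes_per_second == 0) {
        return 0;
    }
    return buffer_bytes * 1000000ull / bytes_per_second;
}

/* Time, in microseconds, between a buffer-full interrupt and the moment the
   device needs that same buffer again. Thread wake-up latency plus copy
   time must fit inside this window. */
uint64_t latency_budget_us(unsigned buffer_count, uint64_t buffer_bytes,
                           uint64_t bytes_per_second)
{
    if (buffer_count < 2) {
        return 0;
    }
    return (uint64_t)(buffer_count - 1) *
           fill_time_us(buffer_bytes, bytes_per_second);
}
```

With four 2MB buffers and a 100 MB/s stream (the rate the original poster reports later in this thread, with MB taken as 2^20 bytes), fill_time_us gives 20000 and latency_budget_us gives 60000, so wake-up latency plus copy time would have to stay under roughly 60 ms.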

This is why most devices with high bandwidth requirements have onboard
memory and scatter/gather DMA, and even then, you’re pushing the envelope
with a 2GB frame. But with intermediate staging and explicit memcpy, you
double or triple the number of memory accesses required over straight
smart-DMA. Or maybe worse. Before you start trying to make this work, you
should work out the math to see if it can ever be made to work. I once
participated in evaluating a project whose required bandwidth was something
like six times the available I/O bandwidth (what bothered ME was the group
had defined complex communication packets down to the bit level without once
saying what problem they were trying to solve; a colleague from EE worked
out the required bandwidth and said “Hey guys, .” and the project
disappeared without a trace three days later).

I’d like to see some performance numbers, or at least what you can say
without violating proprietary information, such as bytes/frame, frames/sec,
etc.
joe


From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of ???
Sent: Tuesday, July 12, 2011 11:16 AM
To: Windows System Software Devs Interest List
Subject: Re: [ntdev] 0xc0000005 (Access Violation) in MmProbeAndLockPages

Thank you for your reply Mr. Joseph
I could get some hints that I can execute it because of your reply.
(Trying to explain user buffer with Scatter Gather DMA using PartialMdl?)

First of all, Let me explain about my situation.

I designed PCI Express I/F Card Interface card 3 years ago with FPGA .
And this board doesn’t have any memory (so that I’m facing this problem.)

I also wrote WDM driver and I transfered data with common buffer DMA mode.
(because target system had huge memory)
In that system I initialized DMA Buffers in StartDevice of PNP routine
(using AllocateCommonBuffer)
(In my poor memory, that was 2 buffers of 1MBytes(Transfer Data Buffer) and
1 buffer for 4kBytes(command I/F))
I also can achieve Physical Memory Address & Virtual Address so that I can
copy between FPGA & driver.

At that time I copied in my dispatch routine from Common buffer to user
memory
(IoAllocateMdl-> MmProbeAndLockPages-> MmGetSystemAddressForMdlSafe->
memcpy)
I thought That I could use this method can be used in System Thread also
because I used it in Dispatch Routine before.

Now, Let’s talk about my current system.
I initialized 4 transfer data buffers of 2MBytes and 1 buffer for 4kBytes
with CommonBufferDMA.
Because of New system doesn’t have any memory, So Target System just push
all Line Scan Camera Images to my Interface Card like streaming data.
I have to push all datas also because My board doesn’t have any memory.

That’s why I deciede to use PCI Express interrupt to notify driver that
buffer has been filled,
to transfer each CommonBuffers to user buffers
(I think this is fastest way to implement for me not to modify my current
FPGA design that much)

And Because of this design is based on Line scan camera Grabber, Transfer
length will be decided by the size of LCD Glass.
(that’s why transfer size can be expanded more than 2GB)
It’s midnight in here.
So, I’ll try tomorrow morning to get scatter gather list.
I think after I get system addresses of this user buffer I think I can copy
camera datas(in CommonBuffer) to System address of user buffer in System
Thread.

Is there anything wrong?
Please give me any comment
Any comments would be helpful.
thank you for your reading.

2011/7/12 Joseph M. Newcomer
See below…

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of
xxxxx@gmail.com
Sent: Tuesday, July 12, 2011 6:34 AM
To: Windows System Software Devs Interest List
Subject: [ntdev] 0xc0000005 (Access Violation) in MmProbeAndLockPages

Hi, Nice to meet you everyone.

I’m having a problem

I Created user memory Buffer 256MB using VirtualAlloc function.
and I stored this virtual address in device extension structure.


That is wrong. If you use Direct I/O, and the MDLAddress field will hold
the MDL that defines this user buffer. Or, you use “mode neither”, and use
the user-mode address to create a MDL which you must allocate in your
top-level dispatch routine. After that, you build partial MDLs for
transferring segments.

The user buffer address is totally useless to you in your DPC or a system
thread. It is valid only in the top-level dispatch routine.


And I let my PCI Express IF Card to transfer my DMA Buffer.
after that I generated interrupt to notify device driver that transfer has
done.

This cannot work if the address you give it is the user address. You have
not said if you are trying to use the user-mode address for your DMA or you
are using the internal address of a buffer you are allocating, and you do
not say how you are allocating it, so we don’t know if it is a physically
contiguous buffer or just a “buffer”, which can be scattered over many
discontiguous pages, which means you have to deal with scatter/gather DMA.
*

in ISR routine I put to dpc queue to let Dpc generate event which will wakes
system thread up.

after system thread routine got this event, this routine allocates MDL with
IoAllocateMdl

What good is this going to do? You are “years” (in computer time) too late
to do anything with a user-mode address at this point. It has no meaning
whatsoever in a system thread.


and after that I tried to Lock Memory with MmProbeAndLockPages
But after that I got exception with 0xc0000005.

First, if you do MmProbeAndLockPages you need to do it in the context of an
exception frame to capture any exceptions so you don’t get a BSOD. Note
that if you get an exception, it means your locking failed, and you cannot
complete the I/O. Note also that MmProbeAndLockPages is probably going to
fail because you are not using the MDL in the IRP (if direct I/O) or the MDL
you allocated in your top-level dispatch routine (if using “mode neither”,
but one you have “allocated” (and you do not tell use how you got the
parameters for allocating it!) and tried to initialize by using some
nonsense random number (aka “the user-mode address”) in a context in which
this address is pretty much guaranteed to be complete nonsense.


I want to know what is causing this problem.


You are not using Direct I/O would be a good first guess. Allocating your
own MDL in a context in which the user address is a meaningless random
number is a likely contributor. Storing a user-mode address is the most
fatal aspect I can identify. Doing an MmProbeAndLockPages without an
exception frame is the direct cause.

You must allocate the MDL in your top-level entry routine (e.g., the handler
for IRP_MJ_READ or IRP_MJ_DEVICE_CONTROL, depending on how you are reading
the data) so that it has the correct information in it. Then you will use
“mode neither” so the I/O manager doesn’t try to lock all the pages down.
You can then do DMA into pieces of the MDL using a “partial MDL” which you
can create from the original MDL


And I want to know is there any methods that I can transfer streaming data
from PCI Express card to User Buffer?


Handle your MDL in the top-level dispatch routine


I think I have to allocate Mdl each interrupts in this system thread because
I have to transfer over than 2GBytes of memory. Because I can’t allocate Mdl
& Physical DMA Buffer like that much.

It is far more complex than you think. You just said that the user
allocates 256MB of memory, so how did you suddenly get this 2GB number? You
don’t need to transfer, by your specification, more than 256MB of memory.
You seem to feel a need to transfer it in pieces, which suggests the card
has a seriously bad design (no scatter/gather capability), but there are
ways of dealing with that.

How did you jump from 256MB to 2GB?



NTDEV is sponsored by OSR

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at
http://www.osronline.com/page.cfm?name=ListServer



Thank you for your reply Mr. Joseph.

I couldn’t reply to your mail yesterday because I had to take care of my
baby in place of my wife.
I’m really sorry about that.
Thank you for your long reply.

I was able to transfer data successfully from my interface card to the PC
using four 2 MB buffers and a system thread routine.

My interface card is a 4-lane PCI Express Gen 1 endpoint, which can
transfer about 800 MB/s at most (bandwidth: 1 GB/s),
but my target device doesn’t need that much speed because it uses a 1x lane
of Serial RapidIO.
The target device (a frame grabber) requires only 100 MB/s, so the
transfer is easy to sustain.
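
A rough timing budget for these figures can be sketched in C. This is only a back-of-the-envelope model: the function names are made up for illustration, MB is taken as 2^20 bytes, and the 2 GB frame size in the example is an assumption borrowed from the transfer size mentioned earlier in the thread. It shows how long one buffer takes to fill at a given stream rate, and how long the copy thread has before the device wraps around the ring and overwrites a buffer that has not been drained yet.

```c
#include <stdint.h>

#define MIB (1024ULL * 1024ULL)

/* Microseconds needed to fill one buffer at a given stream rate. */
static uint64_t fill_time_us(uint64_t buffer_bytes, uint64_t bytes_per_sec)
{
    return buffer_bytes * 1000000ULL / bytes_per_sec;
}

/* Worst-case time available to drain a buffer before it is reused:
   the other (buffer_count - 1) buffers have to fill first. */
static uint64_t drain_budget_us(unsigned buffer_count, uint64_t buffer_bytes,
                                uint64_t bytes_per_sec)
{
    return (uint64_t)(buffer_count - 1) *
           fill_time_us(buffer_bytes, bytes_per_sec);
}

/* Number of buffer-sized transfers needed for one frame (rounded up). */
static uint64_t transfers_per_frame(uint64_t frame_bytes, uint64_t buffer_bytes)
{
    return (frame_bytes + buffer_bytes - 1) / buffer_bytes;
}
```

With four 2 MB buffers at 100 MB/s, fill_time_us(2 * MIB, 100 * MIB) gives 20000 (20 ms per buffer) and drain_budget_us(4, 2 * MIB, 100 * MIB) gives 60000 (60 ms). At 800 MB/s the same budget shrinks to 7.5 ms. A 2 GB frame would need transfers_per_frame(2048 * MIB, 2 * MIB) = 1024 buffer-sized transfers.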

I created an MDL that describes 256 MB of non-paged memory, together with a
user buffer mapping in this context, using the CreateAndMapMemory function
that can be found in The NT Insider.
I also obtained a system address for this MDL using
MmGetSystemAddressForMdlSafe.

Each time the DpcForIsr routine signals the event on an interrupt, I copy the
data to the system buffer so that the user GUI can read it. ^^
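
The copy step can be pictured with a small user-mode model in C. This is not the driver code: the StreamState type, the drain_next function, and the plain arrays standing in for the common buffers and the mapped system buffer are all hypothetical, and the real driver would do the same copy in the system thread after each event.

```c
#include <stddef.h>
#include <string.h>

#define RING_COUNT 4

typedef struct {
    const unsigned char *ring[RING_COUNT]; /* stand-ins for the common buffers */
    size_t chunk_size;                     /* bytes per common buffer */
    unsigned char *dest;                   /* stand-in for the system buffer */
    size_t dest_size;                      /* total bytes available in dest */
    size_t dest_offset;                    /* next write position in dest */
    unsigned next;                         /* next ring buffer to drain */
} StreamState;

/* Copy the next filled ring buffer into dest and advance to the following
   ring slot. Returns 0 on success, -1 if dest has no room for another chunk. */
static int drain_next(StreamState *s)
{
    if (s->dest_size - s->dest_offset < s->chunk_size) {
        return -1;
    }
    memcpy(s->dest + s->dest_offset, s->ring[s->next], s->chunk_size);
    s->dest_offset += s->chunk_size;
    s->next = (s->next + 1) % RING_COUNT;
    return 0;
}
```

Each call corresponds to one "buffer filled" interrupt: the ring index wraps after the fourth buffer while the destination offset keeps growing until the large buffer is full.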

Thank you very much again.
Have a good day. Bye.

2011/7/16 Joseph M. Newcomer

> Since you are using such tiny buffers (2MB), the ideal solution would
> have been to use direct mode I/O and have your device transfer everything
> directly into user space. But that does require having scatter/gather
> capability (hint: when designing real devices to run in real operating
> systems, having “unlimited scatter gather” as part of your basic design is a
> Really Good Idea). But given you don’t have that, the common buffer
> approach with the horrendous copy is probably the best you are going to be
> able to do. So what you seem to have is a set of N buffers, and you cycle
> through them, if I’ve understood this correctly. Then, it sounds like a
> single image can be 2GB, which means you are going to have to transfer 2GB
> in 2MB chunks, that is, 1024 transfers per frame time. So one of my
> concerns is a standard back-of-the-envelope computation to show if the data
> rate can be sustained, no matter how “efficient” PCIx might be. If we
> assume the buffers are DWORD-aligned, transferring a 2MB buffer requires
> 512K DWORD fetches and 512K DWORD stores (memcpy does a MOVSD for the main
> body), and that’s starting to push the performance envelope. But you have
> to do 1024 of these per frame time, which I think exceeds what most systems
> could hope to accomplish.
>
> With 4 2MB buffers, you can read 8MB before you have to reuse one of the
> common buffers, so you need to make sure that the copy can complete before
> you have to recycle the first buffer of the set. If you are doing this
> copy in a passive thread, you are at the mercy of the thread scheduler. This
> means there is substantial latency between the time you determine the
> thread should run and when it actually does run; if this latency is
> more than three buffer-fills, you’re dead. You can’t assume zero latency
> to activate a thread (I’ve seen people in user-level apps make this
> assumption, and wonder why their projects fail to work as defined). You
> can’t assume that thread runs 100% of the time, because interrupts and
> DPCs will preempt it, so you can’t assume that even after it starts, it
> completes its action in a predictable amount of time.
>
> This is why most devices with high bandwidth requirements have onboard
> memory and scatter/gather DMA, and even then, you’re pushing the envelope
> with a 2GB frame. But with intermediate staging and explicit memcpy, you
> double or triple the number of memory accesses required over straight
> smart-DMA. Or maybe worse. Before you start trying to make this work,
> you should work out the math to see if it can ever be made to work. I once
> participated in evaluating a project whose required bandwidth was something
> like six times the available I/O bandwidth (what bothered ME was the group
> had defined complex communication packets down to the bit level without once
> saying what problem they were trying to solve; a colleague from EE worked
> out the required bandwidth and said “Hey guys, …” and the project
> disappeared without a trace three days later).
>
> I’d like to see some performance numbers, or at least what you can say
> without violating proprietary information, such as bytes/frame, frames/sec,
> etc.
>
> joe
>
> ------------------------------
>
> From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of ???
> Sent: Tuesday, July 12, 2011 11:16 AM
> To: Windows System Software Devs Interest List
> Subject: Re: [ntdev] 0xc0000005 (Access Violation) in MmProbeAndLockPages
>
> Thank you for your reply, Mr. Joseph.
>
> Your reply gave me some hints about how I can get this working.
> (Should I try to describe the user buffer for scatter/gather DMA using a
> partial MDL?)
>
> First of all, let me explain my situation.
>
> I designed this PCI Express interface card with an FPGA 3 years ago.
> The board doesn’t have any memory (which is why I’m facing this problem).
>
> I also wrote a WDM driver and transferred data in common-buffer DMA mode
> (because the target system had a huge amount of memory).
>
> In that system I initialized the DMA buffers in the StartDevice PnP routine
> (using AllocateCommonBuffer).
> (As far as I remember, that was 2 buffers of 1 MB (transfer data buffers)
> and 1 buffer of 4 KB (command I/F).)
>
> I could also obtain the physical memory address and the virtual address, so
> that I could copy between the FPGA and the driver.
>
> At that time I copied from the common buffer to user memory in my dispatch
> routine
> (IoAllocateMdl -> MmProbeAndLockPages -> MmGetSystemAddressForMdlSafe ->
> memcpy).
>
> I thought I could use the same method in a system thread as well, because I
> had used it in a dispatch routine before.
>
> Now, let’s talk about my current system.
>
> I initialized 4 transfer data buffers of 2 MB and 1 buffer of 4 KB with
> common-buffer DMA.
>
> Because the new system doesn’t have any memory, the target system just
> pushes all the line scan camera images to my interface card as streaming
> data.
>
> I have to push all the data onward as well, because my board doesn’t have
> any memory.
>
> That’s why I decided to use a PCI Express interrupt to notify the driver
> that a buffer has been filled, so that each common buffer can be
> transferred to the user buffer.
> (I think this is the fastest way for me to implement it without modifying
> my current FPGA design much.)
>
> And because this design is based on a line scan camera grabber, the transfer
> length is determined by the size of the LCD glass
> (that’s why the transfer size can grow beyond 2 GB).
>
> It’s midnight here, so I’ll try tomorrow morning to get the scatter/gather
> list.
>
> Once I have a system address for the user buffer, I think I can copy the
> camera data (in the common buffer) to that system address in the system
> thread.
>
> Is there anything wrong with this?
> Any comments would be helpful.
>
> Thank you for reading.
>

장경섭 wrote:

> I could successfully transfer data from My interface Card to PC using
> 2MB of 4 buffers & system thread routines
> My Interface Card has 4 lane of PCI Express Endpoint Gen 1 which can
> transfer about 800MB/s maximum (Bandwidth : 1GB/s)

A 4-lane card can sustain about 600 MB/s. Bus overhead consumes about
40% of PCI Express bandwidth.
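
The arithmetic behind those numbers can be sketched as follows. The function names are hypothetical, and the 40% overhead is the estimate given above rather than a fixed property of the bus. The per-lane figure comes from the Gen 1 signalling rate of 2.5 GT/s with 8b/10b encoding.

```c
#include <stdint.h>

/* PCI Express Gen 1 signals at 2.5 GT/s per lane and uses 8b/10b encoding,
   so only 8 of every 10 transferred bits carry data. Result is in MB/s
   per direction. */
static uint32_t pcie_gen1_raw_mb_per_s(uint32_t lanes)
{
    const uint32_t megatransfers_per_s = 2500;                      /* per lane */
    const uint32_t data_mbit_per_s = megatransfers_per_s * 8 / 10;  /* 2000 */
    return lanes * data_mbit_per_s / 8;                             /* bits -> bytes */
}

/* Apply an estimated protocol overhead, given as a percentage. */
static uint32_t usable_mb_per_s(uint32_t raw_mb_per_s, uint32_t overhead_percent)
{
    return raw_mb_per_s * (100 - overhead_percent) / 100;
}
```

For a 4-lane Gen 1 link, pcie_gen1_raw_mb_per_s(4) gives 1000 MB/s, which matches the 1 GB/s bandwidth figure quoted above, and usable_mb_per_s(1000, 40) gives the 600 MB/s estimate.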


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.