Quick Q on GlobalAlloc

If I use GlobalAlloc(GPTR, N) to allocate memory in my app, I can write to that memory in
my driver with no problem, right?

TIA
/R

As long as you follow the rules for the transfer type (i.e. METHOD_BUFFERED,
METHOD_IN_DIRECT, etc.), it does not matter how you got the memory: if the app
can access it, the driver can.


Don Burn (MVP, Windows DKD)
Windows Filesystem and Driver Consulting
Website: http://www.windrvr.com
Blog: http://msmvps.com/blogs/WinDrvr


Adding to Don’s “it does not matter where it came from” comment: the
“Global” in GlobalAlloc() is an artifact of the Win16 API from eons ago and
frankly means nothing. There is nothing ‘global’ about it.

Good Luck,
Dave Cattley


NTDEV is sponsored by OSR

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at
http://www.osronline.com/page.cfm?name=ListServer

Don Burn wrote:

As long as you follow the rules for the transfer type (i.e. METHOD_BUFFERED,
METHOD_IN_DIRECT, etc.), it does not matter how you got the memory: if the app
can access it, the driver can.

Ok, I’m going to use the case where the driver allocates the memory, according to the article
http://www.osronline.com/article.cfm?article=39.

However, since I’m using a WaveCyclic miniport driver, how do I override the portcls
implementation of IRP_MJ_DEVICE_CONTROL? It seems as though the default implementation
only allows for IOCTL_KS_PROPERTY (via automation tables), which is METHOD_NEITHER, and
I need METHOD_OUT_DIRECT…

See, there’s a newbie for ya…

TIA
/R

Why do you think you need to share the memory? This is something that is
easy to get wrong, and can have a lot of synchronization issues. Whenever
a newbie is using this, you have to wonder why?


Don Burn (MVP, Windows DKD)
Windows Filesystem and Driver Consulting
Website: http://www.windrvr.com
Blog: http://msmvps.com/blogs/WinDrvr


Don Burn wrote:

Why do you think you need to share the memory? This is something that is
easy to get wrong, and can have a lot of synchronization issues. Whenever
a newbie is using this, you have to wonder why?

Hmm… I understand your suspicion :)

Ok, here goes. It’s an experiment with a simple lock-free ring buffer for audio, shared with the application so the real-time thread does not need to go via IOCTLs (think WaveRT). I’ve done this kind of sharing between threads before, so I’m quite accustomed to synchronization (even multicore), but now I need to share between driver and app. Since it’s lock-free (single writer, single reader) I don’t anticipate any problems there. I’m not sure I’m really going to gain anything with this, but I feel I need to try it out to know, one way or another :)

As far as I can tell, the OSR article seems pretty straightforward. And I guess I can remap IRP_MJ_DEVICE_CONTROL after calling PcInitializeAdapterDriver (and set the DO_DIRECT_IO bit in DriverObject->Flags), then pass on all IOCTL requests I’m not interested in to the original handler.

But I’m not sure where to store the old handler pointer?

TIA
/Rob

Robert Bielik wrote:

But I’m not sure where to store the old handler pointer?

Aha, ok. Got it. PcDispatchIrp.

/R

DO_DIRECT_IO changes the buffer encoding for read and write IRPs. IOCTLs have the buffer encoding built into the IOCTL value itself.

d

Sent from a phone with no keyboard


Doron Holan wrote:

DO_DIRECT_IO changes the buffer encoding for read and write IRPs. IOCTLs have the buffer encoding built into the IOCTL value itself.

Thanks. But I should still be able to pass a pointer to the allocated memory from a METHOD_NEITHER IOCTL handler (e.g. IOCTL_KS_PROPERTY). That is essentially what KSPROPERTY_RTAUDIO_BUFFER_WITH_NOTIFICATION does: if I pass a KSRTAUDIO_BUFFER pointer in the output parameter of DeviceIoControl, I get it filled in with the info. I’ll go down that path to begin with and not mess with my own IOCTL handler.

/Rob

You can use either METHOD_NEITHER or METHOD_IN_DIRECT/METHOD_OUT_DIRECT to do
that. With the latter, simply pass the buffer pointer in the IN/OUT part of the
IOCTL parameters. If that does not suffice, you can shove a buffer pointer into
the input buffer parameters. However, before you leave PASSIVE_LEVEL it
behooves you to probe and lock ANY buffer you pass that is out of the
ordinary.

Gary G. Little
H (952) 223-1349
C (952) 454-4629
xxxxx@comcast.net


Gary G. Little wrote:

You can use either METHOD_NEITHER or METHOD_IN_DIRECT/METHOD_OUT_DIRECT to do
that. With the latter, simply pass the buffer pointer in the IN/OUT part of the
IOCTL parameters. If that does not suffice, you can shove a buffer pointer into
the input buffer parameters. However, before you leave PASSIVE_LEVEL it
behooves you to probe and lock ANY buffer you pass that is out of the
ordinary.

Behooves. Never heard that before, I like it :) Ok then, I know that the PCPROPERTY_REQUEST Value and ValueSize
fields map to the DeviceIoControl input parameters, but where in the property handler do I
get hold of the output parameters of DeviceIoControl? See… newbie again… :(

TIA
/Rob

Robert Bielik wrote:

Behooves. Never heard that before, I like it :) Ok then, I know that the
PCPROPERTY_REQUEST Value and ValueSize fields map to the DeviceIoControl
input parameters, but where in the property handler do I get hold of the
output parameters of DeviceIoControl? See… newbie again… :(

Ok, now I’ve managed to create the shared memory (according to http://www.osronline.com/article.cfm?id=39), and the user-mode app can read from it. But when trying to use the memory in the driver (outside the process context) I bluescreen each time.

I need to be able to read/write the memory in any process context and at IRQL == DISPATCH_LEVEL; the article only uses the memory within the context of the process that created it…

Getting a bit frustrated… :(

Would appreciate help immensely…

TIA
/Rob

Also, with the minuscule data rates of audio, it doesn’t make sense to use fancy locking schemes instead of tried-and-true mutexes, especially if your lock requires cooperation between UM and KM. Remember that UM can’t be trusted to produce and keep consistent state. Kernel data state should be completely isolated from user mode.

If you want to implement a shared ring buffer between a UM app and a KM driver, why not just feed the buffers linearly using IOCTLs? You’ll save yourself a lot of grief. You’re not gaining anything at such low throughput rates.

If buffers do not reach the hardware in time, this has audible consequences
such as clicks and pops. Even though throughput is low, the extra latency
involved may have unacceptable consequences. This is unlike most other
devices in the system, where only performance matters; even for video it’s
acceptable if frames are not updated in a timely manner. Not knowing much about
the audio stack, this is not to say buffered IOCTLs are necessarily a bad
idea, but I do know it’s common for audio drivers to make use of shared memory.

//Daniel


I solved a problem like this years ago just by opening the device for
overlapped I/O and doing a bunch of I/O operations. The nature of the
existing driver was that it put a KeQueryPerformanceCounter timestamp in the
first 64 bits of the buffer. The data then followed. Here was the problem:
with a while(true) ReadFile loop, the timestamps had a “jitter” measured in
hundreds of microseconds. This was completely unacceptable. The choices
were (a) rewrite the driver (b) rewrite the app. With a few hours’ work, I
rewrote the app to do:

    for (int i = 0; i < limit; i++)
        ReadFile(…);

where limit was set by a user control. When limit was 40, we had bad
jitter, in the unacceptable range. When the limit was 50, we had a fairly
constant 80 µs error. That is, two consecutive timestamps T0 and T1 should
have satisfied T1 = T0 + timeof(data) for the data returned in the buffer (the
data rate was known), but we had T1 = T0 + timeof(data) + 80 µs, which the
customer deemed quite acceptable.

Moral: don’t try to fix performance issues by baroque and questionable
driver architectures when you can trivially solve them in the application.

Thinking that ReadFile/WriteFile/DeviceIoControl are the interface to a
driver is the first step towards madness. If necessary, provide a library,
such as a DLL, that does all the interface; then it doesn’t matter WHAT you
make the driver calls look like (weird-looking DeviceIoControl calls? Never
mind, the user never sees it, so what does it matter?) Compare the serial
port interface of Windows with the 40+ DeviceIoControl operations specified
for serial ports, as an example.
joe


"If buffers do not reach the hardware in time, this has audible consequences
such as clicks and pops."

And what if the application was not scheduled in time? Or its stack was paged out? One has to keep a certain amount of data ahead anyway, for a certain probability that you won’t underflow the pipe. The shared ring buffer discussed here doesn’t solve that problem at all, but only adds unnecessary complexity. It’s the same sort of fallacy as moving calculations to the kernel, hoping for better throughput.

Using an IRP adds a minute amount of latency. A thread on a modern processor can easily send 100,000 IRPs per second. Given that the thread may get hit with paging at any time, which may add at least 10 ms, it doesn’t make sense to send data more often than once every 5 ms.

Hitting a hard page fault is definitely not done when dealing with audio at
any level. You cannot offer customers the latency of a pipe organ, where a
mechanism pushes the air through the pipes, plus the time required for the
air to travel to your ear from the corner of the church. Customers will
want an acceptable latency for live playing and DJing. Good
sound cards running on selected hardware can support buffer sizes as low as
32 frames without dropping out. With a sample rate of, say, 96 kHz, that means
3000 buffers which need to be delivered in time per second, and an effective
latency of 333 µs. With DPCs, ISRs, paging and everything else going on in a
non-RTOS, that shows where the challenge lies.

//Daniel


xxxxx@broadcom.com wrote:

And what if the application was not scheduled in time? Or its stack was paged out? One has to keep a certain amount of data ahead anyway, for a certain probability that you won’t underflow the pipe. The shared ring buffer discussed here doesn’t solve that problem at all, but only adds unnecessary complexity. It’s the same sort of fallacy as moving calculations to the kernel, hoping for better throughput.
Using an IRP adds a minute amount of latency. A thread on a modern processor can easily send 100,000 IRPs per second. Given that the thread may get hit with paging at any time, which may add at least 10 ms, it doesn’t make sense to send data more often than once every 5 ms.

There are MANY people in professional audio who disagree violently with
your conclusion. Consider, for example, a MIDI keyboard plugged into a
PC which is then plugged into a speaker. If you have 10ms latency in
the MIDI input path and 10ms latency in the output path, the keyboard
will be almost completely unusable. 20ms of latency between keypress
and audible response would literally drive sophisticated musicians crazy.

That’s a big reason why the default latency was changed from 10ms to 4ms
in Vista, and even that is too long for the ultra-loony audiophiles.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Tim,

My point was: one should not be concerned about microseconds when writing this low-throughput stuff.

You can have little latency but a high probability that external events (paging) will disrupt your pipe, or higher latency and high robustness.
Windows multimedia seems to prefer a short pipe even in media playback scenarios. Then you get cases where you cannot reliably play a 1.5 Mbps file over an 802.11 connection, or where playback of a 25 Mbps DV AVI is disrupted by file copy operations.

MIDI is a partially wrong example. A MIDI keyboard doesn’t give you a stream that you have to buffer. It gives you events, which you then feed immediately into a software synthesizer. One MIDI message takes a bit under 1 ms over the wire. A chord with all fingers will take 6-10 ms or so. Apparently, this is acceptable.
A synthesizer, on one hand, has to have as little data in the pipe as possible, to reduce that latency, but on the other hand, it needs to be sure that any system latencies won’t cause sound disruption. Pick your poison. Or make everything non-pageable and run at high priority.

I worked with MIDI on 8088 (!) machines, and meeting realtime requirements
was somewhat challenging on those machines. But overall, it wasn’t hard;
now that machines are more than a thousand times faster, I’m not sure it
really poses serious problems.

One problem in MIDI is the “MIDI bandwidth” problem which is classically
that you cannot play a 50-instrument piece because the MIDI bandwidth is the
limiting factor; if you want a bunch of instruments to sound all at once,
you can’t get enough bits out fast enough to trigger them all (MIDI is about
32K bits/sec). However, there’s another problem: the ADSR envelope of an
instrument (attack-decay-sustain-release). For a slow-attack instrument,
like a tuba, a good tuba player starts blowing the note shortly before it is
required, so that it hits its peak sound at the time when it is required.
MIDI software typically ignores the ADSR envelope, and fires the note at a
time T, so the instrument sounds “late”. These two features interact to
produce “mush” when you expect to hear a set of instruments sounding all at
the same time. Latency in the code rarely was a problem (our biggest
problem was getting the score to follow the play instead of falling behind
due to drawing time, and a few architecture changes fixed this).

I’ve played pipe organs where the delay was almost one beat, which was
nearly impossible to handle for someone who doesn’t read music and
improvises based on what is currently sounding. I became completely
“dyslexic” on that instrument. 10ms or 20ms are not serious delays as long
as they are consistent; a piano has longer delays.
joe
