DMA latency

Complete newbie here. I’m working with a modified reference design for a PCI-e Xilinx eval board which uses DMA to transfer data from the board’s FIFO to main memory. I can monitor the size of the FIFO and I see that it is empty most of the time i.e. the DMA operations are successfully clearing out the FIFO in good time.

Unfortunately, every so often the DMA loop is delayed and the FIFO overflows. I need to find out what’s causing this interference. I’ve disabled as much hardware and software as I can, but no luck. Four questions:

  1. I installed LatencyMon.exe to see what might be causing the interference, and I can see that on an idle system ataport.sys is handling a lot of interrupts and DPCs, BUT there are no hard page faults and there are no applications running. What could be causing this activity?

  2. I’ve noticed that I get much less interference when I minimise the command window displaying application debug output, even though there is no debug output during the DMA loop itself. Am I correct in assuming that Windows would not write to the graphics chip (X4500) when there is no change to the desktop screen?

  3. Does Windows use DMA to communicate screen updates to the graphics chip (which might interfere with my application’s DMA)?

  4. Does the graphics chip use DMA to access its frame buffer (which is in system memory) during a screen refresh?

Thanks.

You need to be more specific about your design.
Do you use an interrupt? At which IRQL does your DMA loop run?

Igor Sharovar

How large is your FIFO? That will tell us how much delay you can
tolerate. Did you write the driver? Do you have to update the DMA
addresses at interrupt time, or does the thing run continuously once
launched?

> 1. I installed LatencyMon.exe to see what might be causing the interference and I can see that on an idle system, ataport.sys is handling a lot of interrupts and DPCs BUT there are no hard page faults and there are no applications running. What could be causing this activity?

There’s always stuff going on in the background. Windows might be
downloading updates. iTunes, Acrobat, and Java all download updates in
the background. Windows Search does its indexing in the background.

> 2. I’ve noticed that I get much less interference when I minimise the command window displaying application debug output, even though there is no debug output during the DMA loop itself. Am I correct in assuming that Windows would not write to the graphics chip (x4500) when there is no change to the desktop screen?

The X4500 is a UMA device – the frame buffer is part of main memory.
That means not only do graphic drawing operations use up memory and bus
bandwidth, but screen refresh uses continuous memory bandwidth. The
image you see on your monitor is not a snapshot that gets updated only
when something changes. The graphics processor has to send a continuous
stream of pixels to the monitor. That means it has to read the entire
contents of your frame buffer 70 times a second. For a 1920x1080 true
color desktop, that’s 600 MB/sec.
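
The refresh figure above can be checked with a little arithmetic. The sketch below is only an illustration: it assumes 4 bytes per pixel for a true-color desktop (the pixel size is not stated above), and the function name is made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes per second the display engine must read from the frame buffer
 * just to keep the screen refreshed.
 * bytes_per_pixel = 4 is an assumption (32-bit true color). */
uint64_t refresh_bandwidth(uint32_t width, uint32_t height,
                           uint32_t bytes_per_pixel, uint32_t refresh_hz)
{
    assert(bytes_per_pixel > 0 && refresh_hz > 0);
    return (uint64_t)width * height * bytes_per_pixel * refresh_hz;
}
```

Under those assumptions, refresh_bandwidth(1920, 1080, 4, 70) comes to 580,608,000 bytes per second, which is in line with the rough 600 MB/sec quoted.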

> 3. Does Windows use DMAs to communicate screen updates to the graphics chip (that might interfere with my application DMAs)?

Remember that you don’t really have a “graphics chip”. The graphics
processor core is built into the north bridge chipset. It doesn’t live
on a PCI bus – it is accessed directly through the front-side bus, and
it has direct access to the memory bus. You will certainly compete with
the graphics chip for access to memory.

> 4. Does the graphics chip use DMAs to access its frame buffer (which is using system memory) during a screen refresh?

Well, it’s not DMA in the sense that it doesn’t travel over PCI Express,
but it certainly uses memory bandwidth and ties up the front-side bus.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Well, you didn’t give a lot of detail, but generally when you are doing
DMA you are using one of the “direct” modes that gives you an MDL for the
user memory. This means you have to have an active IRP to do a transfer.
Now consider the loop

    while (true)
    {
        ReadFile(…);
        …do something with the data…
    }

(loop termination and error checking left as an Exercise For The Reader)

Note the part that says “do something with the data”. If that takes any
amount of time, there is a good chance the loop will not complete before
the thread is preempted by the scheduler, and this is the delay you are
seeing.

There are a couple of solutions to this. One is to put the loop in a
separate thread, and use inter-thread queueing to asynchronously post a
message to the thread that does the processing. Another is to use
asynchronous I/O and issue a whole whopping lot of read requests; the swap
time in the kernel to get the next IRP is very low. One downside to this
is that you typically want to use an I/O Completion Port (IOCP) to handle
the completion notifications, and it has already been pointed out in this
forum that the completion order seen through the IOCP is not guaranteed to
be the issuing order.
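
One way to live with out-of-order completions is to number each read when it is issued and hand the buffers to the processing code strictly in that order. The sketch below is plain C bookkeeping only, with no Windows calls. All names, and the limit of 64 reads in flight, are assumptions made for illustration. How the sequence number travels with each request (for example, stored next to its OVERLAPPED structure) is left open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_OUTSTANDING 64   /* assumed number of reads kept in flight */

typedef struct {
    bool     done[MAX_OUTSTANDING];    /* slot holds a completed, undelivered read */
    void    *buffer[MAX_OUTSTANDING];  /* data buffer belonging to that read */
    uint64_t next_to_deliver;          /* sequence number the consumer needs next */
} reorder_t;

void reorder_init(reorder_t *r)
{
    for (size_t i = 0; i < MAX_OUTSTANDING; i++) {
        r->done[i] = false;
        r->buffer[i] = NULL;
    }
    r->next_to_deliver = 0;
}

/* Record a completion, which may arrive out of order.
 * seq is the number assigned when the read was issued (0, 1, 2, ...). */
void reorder_complete(reorder_t *r, uint64_t seq, void *buf)
{
    /* Caller must keep at most MAX_OUTSTANDING reads in flight. */
    assert(seq >= r->next_to_deliver &&
           seq - r->next_to_deliver < MAX_OUTSTANDING);
    size_t slot = (size_t)(seq % MAX_OUTSTANDING);
    r->done[slot] = true;
    r->buffer[slot] = buf;
}

/* Return the next buffer in issue order, or NULL if that read
 * has not completed yet. */
void *reorder_next(reorder_t *r)
{
    size_t slot = (size_t)(r->next_to_deliver % MAX_OUTSTANDING);
    if (!r->done[slot])
        return NULL;
    r->done[slot] = false;
    r->next_to_deliver++;
    return r->buffer[slot];
}
```

The thread that dequeues completions would call reorder_complete for each one and then loop on reorder_next until it returns NULL, so the processing code never sees data out of sequence.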

Another delay can happen when the I/O Manager does the
MmProbeAndLockPages; if it takes page faults, then you get an
indeterminate latency while the pages come in. You can help this by using
VirtualLock to lock pages down, but I believe this API requires the
sysadmin to grant rights to do this, so it is not a palatable solution
unless you are in a vertical market context where you can do this during
the system installation.

What priority boost do you assign when you complete the IRP? Note that
an ordinary process can’t have a thread priority >15, which means you are
still subject to preemption by higher-priority threads (typically kernel
threads, and most often file system threads). If you are running Vista or
higher, look into the multimedia scheduling service APIs, which let an
ordinary process with time-critical constraints run up to priority 26 or
27 (I forget which, and I currently have no access to the docs because I
am not in my office, with a wealth of docs at my fingertips).

The problem *might* be unsolvable, but as you can see from the above,
there’s a ton of options you haven’t explored.

BTW, an onboard FIFO should be sized to hold at least one second of data.
Windows is not an ideal platform for devices requiring low latency.
joe
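
The FIFO sizing remark can be turned into two small calculations: how long a given FIFO can go unserviced, and how large a FIFO must be to ride out a given stall. This is only a sketch. The function names are invented for this example, and the numbers in the note after the code are hypothetical, since the thread never gives the real FIFO size or data rate.

```c
#include <assert.h>
#include <stdint.h>

/* How long, in microseconds, the FIFO can absorb input with no DMA service. */
uint64_t fifo_hold_time_us(uint64_t fifo_bytes, uint64_t input_bytes_per_sec)
{
    assert(input_bytes_per_sec > 0);
    return fifo_bytes * 1000000ULL / input_bytes_per_sec;
}

/* FIFO size in bytes needed to ride out a stall of stall_us microseconds
 * at the given input rate (rounded up). */
uint64_t fifo_bytes_needed(uint64_t input_bytes_per_sec, uint64_t stall_us)
{
    assert(input_bytes_per_sec > 0);
    return (input_bytes_per_sec * stall_us + 999999ULL) / 1000000ULL;
}
```

For a hypothetical 64 KB FIFO fed at 100 MB/sec, fifo_hold_time_us(65536, 100000000) is 655 microseconds of tolerance, while holding one second of data at that rate would take fifo_bytes_needed(100000000, 1000000), that is 100,000,000 bytes.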

NTDEV is sponsored by OSR

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at
http://www.osronline.com/page.cfm?name=ListServer

Russel, what platform are you using (Core i-something, Atom…)? With all
Intel processors one thing is very important: make sure your transfers are a
multiple of 64 bytes and are aligned to 64-byte boundaries. I would recommend
setting the driver alignment requirement to 64 bytes, or some multiple thereof
(64 bytes is the cache line size). Observing this rule works wonders. On an Atom
application it gave me a performance increase of over 30%. It’s also in the IA
manuals somewhere; unfortunately I only found the note myself after I had
experimented for ages ;)
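
A driver or test harness can check the 64-byte rule before starting a transfer. The helpers below are a minimal sketch with invented names. They only show the arithmetic and say nothing about how a particular driver framework declares its alignment requirement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64u   /* cache line size given in the post */

/* True if both the start address and the length are multiples of 64 bytes. */
bool dma_transfer_is_aligned(uint64_t bus_address, size_t length)
{
    return (bus_address % CACHE_LINE == 0) && (length % CACHE_LINE == 0);
}

/* Round a length up to the next multiple of 64 bytes. */
size_t round_up_to_cache_line(size_t length)
{
    assert(length <= SIZE_MAX - (CACHE_LINE - 1));
    return (length + (CACHE_LINE - 1)) & ~(size_t)(CACHE_LINE - 1);
}
```

For example, a 100-byte request would be padded to round_up_to_cache_line(100), which is 128 bytes, and a buffer starting at address 0x1008 would fail the dma_transfer_is_aligned check.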

Also, when writing to memory, try to always transfer blocks of the maximum
payload size (Max_Payload_Size). This depends on your chipset but is typically
128 bytes or 256 bytes. This gives you the best usage of available credits.
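
To see why full-size payloads matter, it helps to count the write packets a transfer breaks into. This sketch is a simplification with made-up function names: it considers only the payload limit and ignores other splitting rules, such as requests not being allowed to cross a 4 KB address boundary.

```c
#include <assert.h>
#include <stdint.h>

/* Number of memory-write packets needed for one transfer when every
 * packet carries at most max_payload bytes. */
uint32_t packets_for_transfer(uint32_t length, uint32_t max_payload)
{
    assert(max_payload > 0);
    return (uint32_t)(((uint64_t)length + max_payload - 1) / max_payload);
}

/* Bytes carried by the final packet. It equals max_payload when the
 * length is an exact multiple, so no packet goes out partly filled. */
uint32_t last_packet_bytes(uint32_t length, uint32_t max_payload)
{
    assert(max_payload > 0 && length > 0);
    uint32_t rem = length % max_payload;
    return rem == 0 ? max_payload : rem;
}
```

A 4096-byte transfer with a 128-byte payload limit needs 32 full packets. A 4100-byte transfer needs 33, and the last one carries only 4 bytes while still paying the full per-packet header cost.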

If you are using an Intel Atom, you may have to disable the deep-sleep C6 states. This
can usually be done in the BIOS. I had a major issue with the credit update
(UpdateFC) frequency here, which caused a problem very like the one you described.

Slan,
Charles

Note that minimizing windows tends to tell the scheduler that the process
is not terribly important, and its pages are candidates for page-out.
Thus, its working set requirement is reduced, leaving more pages available
in physical memory. This could reduce the paging behavior of your app;
see my previous posts.

Until you have eliminated all possible causes costing more than two orders
of magnitude, there is no point in trying to eke out the last drop of
performance possible. Paging is a six-to-seven orders of magnitude
performance hit per page fault. For a large buffer, you might take multiple
page faults during MmProbeAndLockPages (hence my reference to eight orders
of magnitude).

It would be useful to have performance data on this device such as:

Input data rate
FIFO buffer size (ideally expressed in units of time of the input data)
Interrupt rate
I/O buffer size on your read request

Then there are the architecture questions:

Do you do internal buffering in your driver? Or do you rely on the MDL in
    the IRP?
How much do you do in the ISR vs. the DPC?
What priority boost do you give on IRP completion?
How many instructions does your app execute between I/O read calls?
    Note that if there is a kernel call (other than the read request or
    an asynchronous inter-thread queue request) in the loop, you probably
    need to rewrite the loop.
    Note that if the kernel call is graphics-related (including
    SendMessage to controls), you need to rewrite the loop.
    Putting a simple loop in a separate thread can often help.

I have been solving these problems for about 15 years now. The typical
causes are
Underdesigned hardware
Underdesigned driver
Underdesigned app

That’s it. You can often compensate for an underdesigned driver/hardware
combination by expending more effort on the app. But hardware should be
robust under operating system delays (large FIFO) and the driver needs to
be robust under operating system delays. Using some of the application
tricks I mentioned can help make the whole thing less sensitive to the
hardware/driver issue, and since I’m primarily an application-level
programmer this is where I put all my effort, and have several notable
successes, and no failures yet.

And yes, to get one of those successes I had to play with scheduler
priorities and thread affinity. You can use these *very carefully* with
success, but you don’t start out saying “Well, if I just tweak this thread
priority and change this affinity, all will be well”.
joe

Reducing the working set on minimize is no longer done on Windows 7, and possibly not on Vista either.

d

Sent from my phone


IMHO this used to be a great trick to page out leaked memory in buggy apps and prevent various crashes caused by other buggy code that didn’t check for allocation failure! Too bad the ‘feature’ has been removed (please read this as sarcasm)

This feature may be related to killing the market for the software Mark
Russinovich called “Fraudware”, where these behaviors were used by
programs that claimed to “improve” performance but charged you $39.95 for
something that did just the opposite: it forced everything to page out,
making Task Manager display numbers that looked good, but only to the
naive.

Also, it is not clear, in retrospect, why minimizing the command window in
the debugger (on a host machine) would have much impact on the behavior
of the target; I hadn’t thought that one through. But there was so little
information that it was hard to guess what might be the problem. See my
previous posts.

In my experience, it has always been hardware/driver underdesign that was
the root cause of data overruns, but clever application-level programming
has, in all cases I’ve had, been able to compensate for this. Key here is
to eliminate all multiple-orders-of-magnitude problems before worrying
about factors of two, or improvements of 10%.

There was no good performance data presented. For example, I suspect
LatencyMon is some kind of filter driver that timestamps the IRP going
down, and which computes the latency in its completion routine. If so,
delays caused by the I/O Manager causing page faults would not be
measured, but could have a profound impact on the total trip time of an IRP
from application space. A more useful measure might be the delta-T
between successive IRPs to the device. Without this critical piece of
information, it strikes me as profoundly silly to worry about NUMA
adjacency. Note that this number would account for application time,
scheduler overheads, thread preemption by kernel threads, ISRs and DPCs,
and page fault overheads. I suspect that such an analysis will show some
huge delta-T right before an overrun occurs. Of course, this is useful only
if the app does synchronous I/O.

If NUMA latency were on trial, I don’t think you could even get a grand
jury to indict on the evidence presented; should such a miscarriage of
justice occur, the prosecution would be shredded at trial.

To understand a result from a measuring tool, you need to know what it is
measuring, and how accurately it reflects what is going on. For example,
what is the clock skew of different cores when KeQueryPerformanceCounter
is executed? I am not sure there is even a way to answer this question.
But it taints any high-resolution numbers obtained from multicore systems.

I can even visualize the code. Top-level driver creates an (IRP, timestamp)
pair and attaches it via IoSetCompletionRoutine. Completion
routine gets the (IRP, timestamp) pair. I’d also record the
IoStatus.Status value. I’d keep a large ring buffer which the monitoring
program would read-and-clear from time to time, and it would do the data
reduction. Lots of SMOP (Small Matter Of Programming) left as an Exercise
For The Reader. Note that the raw timestamps for both entry and
completion time are kept. Now the data reduction can compute both latency
and inter-IRP times, give you graphs, statistical reliability of the data,
etc. Sometimes I just write out CSV files and let Excel do all the work.
joe
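
The ring buffer and data reduction described above might look roughly like the following. This is a user-mode sketch in plain C, not driver code. The type and function names and the 1024-entry capacity are assumptions, timestamps are raw counts in whatever clock units the recorder uses, and there is no locking, which a real driver shared between a completion routine and a reader would need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 1024   /* assumed capacity */

typedef struct {
    uint64_t t_entry;     /* raw timestamp when the request was seen going down */
    uint64_t t_complete;  /* raw timestamp taken in the completion routine */
    int32_t  status;      /* completion status (IoStatus.Status in the post) */
} sample_t;

typedef struct {
    sample_t samples[RING_SIZE];
    size_t   head;        /* total samples written */
    size_t   tail;        /* total samples read */
} ring_t;

void ring_init(ring_t *r) { r->head = r->tail = 0; }

/* Store one sample; the oldest unread sample is overwritten when full. */
void ring_put(ring_t *r, sample_t s)
{
    r->samples[r->head % RING_SIZE] = s;
    r->head++;
    if (r->head - r->tail > RING_SIZE)
        r->tail = r->head - RING_SIZE;
}

/* Read-and-clear: copy out up to max samples in arrival order. */
size_t ring_drain(ring_t *r, sample_t *out, size_t max)
{
    assert(max == 0 || out != NULL);
    size_t n = 0;
    while (r->tail != r->head && n < max)
        out[n++] = r->samples[r->tail++ % RING_SIZE];
    return n;
}

typedef struct {
    uint64_t max_latency;  /* largest t_complete - t_entry */
    uint64_t max_delta_t;  /* largest gap between successive t_entry values */
} summary_t;

/* Data reduction over drained samples: worst latency and worst inter-request gap. */
summary_t reduce(const sample_t *s, size_t n)
{
    summary_t out = {0, 0};
    for (size_t i = 0; i < n; i++) {
        uint64_t latency = s[i].t_complete - s[i].t_entry;
        if (latency > out.max_latency)
            out.max_latency = latency;
        if (i > 0) {
            uint64_t delta = s[i].t_entry - s[i - 1].t_entry;
            if (delta > out.max_delta_t)
                out.max_delta_t = delta;
        }
    }
    return out;
}
```

Because both raw timestamps are kept, the same drained samples can also be dumped to a CSV file for further analysis. A large max_delta_t just before an overrun would point at the gap between requests rather than at the latency of any single one.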

> IMHO this used to be a great trick to page out leaked memory in buggy apps
> and prevent various crashes caused by other buggy code that didn’t check
> for allocation failure! Too bad the ‘feature’ has been removed
> (please read this as sarcasm)
>
>
> “Doron Holan” wrote in message
> news:xxxxx@ntdev…
> Reducing working set on minimize is no longer done on w7, maybe on Vista
> as well
>
> d
>
> sent from my phone
>
>
> --------------------------------------------------------------------------------
> From: xxxxx@flounder.com
> Sent: 10/20/2011 8:18 AM
> To: Windows System Software Devs Interest List
> Subject: Re:[ntdev] DMA latency
>
>
> Note that minimizing windows tends to tell the scheduler that the process
> is not terribly important, and its pages are candidates for page-out.
> Thus, its working set requirement is reduced, leaving more pages available
> in physical memory. This could reduce the paging behavior of your app,
> see my previous posts.
>
> Until you have eliminated all possible causes > 2 orders of magnitude,
> there is no point in trying to eke out the last drop of performance
> possible. Paging is six to seven orders of magnitude performance hit, per
> page fault. For a large buffer, you might take multiple page faults
> during MmProbeAndLockPages (hence my reference to eight orders of
> magnitude).
>
> It would be useful to have performance data on this device such as:
>
> Input data rate
> FIFO buffer size (ideally expressed in units of time of the input data)
> Interrupt rate
> I/O buffer size on your read request
>
> Then there are the architecture questions:
>
> Do you do internal buffering in your driver? Or do you rely on the MDL in
> the IRP?
> How much do you do in the ISR vs. the DPC?
> What priority boost do you give on IRP completion?
> How many instructions does your app execute between I/O read calls?
> Note that if there is a kernel call (other than the read request or
> an asynchronous inter-thread queue request)
> in the loop, you probably need to rewrite the loop.
> Note that if the kernel call is graphics-related (including
> SendMessage to controls), you need to rewrite the loop.
> Putting a simple loop in a separate thread can often help.
>
> I have been solving these problems for about 15 years now. The typical
> causes are
> Underdesigned hardware
> Underdesigned driver
> Underdesigned app
>
> That’s it. You can often compensate for an underdesigned driver/hardware
> combination by expending more effort on the app. But hardware should be
> robust under operating system delays (large FIFO) and the driver needs to
> be robust under operating system delays. Using some of the application
> tricks I mentioned can help make the whole thing less sensitive to the
> hardware/driver issue, and since I’m primarily an application-level
> programmer this is where I put all my effort, and have several notable
> successes, and no failures yet.
>
> And yes, to get one of those successes I had to play with scheduler
> priorities and thread affinity. You can use these very carefully with
> success, but you don’t start out saying “Well, if I just tweak this thread
> priority and change this affinity, all will be well”.
> joe
>
>
>
>> On 19.10.2011 18:37, xxxxx@eircom.net wrote:
>>> Complete newbie here. I’m working with a modified reference design for a
>>> PCI-e Xilinx eval board which uses DMA to transfer data from the board’s
>>> FIFO to main memory. I can monitor the size of the FIFO and I see that it
>>> is empty most of the time i.e. the DMA operations are successfully
>>> clearing out the FIFO in good time.
>>>
>>> Unfortunately, every so often the DMA loop is being delayed and the FIFO
>>> is overflowing. I need to find out what’s causing this interference. I’ve
>>> disabled as much hardware and software as I can but no luck. Two
>>> questions:
>>>
>>
>> Russel, what platform are you using (Core <i_something>, Atom…). With all
>> Intel processors one thing is very important, make sure your transfers are
>> a multiple of 64 Byte and are aligned to 64-byte boundaries. I would
>> recommend setting the driver alignment requirements to 64 byte, or some
>> multiple thereof. (64 byte is the cache line size). Observing this rule
>> works wonders. On an Atom application it gave me a performance increase of
>> over 30%. It’s also in the IA manuals somewhere, unfortunately I only
>> found the note myself after I had experimented for ages :wink:
>>
>> Also, when writing to memory, try to always transfer <max_payload_size>
>> blocks. This depends on your chip-set but is typically 128 byte or 256
>> byte. This gives you the best usage of available credits.
>>
>> If you are using Intel Atom, you may have to disable deep-sleep S6 states.
>> This can usually be done in the BIOS. I had a major issue with the credit
>> update (UpdateFC) frequency here which caused a problem very like the one
>> you described
>>
>> Slan,
>> Charles
>>
>>> 1. I installed LatencyMon.exe to see what might be causing the
>>> interference and I can see that on an idle system, ataport.sys is
>>> handling a lot of interrupts and dpc’s BUT there are no hard page faults
>>> and there are no applications running. What could be causing this
>>> activity?
>>>
>>> 2. I’ve noticed that I get much less interference when I minimise the
>>> command window displaying application debug output, even though there is
>>> no debug output during the DMA loop itself. Am I correct in assuming that
>>> Windows would not write to the graphics chip (x4500) when there is no
>>> change to the desktop screen?
>>>
>>> 3. Does windows use DMAs to communicate screen updates to the graphics
>>> chip (that might interfere with my application DMAs)?
>>>
>>> 4. Does the graphics chip use DMAs to access it’s frame buffer (which is
>>> using system memory) during a screen refresh?
>>>
>>> Thanks.
>>>
>>
>>
>> —
>> NTDEV is sponsored by OSR
>>
>> For our schedule of WDF, WDM, debugging and other seminars visit:
>> http://www.osr.com/seminars
>>
>> To unsubscribe, visit the List Server section of OSR Online at
>> http://www.osronline.com/page.cfm?name=ListServer
>>
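
The alignment and payload advice quoted above can be illustrated with a small model. This is a simplified sketch, not a description of how any particular DMA engine or chipset splits transfers: the 64-byte line size comes from the quoted post, while the 128-byte payload, the function names and the example addresses are assumptions chosen for illustration.

```python
CACHE_LINE = 64  # bytes, the cache line size given in the quoted advice


def is_aligned(address, length, alignment=CACHE_LINE):
    """True if the transfer starts on an alignment boundary and its length
    is a whole multiple of the alignment."""
    return address % alignment == 0 and length % alignment == 0


def round_up(length, alignment=CACHE_LINE):
    """Round a length up to the next multiple of the alignment."""
    return (length + alignment - 1) // alignment * alignment


def split_into_payloads(address, length, max_payload=128):
    """Split [address, address + length) into (start, size) chunks that are
    at most max_payload bytes and never cross a max_payload boundary."""
    chunks = []
    end = address + length
    while address < end:
        boundary = (address // max_payload + 1) * max_payload
        next_address = min(boundary, end)
        chunks.append((address, next_address - address))
        address = next_address
    return chunks


if __name__ == "__main__":
    # Assumed example: the same 256 bytes, aligned versus offset by 16 bytes.
    print(split_into_payloads(0x1000, 256))
    print(split_into_payloads(0x1010, 256))
```

In this model an aligned 256-byte transfer becomes two full 128-byte chunks, while the same length started 16 bytes later becomes three chunks, two of them partial, which is the kind of waste the advice is trying to avoid.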

LatencyMon (which is my software) uses a kernel logger session to report
ISR, DPC and hard pagefault executions. It also uses a kernel timer to
measure general latency, which indeed should not include HPFs. It’s not a
filter and does not measure IRPs or anything like that.
Like any performance tool it is also state changing. The many ISRs and DPCs
by ataport that the OP reported may be attributed to the fact that, in
order to get the data out, it needs to write to and read from an ETW log
file. But that holds for XPERF too.

Daniel Terhell
Resplendence Software Projects Sp
xxxxx@resplendence.com
http://www.resplendence.com

wrote in message news:xxxxx@ntdev…
> There was no good performance data presented. For example, I suspect
> latencymon is some kind of filter driver that timestamps the IRP going
> down, and which computes the latency in its completion routine. If so,
> delays caused by the I/O Manager causing page faults would not be
> measured, but could have profound impact on the total trip time of an IRP
> from application space.

A less cynical rationale would be that a decision was made by designers at
Microsoft that, with increasing RAM available on users’ computers, more than
one program could be resident concurrently, and so there was no need to make
users wait for paging every time they pressed alt+tab. (The time saving is
even bigger when they switch back a few seconds later.)

I suspect that the OP is developing some specialized HW+software as some
kind of advanced research project, the goal of which is to eliminate or
dramatically reduce inter-node traffic, so telling him that he is wasting
his time is unproductive. I assume that he is trying to engraft some
cell-processing-like concepts but finds it difficult because the NT kernel
is fundamentally an SMP design with optimizations for HT & NUMA in specific
circumstances.

wrote in message news:xxxxx@ntdev…

This feature may be related to killing the market for the software Mark
Russinovich called “Fraudware”, where these behaviors were used by
programs that claimed to “improve” performance but charged you $39.95 for
something that did just the opposite: it forced everything to page out,
making Task Manager display numbers that looked good, but only to the
naive.

Also, it is not clear, in retrospect, why minimizing the command window in
the debugger (on a host machine) would have much impact on the behavior
of the target; I hadn’t thought that one through. But there was so little
information that it was hard to guess what might be the problem. See my
previous posts.

In my experience, it has always been hardware/driver underdesign that was
the root cause of data overruns, but clever application-level programming
has, in all cases I’ve had, been able to compensate for this. Key here is
to eliminate all multiple-orders-of-magnitude problems before worrying
about factors of two, or improvements of 10%.
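
One way to apply the "orders of magnitude first" rule is to express the FIFO depth as a time budget and compare candidate stalls against it. The sketch below is arithmetic only; the FIFO size, input rate and stall durations are invented for illustration and are not figures from this thread.

```python
import math


def fifo_budget_us(fifo_bytes, input_bytes_per_second):
    """Time the FIFO can absorb input with nothing draining it, in
    microseconds."""
    return fifo_bytes * 1e6 / input_bytes_per_second


def orders_of_magnitude(stall_us, budget_us):
    """Powers of ten by which a stall exceeds (+) or undercuts (-) the
    budget."""
    return math.log10(stall_us / budget_us)


def overruns(stall_us, budget_us):
    """True if a stall of this length would overflow the FIFO."""
    return stall_us > budget_us


if __name__ == "__main__":
    # Assumed numbers: 4 KiB FIFO fed at 100 MB/s; a 10 ms hard page fault
    # versus a 5 us DPC.
    budget = fifo_budget_us(4096, 100_000_000)
    for name, stall_us in [("hard page fault", 10_000.0), ("short DPC", 5.0)]:
        print(name, overruns(stall_us, budget),
              round(orders_of_magnitude(stall_us, budget), 1))
```

With these assumed numbers the budget is about 41 microseconds, so a stall in the millisecond range overflows the FIFO by more than two orders of magnitude, while a microsecond-scale delay fits comfortably.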

“m” wrote in message news:xxxxx@ntdev…
> IMHO this used to be a great trick to page out leaked memory in buggy apps
> and prevent various crashes caused by other buggy code that didn’t check
> for allocation failure! Too bad the “feature” has been removed (please
> read this as sarcasm)

The sarcasm is aimed at the authors of that buggy code… unless it belongs
to MS? Programs (such as a torrent client) are minimized most of their life.
– pa

Mr. Terhell: LatencyMon is a terrific tool, and a great service to the community. It’s a tool I long wanted to write, but could never find time. Bravo for making it freely available.

In terms of working-set trimming…
(I’m not sure why we’re even discussing this, but…)

The highly aggressive working set trimming policies that were implemented in older versions of Windows were controversial since the beginning of time. They might have made sense back when memory was expensive, and thus often highly constrained. Do you remember NT V3.5 which would run on systems with 12MB of memory? Back then, the idea was if you minimized an application you probably weren’t directly interacting with it, so the app’s working set could be trimmed back to its quota (not below) and the limited memory available could be made available to other “active” (foreground and non-iconified) apps.

I seem to recall Mr. Memory Manager re-doing many of the working set policies, including working set trimming for… gosh… I think it was S03 but it might have been Vista?

In general, you should know that the Windows Memory Manager has undergone periodic review over the years. This has resulted in meaningful revisions in every OS release since Win2K, and some of these have been very significant. The result is that Windows has a really modern, well designed, and solidly implemented Memory Manager. Much of the “old wisdom” about Windows Memory Management (such as that to do with highly aggressive working set trimming) is no longer true.

Peter
OSR

More specifically a reference to a wonderful UM C++ MFC application I
inherited that could have been used for a class on worst practices but
unfortunately was ~100,000 lines of code and hard to replace even though it
leaked memory like a sieve and ran on systems where the drivers used must
succeed KM allocations. The ‘solution’ was to add RAM and a scheduled task
for daily server reboot, but in a pinch, minimizing the window (yes, there
was also a GUI even though it was a service) would free some physical RAM
and keep the server going for a few more hours. Needless to say, I did not
even attempt to repair the code, but rewrote it from scratch as quickly as
possible.

“Pavel A.” wrote in message news:xxxxx@ntdev…

“m” wrote in message news:xxxxx@ntdev…
> IMHO this used to be a great trick to page out leaked memory in buggy apps
> and prevent various crashes caused by other buggy code that didn’t check
> for allocation failure! Too bad the “feature” has been removed (please
> read this as sarcasm)

The sarcasm is onto authors of that buggy code… unless it belongs to MS?
Programs (such as torrent client) are minimized most of their life.
– pa

> “Doron Holan” wrote in message
> news:xxxxx@ntdev…
> Reducing working set on minimize is no longer done on w7, maybe on Vista
> as well
>
> d
>
> debt from my phone
>
>
> --------------------------------------------------------------------------------
> From: xxxxx@flounder.com
> Sent: 10/20/2011 8:18 AM
> To: Windows System Software Devs Interest List
> Subject: Re:[ntdev] DMA latency
>
>
> Note that minimizing windows tends to tell the scheduler that the process
> is not terribly important, and its pages are candidates for page-out.
> Thus, its working set requirement is reduced, leaving more pages available
> in physical memory. This could reduce the paging behavior of your app,
> see my previous posts.
>
> Until you have eliminated all possible causes > 2 orders of magnitude,
> there is no point in trying to eke out the last drop of performance
> possible. Paging is six to seven orders of magnitude performance hit, per
> page fault. For a large buffer, you might take multiple page faults
> during MmProbeAndLockPages (hence my reference to eight orders of
> magnitude).
>
> It would be useful to have performance data on this device such as:
>
> Input data rate
> FIFO buffer size (ideally expressed in units of time of the input data)
> Interrupt rate
> I/O buffer size on your read request
>
> Then there are the architecture questions:
>
> Do you do internal buffering in your driver? Or do you rely on the MDL in
> the IRP?
> How much do you do in the ISR vs. the DPC?
> What priority boost do you give on IRP completion?
> How many instructions does your app execute between I/O read calls?
> Note that if there is a kernel call (other than the read request or
> an asynchronous inter-thread queue request)
> in the loop, you probably need to rewrite the loop.
> Note that if the kernel call is graphics-related (including
> SendMessage to controls), you need to rewrite the loop.
> Putting a simple loop in a separate thread can often help.
>
> I have been solving these problems for about 15 years now. The typical
> causes are
> Underdesigned hardware
> Underdesigned driver
> Underdesigned app
>
> That’s it. You can often compensate for an underdesigned driver/hardware
> combination by expending more effort on the app. But hardware should be
> robust under operating system delays (large FIFO) and the driver needs to
> be robust under operating system delays. Using some of the application
> tricks I mentioned can help make the whole thing less sensitive to the
> hardware/driver issue, and since I’m primarily an application-level
> programmer this is where I put all my effort, and have several notable
> successes, and no failures yet.
>
> And yes, to get one of those successes I had to play with scheduler
> priorities and thread affinity. You can use these very carefully with
> success, but you don’t start out saying “Well, if I just tweak this thread
> priority and change this affinity, all will be well”.
> joe
>

“m” wrote in message news:xxxxx@ntdev…
> More specifically a reference to a wonderful UM C++ MFC application I
> inherited that could have been used for a class on worst practices but
> unfortunately was ~100,000 lines of code and hard to replace even though
> it leaked memory like a sieve and ran on systems where the drivers used
> must succeed KM allocations. The ‘solution’ was to add RAM and a
> scheduled task for daily server reboot, but in a pinch, minimizing the
> window (yes, there was also a GUI even though it was a service) would free
> some physical RAM and keep the server going for a few more hours.
> Needless to say, I did not even attempt to repair the code, but rewrote it
> from scratch as quickly as possible.

Yesterday it worked.
Today it is not working.
Windows is like that.

[http://libertybasicuniversity.com/lbnews/nl107/haiku.htm]

> “Pavel A.” wrote in message news:xxxxx@ntdev…
>
> “m” wrote in message news:xxxxx@ntdev…
>> IMHO this used to be a great trick to page out leaked memory in buggy
>> apps and prevent various crashes caused by other buggy code that didn’t
>> check for allocation failure! Too bad the “feature” has been removed
>> (please read this as sarcasm)
>
> The sarcasm is aimed at the authors of that buggy code… unless it belongs to MS?
> Programs (such as a torrent client) are minimized for most of their life.
> – pa
>
>
>> “Doron Holan” wrote in message
>> news:xxxxx@ntdev…
>> Reducing working set on minimize is no longer done on w7, maybe on Vista
>> as well
>>
>> d
>>
>> Sent from my phone
>>
>>
>> --------------------------------------------------------------------------------
>> From: xxxxx@flounder.com
>> Sent: 10/20/2011 8:18 AM
>> To: Windows System Software Devs Interest List
>> Subject: Re:[ntdev] DMA latency
>>
>>
>> Note that minimizing windows tends to tell the scheduler that the process
>> is not terribly important, and its pages are candidates for page-out.
>> Thus, its working set requirement is reduced, leaving more pages
>> available
>> in physical memory. This could reduce the paging behavior of your app,
>> see my previous posts.
>>
>> Until you have eliminated all possible causes > 2 orders of magnitude,
>> there is no point in trying to eke out the last drop of performance
>> possible. Paging is six to seven orders of magnitude performance hit,
>> per
>> page fault. For a large buffer, you might take multiple page faults
>> during MmProbeAndLockPages (hence my reference to eight orders of
>> magnitude).
>>
>> It would be useful to have performance data on this device such as:
>>
>> Input data rate
>> FIFO buffer size (ideally expressed in units of time of the input data)
>> Interrupt rate
>> I/O buffer size on your read request
>>
>> Then there are the architecture questions:
>>
>> Do you do internal buffering in your driver? Or do you rely on the MDL
>> in
>> the IRP?
>> How much do you do in the ISR vs. the DPC?
>> What priority boost do you give on IRP completion?
>> How many instructions does your app execute between I/O read calls?
>> Note that if there is a kernel call (other than the read request or
>> an asynchronous inter-thread queue request)
>> in the loop, you probably need to rewrite the loop.
>> Note that if the kernel call is graphics-related (including
>> SendMessage to controls), you need to rewrite the loop.
>> Putting a simple loop in a separate thread can often help.
>>
>> I have been solving these problems for about 15 years now. The typical
>> causes are
>> Underdesigned hardware
>> Underdesigned driver
>> Underdesigned app
>>
>> That’s it. You can often compensate for an underdesigned driver/hardware
>> combination by expending more effort on the app. But hardware should be
>> robust under operating system delays (large FIFO) and the driver needs to
>> be robust under operating system delays. Using some of the application
>> tricks I mentioned can help make the whole thing less sensitive to the
>> hardware/driver issue, and since I’m primarily an application-level
>> programmer this is where I put all my effort, and have several notable
>> successes, and no failures yet.
>>
>> And yes, to get one of those successes I had to play with scheduler
>> priorities and thread affinity. You can use these very carefully with
>> success, but you don’t start out saying “Well, if I just tweak this
>> thread
>> priority and change this affinity, all will be well”.
>> joe
>>
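On the request above to express the FIFO size "in units of time of the input data": a minimal sketch of that arithmetic, assuming a constant input rate and a FIFO that starts empty. The function names and the numbers in the note below are hypothetical and not taken from the thread.

```cpp
#include <cassert>
#include <cstdint>

// Time, in microseconds, that a FIFO of `fifo_bytes` can absorb input arriving
// at `bytes_per_second` before it overflows, starting from empty and with
// nothing draining it (i.e. the DMA loop is stalled).
double fifo_headroom_us(std::uint64_t fifo_bytes, double bytes_per_second) {
    assert(bytes_per_second > 0.0);
    return static_cast<double>(fifo_bytes) / bytes_per_second * 1e6;
}

// True if a stall of `stall_us` microseconds fits within the FIFO's headroom.
bool survives_stall(std::uint64_t fifo_bytes, double bytes_per_second,
                    double stall_us) {
    return stall_us <= fifo_headroom_us(fifo_bytes, bytes_per_second);
}
```

With a hypothetical 64 KiB FIFO fed at 100 MB/s, the headroom is about 655 microseconds, so a 1 ms delay in servicing the device would overflow it. Comparing this figure with the worst-case latencies a tool like LatencyMon reports gives a quick sanity check on whether the hardware is underdesigned for the platform.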

One of the frequent questions on the MFC newsgroup is “I’ve got this app
working; how do I make it a service?” Of course, the answer is “Write it
as a service!” but this concept seems foreign to most programmers. Every
app has a GUI. Right?

The inevitable disasters follow.

Far too many C++ programmers don’t know how to use C++; they just use the
C subset (not even the “smarter C” part allowing declarations everywhere;
every variable mentioned anywhere in the subroutine is declared at the top
of the subroutine).

One thing I discovered once I started being a C++ programmer 15 years ago:
my apps don’t leak memory. I no longer worry about it; it is so easy to
get right.
joe
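
A minimal sketch of why idiomatic C++ makes leaks easy to avoid, assuming C++14 or later. The `Buffer` type, its live-object counter and the `process` function are hypothetical, made up for illustration only; the point is that ownership is tied to scope, so the failure path needs no explicit cleanup.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical resource type. The static counter tracks how many Buffer
// objects are alive so that the absence of leaks can be checked.
struct Buffer {
    static int live;
    std::vector<unsigned char> bytes;  // owns its storage; freed automatically
    explicit Buffer(std::size_t n) : bytes(n) { ++live; }
    ~Buffer() { --live; }
    Buffer(const Buffer&) = delete;
    Buffer& operator=(const Buffer&) = delete;
};
int Buffer::live = 0;

// Allocates a buffer and optionally fails partway through. There is no
// delete anywhere: unique_ptr's destructor releases the Buffer on normal
// return and during stack unwinding alike.
void process(std::size_t n, bool fail) {
    auto buf = std::make_unique<Buffer>(n);
    if (fail) throw std::runtime_error("simulated failure");
    buf->bytes[0] = 1;
}

// Runs one successful and one failing call, then reports how many Buffer
// objects are still alive.
int leaked_after_demo() {
    process(16, false);
    try {
        process(16, true);
    } catch (const std::runtime_error&) {
        // The Buffer has already been destroyed by the time we get here.
    }
    return Buffer::live;
}
```

The equivalent "C subset" code would need a matching free on every exit path, which is exactly where the leaks in large inherited applications tend to come from.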

Thanks, if YOU say so … that gets me shy.

//Daniel

wrote in message news:xxxxx@ntdev…
> Mr. Terhell: LatencyMon is a terrific tool, and a great service to the
> community. It’s a tool I long wanted to write, but could never find time.

> Far too many C++ programmers don’t know how to use C++; they just use the
> C subset

Or even worse: they start to abuse absolutely every feature of the language to satisfy their egos, making the code far less readable and maintainable than even dirty C code.

The first thing I would say to a beginner C++ guy: C++ is a two-paradigm language. One paradigm is OOP (all objects are handled by reference; encapsulation, polymorphism and sometimes inheritance); the other is abstract datatypes/metaprogramming (all objects are handled by value; operator= and operator+ and such; STL and most of Boost).

Also, please learn the difference between static and dynamic polymorphism.

Without this, you will either write good C++ code but limit yourself to a subset of the language, or write bad C++ code.
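
For readers who have not met the distinction, here is a minimal sketch of static versus dynamic polymorphism, assuming C++14 or later. All type and function names (`Shape`, `Circle`, `Square`, `Triangle`, `describe_*`) are hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Dynamic polymorphism: objects are handled by reference through a base
// class, and the call is resolved at run time via the virtual table.
struct Shape {
    virtual ~Shape() = default;
    virtual std::string name() const = 0;
};
struct Circle : Shape {
    std::string name() const override { return "circle"; }
};
struct Square : Shape {
    std::string name() const override { return "square"; }
};

std::string describe_dynamic(const Shape& s) { return s.name(); }

// Static polymorphism: any type with a suitable name() member works, with no
// common base class required. The call is resolved at compile time when the
// template is instantiated.
struct Triangle {
    std::string name() const { return "triangle"; }
};

template <typename T>
std::string describe_static(const T& t) { return t.name(); }

// A heterogeneous collection decided at run time needs the dynamic form.
std::string describe_all(const std::vector<std::unique_ptr<Shape>>& shapes) {
    std::string out;
    for (const auto& s : shapes) {
        if (!out.empty()) out += ",";
        out += describe_dynamic(*s);
    }
    return out;
}
```

The dynamic form pays for an indirect call but lets one container hold mixed types; the static form has no run-time dispatch but every concrete type must be known when the code is compiled.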

And yes, GoF and Alexandrescu are good books to read.


Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com

“Maxim S. Shatskih” wrote in message
news:xxxxx@ntdev…
>
> And yes, GoF and Alexandrescu are good books to read.

There is the “Thinking in C++” book, but no “Thinking in C”, why? Because C
is just not a language for thinking. It is good only for coding. (and there
still are people that don’t know the difference between designing and coding
:-) )
– pa

> There is the “Thinking in C++” book, but no “Thinking in C”, why? Because C
> is just not a language for thinking. It is good only for coding.

Yes, because in C you just code, and the task gets done.

In C++, you must both think and code to get the task done, which means that C++ is more complex and just plain less productive :-)

C++ is used because some less experienced managers (who rely on books rather than real-world experience) and novice programmers think it is a “silver bullet” for SD tasks. And then they sink into a situation where more than 50% of the SD effort goes into comprehending the mess of badly designed classes, templates and such that they created at earlier stages of the project.

Hey, you cannot even read dirty C++ code without a special tool called a “class browser”. :-)

This complexity buys you nothing.

> still are people that don’t know the difference between designing and coding

Yes, and all successful teams are like that :-)

I know of no successful teams that separate design and coding. In the end, teams that do separate them produce crappier code, over a longer timeframe and with more bugs. The whole PM “triangle” of timeframe/features/quality is worse in these teams.

Have you read the message about the 100K-line MFC app which was thrown away and rewritten? That is what separation of design and coding gets you.


Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com