Non Paged Pool and the IO Manager

Hi,

We have several systems that crash after several days of operation (usually
7-20 days). When they crash, it is always because the non-paged pool is so
fragmented and so allocated that nobody can get any non-paged pool (sounds
like your basic memory leak).

However, when I look at the non-paged pool usage, it turns out that there
are literally millions of small segments allocated, all with the tag 'Io '.
In one of my crashes, there were 2442801 allocated non-paged fragments
(that’s right, over 2 million) totalling 156 Mbytes, all with the tag 'Io '.

I am sort of at a loss as to how to proceed from here. According to the
pooltag.txt file, these allocations are made by the IO Manager and are
described as “general IO allocations”.

I guess I am wondering what could cause millions of these never to be freed?
I am pretty sure it is being caused by a faulty driver, but I am not even
sure how to track down the culprit. Could it be caused by a bad user mode
process?

The bad allocations are all 0x40 bytes long, so they are pretty small. I
have looked at the contents of some of them, and couldn’t find any real
recognizable patterns there.

I would appreciate any suggestions.

Thanks,

Don

What about testing any new drivers with Driver Verifier? Maybe you can find
something.

Regards,
Ray Yang
xxxxx@ybwork.com
----- Original Message -----
From: “Don”
Newsgroups: ntdev
To: “Windows System Software Devs Interest List”
Sent: Thursday, July 08, 2004 9:38 PM
Subject: [ntdev] Non Paged Pool and the IO Manager

Don wrote:

> However, when I look at the non-paged pool usage, it turns out that there
> are literally millions of small segments allocated, all with the tag 'Io '.
> In one of my crashes, there were 2442801 allocated non-paged fragments
> (that’s right, over 2 million) totalling 156 Mbytes, all with the tag 'Io '.

These tags are used for many small structures, but most particularly
this is the tag that’s used by the I/O manager for the buffers used for
buffered I/O.

> The bad allocations are all 0x40 bytes long, so they are pretty small. I
> have looked at the contents of some of them, and couldn’t find any real
> recognizable patterns there.

Wow… they’re all 64 bytes. THAT’s interesting. I wonder what it means?
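
As a quick sanity check on the numbers quoted above, here is a small sketch: if every block really is 0x40 bytes, the block count alone should account for the reported total. The thread does not say whether 0x40 includes the pool header, so treat the result as approximate. The names below are made up for illustration.

```python
# Figures taken from the crash described in this thread.
BLOCK_COUNT = 2_442_801   # allocations tagged 'Io '
BLOCK_SIZE = 0x40         # bytes per allocation, as reported


def total_bytes(count, size):
    """Total pool consumed if every block has the same size."""
    return count * size


total = total_bytes(BLOCK_COUNT, BLOCK_SIZE)
print(total)                      # 156339264
print(round(total / 1e6, 1))      # 156.3 (decimal megabytes)
print(round(total / 2**20, 1))    # 149.1 (binary megabytes)
```

The product comes out at about 156 million bytes, which matches the reported 156 Mbytes. That is consistent with essentially all of the 'Io ' usage being these uniform 64-byte blocks, rather than a mix of sizes from many different callers.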

Look at the IRP list. What IRPs are outstanding? Any of these IRPs
account for any of these pool blocks??

Peter
OSR

Peter,

Thanks for the suggestions. I have been examining these data structures for
hours now, and I have found that there is actually a pattern to them. It
looks like the actual data structure is 36 bytes long. The rest of the
space seems to be filled with random data.

Our system has an IFS installed that makes a lot of TDI calls. It turns out
that the 5th DWORD of this structure always points to the DeviceObject of
the IFS. The second DWORD of this structure always points to one of several
KEVENTS that we use to have TDI signal back to us when it is done sending or
receiving a UDP packet. So it looks like this structure is being allocated
somewhere down in the networking stack.
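
For anyone who wants to check a pattern like this across many blocks rather than by eye, here is a rough sketch of the idea. It assumes the blocks have already been saved out of the dump as raw bytes, that the system is 32-bit little-endian (so a pointer is one DWORD), and that each raw buffer starts at the first DWORD of the 36-byte structure. The field names are invented; only the positions (second DWORD, fifth DWORD) come from the description above. The addresses in the demo are made up.

```python
import struct
from collections import Counter

STRUCT_SIZE = 36   # observed length of the real structure, in bytes
BLOCK_SIZE = 0x40  # observed allocation size


def decode_block(raw):
    """Split the first 36 bytes into nine little-endian DWORDs."""
    if len(raw) < STRUCT_SIZE:
        raise ValueError("block is shorter than 36 bytes")
    fields = struct.unpack("<9I", raw[:STRUCT_SIZE])
    return {
        "event_ptr": fields[1],          # 2nd DWORD (hypothetical name)
        "device_object_ptr": fields[4],  # 5th DWORD (hypothetical name)
        "fields": fields,
    }


def tally(blocks):
    """Count how often each event / device object pointer appears."""
    events, devices = Counter(), Counter()
    for raw in blocks:
        decoded = decode_block(raw)
        events[decoded["event_ptr"]] += 1
        devices[decoded["device_object_ptr"]] += 1
    return events, devices


def make_block(event_ptr, device_ptr):
    """Build a synthetic 64-byte block for trying the code out."""
    fields = [0] * 9
    fields[1], fields[4] = event_ptr, device_ptr
    padding = b"\xAA" * (BLOCK_SIZE - STRUCT_SIZE)  # stands in for leftover data
    return struct.pack("<9I", *fields) + padding


# Demo with made-up addresses: two events, one device object.
sample = [
    make_block(0x81234560, 0x8A000100),
    make_block(0x81234560, 0x8A000100),
    make_block(0x81234980, 0x8A000100),
]
events, devices = tally(sample)
for ptr, count in events.most_common():
    print(f"event {ptr:#010x}: {count} blocks")
for ptr, count in devices.most_common():
    print(f"device object {ptr:#010x}: {count} blocks")
```

If the tally over real data shows one device object pointer and only a handful of distinct event pointers, that supports the pattern described here. A skew toward one particular event might also hint at which send or receive path is losing the blocks.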

We have about 400 systems running with identical hardware and software
configurations and we are only seeing this problem on 4 of them. I am
beginning to suspect network hardware problems that are causing an exception
in the networking driver somewhere and causing these objects to be leaked.

Anyway, I am still hunting. I think I will follow Ray’s suggestion and see if
I can find anything with the driver verifier. I tried it on our IFS right
away and it didn’t find anything. I think I will try it on some of the
networking drivers and see if I find anything.

Still open to any ideas.

Thanks,

Don

“Don” wrote in message news:xxxxx@ntdev…
>
> Anyway, I am still hunting. I think I will follow Ray’s suggestion and see
> if I can find anything with the driver verifier. I tried it on our IFS right
> away and it didn’t find anything. I think I will try it on some of the
> networking drivers and see if I find anything.
>

Oh, heck, yeah. Enable Verifier on every driver in the system. Seriously,
go for it. And install the checked kernel and HAL if you can.

At the very least, Verifier will allow you to track the pool allocations,
right? So you can (usually) see who OWNS those hunks-o-pool.

I’ll be unhappy to find out that somebody down the networking stack is
allocating pool with the tag "Io "… hmmmm…

Peter
OSR