system corruption from too many files

Hi,

I’m new to the forum, and I have a problem to share with you all. I hope that you can give me some ideas.

I am running Windows XP Pro on a 3U server machine with 15 disk volumes. The OS is mirrored on 2 hard drives, and I have 14 volumes to which I write .txt files. My program writes millions of .txt files to the 14 disks, evenly distributed, and really, there is no rest for the computer. After several hundred thousand files have been written, my firewall application (ZoneAlarm) crashes, and then I am unable to open any other programs due to corruption in the kernel memory.

My application uses Windows HANDLEs and the CreateFile(), WriteFile(), and CloseHandle() functions for file I/O. The I/O is unbuffered and uses the write-through method. The problem occurs consistently with 3 different RAID controllers, so I am inclined to believe that the problem is in the OS file system.

When I run the !filecache command in the kernel debugger, I see that the $Logfile and $Mft files are huge (up to 100MB each). There are 14 $Logfile entries of considerable size, 14 $Mft entries of considerable size, and 14 more $Mft entries of small size.

Any help would be greatly appreciated. Thank you.

Jake Chung

Before you research this further try to reproduce the problem without Zone
Alarm installed. While seemingly innocuous, it digs itself fairly deep into
the system (and also attaches itself to the file system from time to time).

Even if it does repro without it installed it’s one less variable to take
into consideration.

-scott


Scott Noone
Software Engineer
OSR Open Systems Resources, Inc.
http://www.osronline.com


Is Zone Alarm the ONLY software you have that installs kernel level
modifications? Or do you have an anti-virus filter in this system as
well?

Zone Alarm by itself is something we know to be invasive and to make
changes that we have seen destabilize the system. Can you reproduce the
problem without Zone Alarm installed?

And I must admit “corruption in the kernel memory” doesn’t seem to fit
with anything you’ve described, so I’m wondering why you think the
kernel memory is corrupted.

None of the file sizes you described for those metadata structures seem
unreasonable given the description you’ve provided.

When you force chkdsk to run does it observe any file system level
corruption? I know it is easy to point the finger at NTFS in this case,
but in fact my suspicion is that Zone Alarm or some other third party
product installed on the system has never been tested in a scenario like
you described (while I’m actually sure that NTFS HAS been tested in
scenarios like you described.) Thus, NTFS is not (in my mind) the first
place to start looking for the problem.

Regards,

Tony

Tony Mason
Consulting Partner
OSR Open Systems Resources, Inc.
http://www.osr.com

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of xxxxx@gmail.com
Sent: Friday, July 14, 2006 9:01 AM
To: ntfsd redirect
Subject: [ntfsd] system corruption from too many files


Questions? First check the IFS FAQ at
https://www.osronline.com/article.cfm?id=17

You are currently subscribed to ntfsd as: xxxxx@osr.com To unsubscribe
send a blank email to xxxxx@lists.osr.com

I have begun a test with ZoneAlarm uninstalled, to see if that resolves anything.

Also, I’d like to expand on the problem. I did not mention that 2000 file handles are always open, and that data is written out to all 2000 file handles in a while loop. Data is written in 4KB chunks, and file sizes range from 5KB to 2MB (90% of files are down toward 5KB). When the system reaches this “corrupted state”, file operations are extremely slow (up to 10 seconds to open a new file). Sometimes (not always), chkdsk runs on the drives when the machine is rebooted.
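For anyone who wants to try the same I/O pattern, here is a rough, portable sketch in Python. It is not the original program: that one calls CreateFile(), WriteFile() and CloseHandle() with unbuffered, write-through I/O, while this sketch uses ordinary buffered files. The names `pick_file_size` and `run_workload`, the exact size split, and the small defaults in the demo are assumptions made for illustration.

```python
import os
import random
import tempfile

CHUNK = 4 * 1024  # data is written in 4KB chunks


def pick_file_size(rng):
    """Return a target size in bytes: about 90% near 5KB, the rest up to 2MB.

    The exact split is an assumption; the post only says that 90% of the
    files are "down toward 5KB" within a 5KB to 2MB range.
    """
    if rng.random() < 0.9:
        return rng.randint(5 * 1024, 8 * 1024)
    return rng.randint(8 * 1024, 2 * 1024 * 1024)


def run_workload(root, open_handles=2000, total_files=10000, seed=0):
    """Keep up to `open_handles` files open, write one chunk to each in turn,
    and replace every finished file with a newly created one.

    Returns (files_created, bytes_written).
    """
    rng = random.Random(seed)
    payload = b"x" * CHUNK
    created = 0
    written = 0
    active = []  # each entry is [file object, bytes remaining]

    def open_new():
        nonlocal created
        path = os.path.join(root, "f%08d.txt" % created)
        created += 1
        return [open(path, "wb"), pick_file_size(rng)]

    while len(active) < min(open_handles, total_files):
        active.append(open_new())

    while active:
        still_open = []
        for slot in active:
            fh, remaining = slot
            n = min(CHUNK, remaining)
            fh.write(payload[:n])
            written += n
            slot[1] = remaining - n
            if slot[1] > 0:
                still_open.append(slot)
                continue
            fh.close()  # CloseHandle() in the original program
            if created < total_files:
                still_open.append(open_new())  # CreateFile() in the original
        active = still_open
    return created, written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(run_workload(d, open_handles=20, total_files=50))
```

With the full 2000 open handles, the per-process limit on open files may need to be raised on some systems before this will run.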

I think kernel memory is corrupted because applications are unable to load and the system slows down. I am not knowledgeable about kernels and drivers, so it is very likely that my guess is entirely wrong.

I appreciate the input that you guys have given me. Thank you much.
-Jake

After a couple hours, the test I was running with ZoneAlarm disabled failed. So I guess I can scratch that off the list of culprits.

Does anyone have any other suggestions? The RAID card manufacturer suggested that I upgrade the drivers and firmware of the hard drives. Would that help?

JAKE:

I can’t say that I know much about the OS’s filesystem architecture,
but I have looked extensively at these sorts of firewalls, and, based on
my experience, I would recommend uninstalling it and seeing what
happens. However, some of these products do not remove everything when
they uninstall, so I would look in Device Manager (making sure to enable
viewing hidden devices) to ensure that nothing was left behind.

MM


JAKE:

Are you perhaps just low on physical memory/non-paged pool and are just
thrashing? Very strange things have been known to happen when the
system is low on non-paged pool.

MM


Martin,

The Physical Memory remains static. It does not change at all, and I have
plenty of physical memory left (as far as I know).

How can I determine the total size of my non-paged pool? I see that the
non-paged pool size is growing, but I don’t know what the maximum is.


JAKE:

I believe that for 32-bit systems, the default non-paged pool size is 256MB
on XP. Given that physical memory is OK, and considering what you are
experiencing, I would guess that non-paged pool is not the problem. Changing
the non-paged pool size is definitely not something that one wants to do
with offhand knowledge, so, should you decide to go this route, I would
ask that question on this list, and get a better answer than I can give
you (probably from Don Burn, Maxim, or one of the Microsoft boys).

MM


> hundreds of thousands of files have been written, my firewall application (zone
> alarm) crashes, and then I am unable to open any other programs due to
> corruption in the kernel memory.

What is the problem? Say goodbye to ZoneAlarm and use Windows Firewall instead.
Will it also crash?

Such a machine is definitely a special purpose server, which surely can
tolerate the firewall in a separate box BTW.

Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
xxxxx@storagecraft.com
http://www.storagecraft.com

> How can I determine the total size of my non-paged pool? I see that the

Perfmon has a counter.

Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
xxxxx@storagecraft.com
http://www.storagecraft.com

I have learned some more about this problem.

The problem occurs only if I call CreateFile() and CloseHandle() several
hundred thousand times. If I do not create new files, and just write all my
data to 2000 files, I do not have the problem.

Is there anything that could go wrong in CreateFile() and CloseHandle()?
Are there any resources to find out exactly what goes on during a
CreateFile() call and CloseHandle() call?
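One way to isolate this observation is to push the same number of 4KB writes through two variants: one that reuses a fixed set of open files, and one that creates and closes a new file for every write. The sketch below is a portable Python illustration, not the original Win32 program; the function names `write_reuse` and `write_churn` are hypothetical, and `n_files` is assumed to be at least 1.

```python
import os
import tempfile

CHUNK = b"x" * 4096  # one 4KB write


def write_reuse(root, n_files, n_writes):
    """Open n_files once, spread n_writes chunks over them, then close.

    Returns the number of create/close pairs performed (always n_files).
    """
    handles = [open(os.path.join(root, "reuse%05d.txt" % i), "wb")
               for i in range(n_files)]
    try:
        for i in range(n_writes):
            handles[i % n_files].write(CHUNK)
    finally:
        for fh in handles:
            fh.close()
    return n_files


def write_churn(root, n_writes):
    """Create, write and close a new file for every chunk.

    Returns the number of create/close pairs performed (n_writes).
    """
    for i in range(n_writes):
        with open(os.path.join(root, "churn%08d.txt" % i), "wb") as fh:
            fh.write(CHUNK)
    return n_writes


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        print("reuse:", write_reuse(a, n_files=20, n_writes=1000))
        print("churn:", write_churn(b, n_writes=1000))
```

Running each variant on its own for a long stretch, while watching the non-paged pool, should show whether the growth follows the number of create/close pairs rather than the amount of data written.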


Try !poolused to find the leaks. Set the “enable pool tagging” GlobalFlag
with gflags.exe or manually in the registry (bit 0x400).
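For the manual route, the bit arithmetic can be sketched as follows. This Python snippet only computes the new value and does not touch the registry. The constant name mirrors the usual FLG_POOL_ENABLE_TAGGING flag, and the helper names are hypothetical. To my knowledge the value to edit is the GlobalFlag DWORD under HKLM\SYSTEM\CurrentControlSet\Control\Session Manager, and a reboot is needed before the change takes effect, but check that against your own system.

```python
FLG_POOL_ENABLE_TAGGING = 0x400  # the "enable pool tagging" bit mentioned above


def pool_tagging_enabled(global_flag):
    """True if the pool tagging bit is set in a GlobalFlag DWORD value."""
    return bool(global_flag & FLG_POOL_ENABLE_TAGGING)


def with_pool_tagging(global_flag):
    """Return global_flag with the pool tagging bit set, other bits untouched."""
    return (global_flag | FLG_POOL_ENABLE_TAGGING) & 0xFFFFFFFF


if __name__ == "__main__":
    current = 0x0  # assumed current value; read the real one from the registry
    print(hex(with_pool_tagging(current)))
```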

Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
xxxxx@storagecraft.com
http://www.storagecraft.com

----- Original Message -----
From: “Jake Chung”
To: “Windows File Systems Devs Interest List”
Sent: Wednesday, July 19, 2006 4:24 PM
Subject: Re: [ntfsd] system corruption from too many files


I just did a 15 minute test using my program, and using the Kernel
Debugger’s “!poolused” command, and found that there are several things that
are growing in memory. Here is a list of the *differences* in the NonPaged
pool in a 15 minute span:

Nonpaged Pool
TAG     Allocs   Used (bytes)
Ddk          1        159,744
Fsfm       244          9,760
File       261         39,752
Mdl      3,961        507,008
Mmca        10          1,046

As you can see, the Mdl tag (Memory Descriptor Lists) grew the most in 15 minutes. It
began at 354,544 bytes and grew to 861,552 bytes. What am I to make of this?
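To put those deltas in perspective, here is a small Python sketch that works out bytes per allocation and growth per minute for each tag. The figures are copied from the list above; the function name `summarize` is hypothetical, and the sketch assumes that the Allocs column counts allocations that were still outstanding when the snapshot was taken.

```python
# Differences in nonpaged pool over the 15-minute run, copied from the post.
DELTAS = {
    "Ddk":  (1, 159_744),
    "Fsfm": (244, 9_760),
    "File": (261, 39_752),
    "Mdl":  (3_961, 507_008),
    "Mmca": (10, 1_046),
}
MINUTES = 15


def summarize(deltas, minutes):
    """Return rows of (tag, allocs, bytes, bytes_per_alloc, bytes_per_minute),
    sorted by bytes, largest first."""
    rows = []
    for tag, (allocs, used) in deltas.items():
        rows.append((tag, allocs, used, used / allocs, used / minutes))
    rows.sort(key=lambda r: r[2], reverse=True)
    return rows


if __name__ == "__main__":
    for tag, allocs, used, per_alloc, per_min in summarize(DELTAS, MINUTES):
        print("%-5s %6d allocs %9d bytes %10.1f B/alloc %10.1f B/min"
              % (tag, allocs, used, per_alloc, per_min))
    total = sum(used for _, used in DELTAS.values())
    print("total growth: %d bytes in %d minutes" % (total, MINUTES))
```

With these numbers, the Mdl delta works out to exactly 128 bytes per allocation, and the five tags together grew by 717,310 bytes over the 15 minutes.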
