Error messages from MSFT 2.7 Initiator

Hi all,

I am getting the following error messages from the MSFT initiator whenever I
get a data digest error.

qrequests from 1 at d:\iscsi2\src\drivers\storage\iscsiprt\umlib\utils.c 2937
Scsi error got - 5 32 0
Scsi error got - 5 32 0
Data Digest Error. Session 84F90000, Connection 84F7B280, AR 8522DDA8

  1. What are these SCSI error messages? The MSFT initiator resets the
    connection and opens it again. It then gets the data digest error
    again, but this time it doesn’t display any SCSI error messages.

  2. Is there any way to display the packet when the MSFT initiator hits
    the checksum error?

Thanks
Karthikselvan

Get a copy of Wireshark or NetMon to capture the network traffic.
Both decode iSCSI pretty well, including the validity of the digest checksums.

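As for the SCSI error, to me it looks suspiciously like SCSI Sense
data. If it is a Key/ASC/ASCQ triple printed in decimal, then 5/32/0 is
sense key 5h (ILLEGAL REQUEST) with ASC/ASCQ 20h/00h (INVALID COMMAND
OPERATION CODE), i.e. “Unsupported Command” coming from the target.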
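
To illustrate that reading of the message, here is a minimal sketch that decodes the triple. It assumes the initiator prints sense key, ASC and ASCQ in decimal, which the debug output does not confirm. `decode_sense` and the two lookup tables are hypothetical helpers holding only a handful of common codes, not the full T10 list.

```python
# Assumption: "5 32 0" is sense key / ASC / ASCQ printed in decimal.
# The tables below hold only a few common codes, not the full T10 list.
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
    0xB: "ABORTED COMMAND",
}
ASC_ASCQ = {
    (0x20, 0x00): "INVALID COMMAND OPERATION CODE",
    (0x21, 0x00): "LOGICAL BLOCK ADDRESS OUT OF RANGE",
    (0x24, 0x00): "INVALID FIELD IN CDB",
    (0x25, 0x00): "LOGICAL UNIT NOT SUPPORTED",
}


def decode_sense(text):
    """Decode a 'key asc ascq' string of decimal numbers (hypothetical helper)."""
    key, asc, ascq = (int(tok) for tok in text.split())
    key_name = SENSE_KEYS.get(key, "unknown sense key")
    code_name = ASC_ASCQ.get((asc, ascq), "unlisted ASC/ASCQ")
    return f"{key:X}h/{asc:02X}h/{ascq:02X}h: {key_name}, {code_name}"


print(decode_sense("5 32 0"))
```

If the numbers turn out to be something other than sense data, the decode is meaningless, so the network trace remains the authoritative check.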

Again, analyse a network trace; it will show which command the target
is rejecting.

Digest errors and rejected commands sound like you have a bad iSCSI
target that needs fixing.

Mark.

A further suggestion: get the WLK (Windows Logo Kit) and set it up to run
the Hardware RAID array device tests for iSCSI. The relevant tests for you
right now would be the iSCSI digest tests and the SCSI compliance test.

The logs from these tests will tell you about the compliance of the target.

Now that I’ve had some coffee and read your error sequence again, I
would suggest that there is a combination of errors. The first errors
reported are the SCSI Sense data response saying that the target did not
support the command sent to it. I believe that the initiator checks the
returned sense data before checking any digest checksums (which are
normally not turned on in production use).

To have got this far in the session, one presumes that all previous
iSCSI digests were correct. So my suggestion is that the target hits a
corner case when it rejects a command, and in this corner case it gets
the digests wrong.

No doubt the target is running at ErrorRecoveryLevel=0. At this level a
digest error is unrecoverable, so the initiator terminates the session.

In order to discover what commands the target doesn’t like, you might
wish to turn off digests since they are probably a symptom rather than a cause.

Digests really aren’t that useful for iSCSI over TCP. There was a long
debate over them in the IETF sessions, and the main reasons they were
kept in were to keep the T10 people happy and to allow for future
development on SCTP.

Mark.

Hi,

Thanks for your reply. I ran Wireshark and traced the packets. I was able to
find out that the checksum error occurs on a SCSI Read(10) command that tries
to read 9 blocks of data. The target is sending the correct data with a good
SCSI response (I see the final bit and status bit set to 1, and the status is
good). The target is doing its job correctly. I think the bug is between the
TCP layer and the iSCSI layer. Interestingly, all my other I/O sizes go
through; only this I/O size fails.

Thanks
Karthikselvan

---
NTDEV is sponsored by OSR

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at
http://www.osronline.com/page.cfm?name=ListServer

I doubt that the problem is where you think it is. Although the TCP
checksum is regarded as relatively weak, it would be very rare for bad
data to get past the TCP checksum every time in reproducible
circumstances.

If you think that the data in the iSCSI packet is correct and the
digest is in error, then it is 99% sure that the target is in
error. One thing to note: if TCP checksum offload is enabled, the
Wireshark trace may say that the TCP checksum fails. This is a red
herring, because the NIC would fail the packet rather than hand it
up for processing if there was a TCP checksum error.

The way that digests work is that after filling in the transmit
buffers, the target calculates the checksums for the header and/or the
data (which ones do you have enabled, and which ones fail?). On receipt
of the data, the initiator runs the data through the checksum
algorithm to see if it gets a result matching what the target put in
to the iSCSI packet. If the initiator's calculation disagrees with the
value sent by the target, then you get a digest failure.
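
As a rough illustration of that check, the sketch below computes a CRC32C data digest the way the iSCSI specification describes it: over the data segment plus zero padding up to a 4-byte boundary. It is a slow bit-by-bit version meant for checking a captured PDU by hand, not how either the initiator or the target implements it. The names `crc32c`, `data_digest` and `digest_ok` are made up for this example.

```python
def crc32c(data: bytes) -> int:
    """Bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def data_digest(segment: bytes) -> int:
    """Digest over the data segment plus zero padding to a 4-byte boundary."""
    pad = (-len(segment)) % 4
    return crc32c(segment + b"\x00" * pad)


def digest_ok(segment: bytes, received_digest: int) -> bool:
    """Receiver-side check: recompute and compare with the value the sender put in."""
    return data_digest(segment) == received_digest


# Standard CRC32C check value for the ASCII string "123456789".
print(hex(crc32c(b"123456789")))  # 0xe3069283
```

Feeding it the data segment bytes exported from the trace and the digest value from the same PDU shows which side computed the value that does not fit the data.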

The fact that this happens on a specific sized request strongly
suggests that the target has a corner case problem calculating the
data digest on this sized request.

It’s funny that despite the core Microsoft initiator code handling
the iSCSI protocol being released more than 4 years ago and
successfully transporting billions of packets daily, you prefer to
point the finger at the initiator rather than consider the target to
be at fault. Presumably Wireshark also tells you that the CRC
digest is wrong; that should be another clue. Look at the target's CRC
generating code: is there a data underrun or overrun? Do you have
extra space in the data buffer covered by the CRC that has not been
set to zeroes?
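
One way to test the underrun/overrun theory from a capture is sketched below. It takes the data segment bytes and the digest the target actually sent, then looks for a length at which a CRC32C over the data would produce that digest. It checks shorter prefixes (an underrun) and the data followed by a few zero bytes (an overrun into zeroed memory). An overrun into non-zero memory cannot be found this way. `find_covered_length` and its parameters are hypothetical, and the 512-byte block size in the usage lines is an assumption about the target.

```python
def crc32c_update(state: int, byte: int) -> int:
    """Feed one byte into a running CRC32C state (reflected poly 0x82F63B78)."""
    state ^= byte
    for _ in range(8):
        state = (state >> 1) ^ (0x82F63B78 if state & 1 else 0)
    return state


def find_covered_length(segment: bytes, sent_digest: int, max_extra_zeros: int = 64):
    """Return the first byte count whose CRC32C equals sent_digest, else None."""
    state = 0xFFFFFFFF
    # Underrun: the sender's CRC stopped before the end of the data.
    for n, byte in enumerate(segment, start=1):
        state = crc32c_update(state, byte)
        if state ^ 0xFFFFFFFF == sent_digest:
            return n
    # Overrun: the sender's CRC ran past the data into zero bytes.
    for extra in range(1, max_extra_zeros + 1):
        state = crc32c_update(state, 0)
        if state ^ 0xFFFFFFFF == sent_digest:
            return len(segment) + extra
    return None


# Usage with made-up data: 9 blocks, assuming 512-byte blocks.
segment = bytes(i % 251 for i in range(9 * 512))
state = 0xFFFFFFFF
for b in segment[: 8 * 512]:          # simulate a target that covered only 8 blocks
    state = crc32c_update(state, b)
print(find_covered_length(segment, state ^ 0xFFFFFFFF))
```

A result equal to the full segment length means the digest matches the data as sent. Any other result points at how many bytes the target's CRC really covered.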

I’m not saying that it can’t be a problem in the initiator but that
the initiator has proved its robustness over the years such that you
should eliminate every other possibility before sitting back and
saying it’s a Microsoft initiator problem. Not only that, if it
genuinely is a problem with the Microsoft initiator then the only way
Microsoft will acknowledge it is with incontrovertible proof. Either
way, you have more work to do.

Mark.
