4K buffer memory alignment in ataport

Hello Everybody,
I am very new to Windows driver development, and I was wondering whether an ATA miniport driver can request a specific alignment for each element of the IDE_SCATTER_GATHER_LIST.

The DMA engine of my controller is unable to read from buffers in host memory that cross a 4K boundary, so I need the memory area of each IDE_SCATTER_GATHER_ELEMENT to lie within a single 4K page. In other words, for each IDE_SCATTER_GATHER_ELEMENT, the range from Address to Address+Length must not cross a 4K boundary.

Is there a way I can ask the port driver (?) to do so?

Thanks in advance,
=A

> The DMA engine of my controller is unable to read from
> buffers in host memory that are across a 4K boundary, so I
> need each memory area of the IDE_SCATTER_GATHER_ELEMENT to be
> within a 4K page. In other words, for each
> IDE_SCATTER_GATHER_ELEMENT (Address+Length) must not cross
> a 4K boundary.
>
> Is there a way I can ask the port driver (?) to do so?

Nope, you’re SOL. Better talk to your hardware designers and tell them the
bad news as soon as possible.

Jan

> Hello Everybody,
> I am very new at windows driver development and I was wondering whether
> an ata miniport driver can request a specific alignment on each element of
> the IDE_SCATTER_GATHER_LIST.
>
> The DMA engine of my controller is unable to read from buffers in host
> memory that are across a 4K boundary, so I need each memory area of the
> IDE_SCATTER_GATHER_ELEMENT to be within a 4K page. In other words, for
> each IDE_SCATTER_GATHER_ELEMENT (Address+Length) must not cross a 4K
> boundary.
>
> Is there a way I can ask the port driver (?) to do so?

My scsiport drivers require alignment to a 512 byte boundary, and I have
found no way to tell scsiport to enforce that requirement, so I imagine
that ataport will be roughly the same.

Just so I understand, you can support multiple SG entries, as long as
each entry itself doesn’t cross a page? Can you manually break up the
entries yourself and feed them to your hardware that way? You may have
to limit ataport to half the actual number of sg elements that you
support.
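
To put a number on the descriptor budget mentioned above, here is a minimal sketch in C that counts how many hardware entries one element would need if it were split at every 4K boundary. Everything in it is an assumption made for illustration: pieces_needed and DMA_BOUNDARY_SHIFT are made-up names, not ataport definitions, and a plain address/length pair stands in for the port driver's element.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed DMA boundary of 4K (1 << 12), taken from the original question. */
#define DMA_BOUNDARY_SHIFT 12u

/* Hypothetical helper: number of pieces a buffer must be cut into so that
 * no piece crosses a 4K boundary. This equals the number of 4K pages the
 * buffer touches. The length must be non-zero. */
static uint32_t pieces_needed(uint64_t address, uint32_t length)
{
    assert(length > 0);
    uint64_t first_page = address >> DMA_BOUNDARY_SHIFT;
    uint64_t last_page  = (address + length - 1) >> DMA_BOUNDARY_SHIFT;
    return (uint32_t)(last_page - first_page + 1);
}
```

An element no longer than 4K can cross at most one boundary, so it needs at most two pieces, which is where the "half the actual number" budget comes from. Longer elements can need more than two, so that budget only holds if the element length is bounded accordingly.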

What I have found with my scsiport driver is that while I do get some
non-512 byte aligned buffers, apart from a small number of requests at
boot and sometimes during certain operations (format, chkdsk I think),
the buffers are otherwise always aligned to PAGE_SIZE boundaries. So
what I do is declare an srb extension size of PAGE_SIZE * 2 - 1, which
guarantees that there will be at least 1 complete PAGE_SIZE aligned
block in the srb extension, and then use that as a bounce buffer.
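
The arithmetic behind the PAGE_SIZE * 2 - 1 trick can be sketched as follows. This is only an illustration of the rounding step, not code from the driver described here: SKETCH_PAGE_SIZE and first_page_aligned are hypothetical names, and an ordinary byte array stands in for the srb extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size for the sketch; a real driver would use the
 * PAGE_SIZE definition from the kernel headers. */
#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical helper: round the start of an extension area up to the
 * next page boundary. In the worst case the area starts one byte past a
 * boundary, so SKETCH_PAGE_SIZE - 1 bytes are skipped; an area of
 * SKETCH_PAGE_SIZE * 2 - 1 bytes therefore always still contains one
 * complete aligned page after the returned address. */
static void *first_page_aligned(void *extension)
{
    uintptr_t p = (uintptr_t)extension;
    uintptr_t mask = (uintptr_t)SKETCH_PAGE_SIZE - 1;
    return (void *)((p + mask) & ~mask);
}
```

The returned pointer is what would be used as the bounce buffer; the bytes before it and after the aligned page are simply wasted.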

Unfortunately you need to break some scsiport rules to do that - you can
tell scsiport to give you the physical address of the buffer, or the
virtual address, but never both. Storport relaxes the restrictions a bit
but is so broken in so many other ways that I went back to scsiport. I’m
not sure how closely related ataport and scsiport are - maybe you don’t
have similar restrictions?

Another option could be to put an upper filter in place that manually
re-allocates non-conforming buffers (again, assuming that that is a rare
event). I’m pretty sure that that would fail under a crash dump scenario
though, but maybe that doesn’t matter for your drivers?

Can you do some testing and establish whether almost all requests fit
your requirements and that it’s only a very small number that don’t? If
you are only handling a corner case then it should be acceptable if
there is a bit of a performance hit in doing so.

My buffer alignment dependency is because my ‘hardware’ is actually a
virtual adapter which talks to a Linux backend under Xen, and Linux
demands the 512 byte alignment.

James

Hello,
and thanks everybody for the answers and the suggestions; I think I solved the problem. The 4K boundary is a PCIe 1.0 requirement (burst transfers cannot cross a 4K boundary), and I wanted to make sure that I had to handle that in the miniport.

Thanks again to everybody!
=A

> What I have found with my scsiport driver is that while I do get some
> non-512 byte aligned buffers, apart from a small number of
> requests at boot and sometimes during certain operations
> (format, chkdsk I think), the buffers are otherwise always
> aligned to PAGE_SIZE boundaries. So what I do is declare an
> srb extension size of PAGE_SIZE * 2 - 1, which guarantees
> that there will be at least 1 complete PAGE_SIZE aligned
> block in the srb extension, and then use that as a bounce buffer.

The largest alignment you can specify for storport will be 8 bytes.

If you read the docs for the ReadFile user mode API, it basically says
unbuffered disk transfers (which often are large) should be aligned on
sector size boundaries (no page sized alignment required). I’m sure there
also are many apps that don’t even do this, because most disk controller
hardware works fine with much smaller alignment.

If you want to get your driver WHQL certified, I believe there also are
requirements for maximum required alignment.

Jan

This is not really Windows driver related, but it caught my eye.

I think you are referring to the “TLP with data payload rules” section of the 1.0 spec.

Properly designed hardware should not require host software to do anything to accommodate this. A NIC chip is required to be able to DMA from/to any address boundary. I don’t think any production PCIe 1.0 NIC driver has to do anything special for this (at least not those from us). Yes, the host can indeed initiate a request that crosses a page boundary for either MRd or MWr, but it is up to your DMA engine to form the TLPs.

For instance, the host driver may request an MRd from 0x1234ff00 with 0x200 bytes of data to complete; the read DMA engine should be smart enough to form 2 TLPs, one from 0x1234ff00 to 0x1234ffff, the other from 0x12350000 to 0x123500ff.

I suggest you intentionally initiate a DMA request that crosses a page boundary and hook up a PCIe bus analyzer to see how your hardware handles it. If it’s not splitting the request as described above, it’s seriously broken. Remember that a PCIe receiver can optionally check for such a malformed TLP, and may throw a fatal error that NMIs the system.

Personally, I would refuse to work around hardware problems like this. Chip guys usually don’t fix their bugs if there is a reasonable s/w workaround. This one certainly is not.


Calvin Guan
Broadcom Corp.
Connecting Everything(r)
Sent from my comfy lodge in Northern California.

----- Original Message ----
From: “xxxxx@bugfeeders.com
To: Windows System Software Devs Interest List
Sent: Tuesday, December 16, 2008 2:40:30 PM
Subject: RE:[ntdev] 4K buffer memory alignment in ataport

NTDEV is sponsored by OSR

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer


Pretty sure that there was previous authoritative mention here that for NTFS, the requirement is filesystem cluster alignment and not disk sector alignment

- S

-----Original Message-----
From: Jan Bottorff
Sent: Tuesday, December 16, 2008 18:19
To: Windows System Software Devs Interest List
Subject: RE: [ntdev] 4K buffer memory alignment in ataport


Hi,

For unbuffered file I/O I definitely don’t believe there is any buffer
alignment requirement related to NTFS cluster size. The biggest requirement
is that transfer sizes must be a multiple of the disk sector size. The SDK docs
on ReadFile/WriteFile say the buffer is supposed to be aligned on a sector
sized boundary, but this may not be enforced.

You also can specify an NTFS cluster size of 512 bytes when formatting a
volume, so even if there were a cluster size alignment requirement there is
no assurance of page size alignment.

The most stringent alignment requirement is the storport AlignmentMask field
of the PORT_CONFIGURATION_INFORMATION structure, which says
(http://msdn.microsoft.com/en-us/library/ms810325.aspx) 8 byte alignment is
the maximum supported value. This would mean storage controllers that
require larger alignment will be incompatible, or will have to degrade
performance and copy the data through bounce buffers.

Jan

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Skywing
Sent: Wednesday, December 17, 2008 8:26 AM
To: Windows System Software Devs Interest List
Subject: RE: [ntdev] 4K buffer memory alignment in ataport


> The most stringent alignment requirement is the storport AlignmentMask field
> of the PORT_CONFIGURATION_INFORMATION structure, which says
> (http://msdn.microsoft.com/en-us/library/ms810325.aspx) 8 byte alignment is
> the maximum supported value. This would mean storage controllers that
> require larger alignment will be incompatable, or will have to degrade
> performance and copy the data through bounce buffers.

Yes, that’s what I do, but I break scsiport rules to do so - the choice
in scsiport is mapped buffers (can’t get physical address) or physical
address (can’t get virtual address). To work around this I choose mapped
buffers and then use MmGetPhysicalAddress to get the physical address. I
can get away with this because the Xen backend is reading the buffers
based on the physical address, so no PCI address translation is
required. It wouldn’t pass WHQL though.

James

>> IDE_SCATTER_GATHER_ELEMENT (Address+Length) must not cross
>> a 4K boundary.
>>
>> Is there a way I can ask the port driver (?) to do so?
>
> Nope, you’re SOL. Better talk to your hardware designers and tell them the
> bad news as soon as possible.

Why?

Take any IDE_SCATTER_GATHER_ELEMENT and split it into several on 4K boundaries. This will work.


Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com

> Pretty sure that there was previous authoritative mention here that for NTFS, the requirement is
> filesystem cluster alignment and not disk sector alignment

No. Disk sector alignment applies to the length, the start offset, and the data pointer, independent of the FSD.

Note that for SCSI passthrough in CD/DVD burning code the disk sector size is 2048, not 512, and yes, many ATA controllers really do enforce this for CD/DVD.
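
The three-way rule above can be written down as a small check. This is a sketch only: unbuffered_io_aligned is a hypothetical helper, not a Windows API, and the sector size is passed in by the caller rather than queried from the device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true when the start offset, the transfer
 * length, and the data pointer are all multiples of the sector size
 * (for example 512 for disks, 2048 for CD/DVD media). */
static bool unbuffered_io_aligned(uint64_t offset, uint32_t length,
                                  const void *buffer, uint32_t sector_size)
{
    assert(sector_size != 0);
    return offset % sector_size == 0
        && length % sector_size == 0
        && (uintptr_t)buffer % sector_size == 0;
}
```

A buffer that passes with a 512 byte sector size can still fail with 2048, which is the CD/DVD case mentioned above.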


Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com

>> Nope, you’re SOL. Better talk to your hardware designers and tell them the
>> bad news as soon as possible.
>
> Why?
> Take any IDE_SCATTER_GATHER_ELEMENT, and split it to several ones on 4K boundaries. This will work.

Why? Because it’s awfully ugly. There are problems associated with this method too. This should be done in the chip RTL: form the right TLPs, and that is it. Even my lousy digital design skills can do it. If the chip guys are too incompetent to get it right, it’s time to get a new job.

On the other hand, I strongly suspect that the OP’s hardware already does this. I just can’t believe that it doesn’t; it would be an unforgivable mistake for a hw designer.


Calvin Guan
Broadcom Corp.
Connecting Everything(r)



>Why? Because it’s awfully ugly.

Why ugly? I think IDE_SCATTER_GATHER_ELEMENTs are converted to hardware-specific format in the miniport code anyway. Am I wrong?


Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com

Hw designers don’t commit unforgivable mistakes.
It’s always the fault of driver folks to patch their buggy stuff, and do
it quickly :(

Regards,
–PA

Well Pavel, I don’t know.

I have worked, and am still working, for some of the best chip companies. I have a lot of respect for chip designers (I haven’t seen a bad one so far). They are human beings and they do make mistakes from time to time. A chip bug can be very difficult to expose and is certainly very expensive to fix. Even a cheap shuttle tape-out costs half a million. Add the lead time and reverification time, and it’s extremely expensive. Miss a product/chipset cycle and you’re pretty much done. OEMs will be mad and look down on you. Moreover, fixing bug A may introduce nastier bugs B and C. A company can easily go belly up over this. As a driver guy, I do everything I can to help save money for my company.

The reason I said unforgivable is that such a bug indicates:
0) The bug is dumb.
1) The person who designed the DMA block is clueless about the PCIe spec.
2) The bug was not caught in the simulation and FPGA stages.
3) PCIe compliance testing was not done correctly.

This assumes that the OP’s hardware indeed has such a bug, which I believe is very unlikely. I suggested earlier in this thread that the OP confirm it.

It might be forgivable for board-level design companies who put chips from others together. However, it’s very dangerous for a chip company. Nowadays, very few chip companies design their own PCIe protocol logic, PCIe SerDes, and PCIe PHYs like we do. It takes years of work from highly experienced staff to get it right. It’s much cheaper to buy the IP from someone.


Calvin Guan
Broadcom Corp.
Connecting Everything(r)

----- Original Message ----
From: Pavel A.
To: Windows System Software Devs Interest List
Sent: Friday, December 19, 2008 2:59:56 AM
Subject: Re:[ntdev] 4K buffer memory alignment in ataport


>0) The bug is dumb.

Anyway it is easy to work around.

The driver needs to convert the SGL entries from the Windows format to the hardware format anyway. For drivers like SCSI controller miniports, this code (along with the init code) makes up most of the bulk.

So, this conversion will just note the Windows SGL entries which cross a page boundary and split them into 2 hardware SGL entries. This is not hard at all.
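
A minimal sketch of that conversion step, in C, might look like the following. It is an illustration under stated assumptions, not ataport code: sg_entry is a hypothetical stand-in for both the port driver's element and the hardware descriptor, and split_sg_entry and SG_BOUNDARY are made-up names. The loop also handles elements that span more than two pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed hardware limit: no descriptor may cross a 4K boundary. */
#define SG_BOUNDARY 4096u

/* Hypothetical address/length pair used for both input and output. */
typedef struct {
    uint64_t address;
    uint32_t length;
} sg_entry;

/* Cuts one element into pieces that each stay inside a single 4K page.
 * Writes the pieces to out and returns how many were written, or 0 if
 * the element is empty or more than max_out pieces would be needed. */
static size_t split_sg_entry(sg_entry in, sg_entry *out, size_t max_out)
{
    size_t n = 0;
    while (in.length > 0) {
        /* Bytes left before the next 4K boundary. */
        uint32_t room = SG_BOUNDARY - (uint32_t)(in.address & (SG_BOUNDARY - 1));
        uint32_t chunk = in.length < room ? in.length : room;
        if (n == max_out)
            return 0;
        out[n].address = in.address;
        out[n].length = chunk;
        n++;
        in.address += chunk;
        in.length -= chunk;
    }
    return n;
}
```

For the transfer used as an example earlier in the thread (address 0x1234ff00, length 0x200), this yields two pieces of 0x100 bytes each, one ending at 0x1234ffff and one starting at 0x12350000.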


Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com

I did not say it’s hard. I SAID IT’S UGLY. It is indeed easy to work around.

Depending on the particular hw implementation, the SGL can either be programmed into the hardware (case a, which means the hw has finite space for it) or reside in host memory (case b, where the hw initiates a DMA to fetch it upon receiving commands from the host).

For case a, you need extra MWr target cycles over the bus, and you are at the mercy of the chipset to flush the post buffer as soon as possible (let alone the case where you are unlucky enough to run out of SGL entries).

For case b, fetching extra SGL entries is not free. Have you ever looked at bus transactions on a PCIe, PCI, or PCI-X bus analyzer?

For slow ATA or SCSI adapters, these overheads may be acceptable. It’s all about latency and efficiency. I currently make my living supporting multi-port 10 Gbps converged NICs, where every microsecond or even nanosecond counts.


Calvin Guan
Broadcom Corp.
Connecting Everything(r)


----- Original Message ----
From: Maxim S. Shatskih
To: Windows System Software Devs Interest List
Sent: Sunday, December 21, 2008 4:22:44 AM
Subject: Re:[ntdev] 4K buffer memory alignment in ataport


And you said in your earlier note that, being an electrical engineer, it is
perhaps not wise to work for a sw company …

My brain works exactly the opposite way … My classification of engineers:
“There are basically two types of engineers: good or bad.” That’s it …

It’s your choice to call it whatever you like, but your posts signal to me that
“You are a damn GOOOOOOD engineer”.

Well, as if I have to sign something!
Prokash Sinha
http://prokash.squarespace.com
Success has many fathers, but failure is an orphan.

----- Original Message -----
From: “Calvin Guan”
To: “Windows System Software Devs Interest List”
Sent: Sunday, December 21, 2008 8:38 AM
Subject: Re: [ntdev] 4K buffer memory alignment in ataport
