Common Buffer vs. Discrete Memory

Hi Guys,
I just want to ask a very simple question. I am preparing an NVMe scatter/gather list (SGL). For a user-mode buffer with M pages, the SGL looks like this:

[NVMe Segment 0]             [NVMe Segment 1]                  [NVMe Segment 2]
 {Data Descriptor Page 0}     {Data Descriptor Page 255}        {Data Descriptor Page N+1}
 {Data Descriptor Page 1}     {Data Descriptor Page 256}        {Data Descriptor Page N+2}
      :                            :                                 :
 {Data Descriptor Page 254}   {Data Descriptor Page N}           {Last Data Descriptor Page M}
 {Seg 1 Descriptor}           {Last Seg Descriptor for Seg 2}

Basically:
Each NVMe segment is one page. An entry in this page can be one of three things:

              * a Data Descriptor,
              * a Segment Descriptor, or
              * a Last Segment Descriptor.

Each segment (or page) can hold 255 entries.
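
For reference, each entry is a 16-byte descriptor that, as far as I can tell from the spec, looks roughly like this (my own sketch; the struct and field names are mine, so double-check against the spec):

    /* One 16-byte NVMe SGL descriptor (sketch; field names are mine). */
    #include <stdint.h>

    #pragma pack(push, 1)
    typedef struct _NVME_SGL_DESCRIPTOR {
        uint64_t Address;        /* bus address of the data block or of the next segment page */
        uint32_t Length;         /* byte count covered by this entry                          */
        uint8_t  Reserved[3];
        uint8_t  SglIdentifier;  /* descriptor type in bits 7:4 --
                                    0h = Data Block, 2h = Segment, 3h = Last Segment          */
    } NVME_SGL_DESCRIPTOR;
    #pragma pack(pop)

    /* A 4 KB segment page holds 4096 / 16 = 256 descriptors: up to 255 data
       descriptors plus one (Last) Segment descriptor that chains to the next page. */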

To create such an SGL, I am allocating a common buffer large enough to contain all M (from the picture above) data descriptors.

Theoretically the actual data transfer can be in GBs, so let's say I need 4 MB of common buffer, which is ~1024 common buffer pages. There can be many I/Os pending in the driver at the same time.
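
Concretely, the allocation I have today looks something like this (KMDF sketch only; error handling trimmed, and DmaEnabler / MaxDescriptorsPerIo are placeholders for my real objects):

    WDFCOMMONBUFFER  CommonBuffer;
    PVOID            SglVa;    /* host-side (kernel VA) view of the SGL pages          */
    PHYSICAL_ADDRESS SglLa;    /* device-side (bus / IOMMU-translated) address         */
    size_t           SglBytes = MaxDescriptorsPerIo * sizeof(NVME_SGL_DESCRIPTOR);

    NTSTATUS status = WdfCommonBufferCreate(DmaEnabler,
                                            SglBytes,
                                            WDF_NO_OBJECT_ATTRIBUTES,
                                            &CommonBuffer);
    if (NT_SUCCESS(status)) {
        SglVa = WdfCommonBufferGetAlignedVirtualAddress(CommonBuffer);
        SglLa = WdfCommonBufferGetAlignedLogicalAddress(CommonBuffer);
        /* Build the descriptors through SglVa; give SglLa (plus per-segment
           offsets) to the device as the SGL segment pointers.              */
    }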

Right now I have a common buffer allocation for this. But I was just wondering if it would be better to allocate separate pages, lock them down, and prepare the scatter/gather list in those pages?

I know that the common buffer memory is also not physically contiguous. It is page-mapped memory from both the host and the device side (using the IOMMU).
Are there any limitations to using a common buffer in such a case?

Thanks,
-AJ

I know that the common buffer memory is also not physically contiguous

No, you don’t. Common buffer memory is, by definition, physically contiguous. That’s its purpose.

Why are you doing your own NVMe driver? That’s a solved problem.

I was just wondering if it would be better to allocate separate pages, lock them down, and prepare the scatter/gather list in those pages?

Why would that possibly be “better”?

I know that the common buffer memory is also not physically contiguous

You DO? In fact, by implementation it is physically contiguous. But, no matter. It’s contiguous for the purposes of DMA… and THAT’s all that matters for your use, right?

I feel like I’m missing something important in your question,

Peter

@Tim_Roberts : Why are you doing your own NVMe driver?
My mistake to say it like that. My device has an NVMe interface, but it is not an NVMe device per se.

@“Peter_Viscarola_(OSR)” : I feel like I’m missing something important in your question,
I do not think that you are missing anything. I described the question completely wrong. :)

Let me rephrase.

When I allocate a common buffer, I get two addresses: one on the host side and another on the “bus” side. Both address ranges are contiguous for their respective purposes.

I am just wondering: since NVMe gives us a way to chain different physical addresses in a “scattered” SGL, are we really taking advantage of this feature by allocating a common buffer (which is physically contiguous)?

Or would the better way be to allocate a single page at a time, use it to build one SGL segment, then allocate another page and build the next segment, and so on?
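
The chaining itself would be something like the sketch below, reusing the descriptor layout from my first post (untested; the helper name and the 0x20/0x30 type values are just my shorthand for Segment and Last Segment descriptors):

    #define ENTRIES_PER_SEGMENT 256   /* 255 data entries + 1 chain entry per 4 KB page */

    VOID
    ChainSglSegments(
        NVME_SGL_DESCRIPTOR *SegmentVa[],      /* kernel VA of each segment page       */
        PHYSICAL_ADDRESS     SegmentBusAddr[], /* device-visible address of each page  */
        ULONG                SegmentCount
        )
    {
        for (ULONG i = 0; i + 1 < SegmentCount; i++) {
            NVME_SGL_DESCRIPTOR *chain = &SegmentVa[i][ENTRIES_PER_SEGMENT - 1];

            chain->Address = (uint64_t)SegmentBusAddr[i + 1].QuadPart;
            /* Length should be the bytes of descriptors actually present in the
               next segment; a full page is assumed here for simplicity.        */
            chain->Length  = ENTRIES_PER_SEGMENT * sizeof(NVME_SGL_DESCRIPTOR);
            /* 0x20 = Segment descriptor; 0x30 = Last Segment descriptor
               (i.e., the segment it points to is the final one).          */
            chain->SglIdentifier = (i + 2 == SegmentCount) ? 0x30 : 0x20;
        }
    }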

I am guessing that both approaches should be equivalent, but I am not sure.

-Aj

Hmmm… we almost always allocate the CONTROL structures via a Common Buffer. Whether the buffers that hold the DATA come from a Common Buffer or someplace else is entirely up to your design. Of course, if you put the data buffers in a Common Buffer, devising your Scatter/Gather list is pretty trivial. OTOH, if you allocate your data buffers in… something else… how are you going to get those mapped properly by the IOMMU? You’ll have to run those through the Windows DMA API. And, unless you want to leave the allocation of those buffers to a user-mode caller (particularly one using a Packet-Based scheme), you’re really just making a lot of work for yourself, don’t you think?
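
To illustrate what I mean by “run those through the Windows DMA API”, here’s a very rough KMDF sketch (not your code, obviously; it borrows the descriptor layout you posted, and the segment-chaining and doorbell parts are elided):

    /* KMDF hands you the IOMMU-translated addresses of the caller's data buffer
       in the EvtProgramDma callback; you copy them into the NVMe data
       descriptors that live in your (control-structure) Common Buffer.        */
    BOOLEAN
    EvtProgramDma(
        _In_ WDFDMATRANSACTION    Transaction,
        _In_ WDFDEVICE            Device,
        _In_ WDFCONTEXT           Context,
        _In_ WDF_DMA_DIRECTION    Direction,
        _In_ PSCATTER_GATHER_LIST SgList
        )
    {
        NVME_SGL_DESCRIPTOR *desc = (NVME_SGL_DESCRIPTOR *)Context;  /* VA into the Common Buffer */

        for (ULONG i = 0; i < SgList->NumberOfElements; i++) {
            desc[i].Address       = SgList->Elements[i].Address.QuadPart;
            desc[i].Length        = SgList->Elements[i].Length;
            desc[i].SglIdentifier = 0x00;   /* Data Block descriptor */
        }

        /* ...add Segment descriptors and chain to the next page when the
           elements won't fit in one segment, then ring the doorbell...    */
        return TRUE;
    }

    /* Earlier, the transaction is set up and executed with something like:
         WdfDmaTransactionCreate(DmaEnabler, WDF_NO_OBJECT_ATTRIBUTES, &DmaTransaction);
         WdfDmaTransactionInitializeUsingRequest(DmaTransaction, Request,
                                                 EvtProgramDma, WdfDmaDirectionReadFromDevice);
         WdfDmaTransactionExecute(DmaTransaction, DescriptorVa);            */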

I mean… It’s all up to you. Like I said before: There’s a lot involved in the design of your solution. Choosing the right architecture/design is the hard part. Once you’ve done that, then, well… the implementation just naturally follows on.

Peter