Hi everyone, I am new to Windows driver development; my previous experience is only in application software development. As a result, my knowledge of hardware is quite limited.
My boss has tasked me with writing a production-ready Windows driver for a PCIe device within months. Fortunately, I found a sample PCIe driver in the SDK package provided by the device manufacturer. This sample driver includes basic interrupt and DMA functionality and currently succeeds in simple data communication.
However, there is still a significant gap between the current implementation and my boss's requirements. For example, the DMA in the sample only supports single transfers of up to 64 KB based on a common buffer, and they need a high-performance driver.
I have encountered numerous challenges during the upgrade process, and unfortunately, no one in my team can assist me. I feel a lot of pressure, especially when dealing with hardware-related descriptions. I spend most of my time searching the internet for answers and even reached out to "friends of friends of friends" who might have relevant knowledge, but with little success.
I have compiled a list of questions, and I would greatly appreciate any answers or learning suggestions:
DMA Upgrade to SGDMA: According to MSDN, "Scatter/gather capability refers to a flag in the device description that indicates the device can read or write from any area in memory." I tried to find this information in my PCIe device's technical manual but only learned that "our device's PCIe sub-system does not have built-in DMA capability." I then searched for information on the PCH on the motherboard, but found nothing useful. Where should I look for the hardware descriptions related to DMA for this device? Alternatively, does Windows abstract these hardware details, so that I can obtain the necessary hardware capabilities or information through software, such as by querying the PCI bus driver (e.g., using APIs like WdfDeviceGetAlignmentRequirement)?
Necessary Hardware Knowledge: I feel lost in my understanding of hardware, but I recognize that driver development without hardware knowledge can be disastrous. Could you recommend any learning resources or provide some basic hardware learning advice to guide my studies?
Thank you very much! Your advice could save me from despair.
What kind of device is this? When you say high performance, what level of performance are you expecting? What kind of UM application will use this driver? Does it provide a standard interface that is expected to be used by an arbitrary application, or a custom interface that will be used only by your custom application?
Presumably your device has a specification. That will tell you what kind of DMA it supports and how to program it to perform a transfer from system memory to device memory or vice versa. If it supports scatter/gather, then you might be able to use it to help you with large transfers (multiple pages). If it only supports DMA from a contiguous buffer, then you can choose among a few strategies to handle it. And if it does not support DMA at all, then you have to use the CPU to do the copy.
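To make the scatter/gather case concrete, here is a minimal sketch in plain C (not driver code) of how one large buffer breaks into page-sized pieces. It assumes 4 KB pages, and `sg_element` and `build_sg_list` are hypothetical names used only for illustration. The pages behind a buffer are generally not adjacent in physical memory, which is why the device needs a list of pieces rather than one address. In a real KMDF driver the framework builds this list and supplies the real addresses, so this shows only the splitting logic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u  /* assumption: 4 KB pages */

/* Hypothetical element type: one piece of a transfer that fits in a page. */
typedef struct {
    uint64_t offset;  /* byte address where the piece starts */
    uint32_t length;  /* length of the piece in bytes */
} sg_element;

/*
 * Split [start, start + length) at page boundaries.
 * Writes at most max_elems entries to out and returns the number of entries
 * the whole range needs, so the caller can detect a list that is too short.
 */
size_t build_sg_list(uint64_t start, uint64_t length,
                     sg_element *out, size_t max_elems)
{
    size_t count = 0;
    assert(out != NULL || max_elems == 0);

    while (length > 0) {
        /* Bytes left before the next page boundary. */
        uint64_t room = PAGE_SIZE - (start % PAGE_SIZE);
        uint64_t piece = (length < room) ? length : room;

        if (count < max_elems) {
            out[count].offset = start;
            out[count].length = (uint32_t)piece;
        }
        count++;
        start += piece;
        length -= piece;
    }
    return count;
}
```

A device without scatter/gather support can only be handed one such piece (or one contiguous buffer) per transfer, which is where the common-buffer strategies come in.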
It sounds like your device falls into that last category, so spending any more time on DMA would be a waste of time. The entire point of DMA is to get the device to copy the memory itself instead of using the CPU to do the copy. If the device does not have the ability to do that, then there is nothing that you can do.
I am not affiliated with OSR (the owners of this website) and have no direct experience with it, but they do offer training. You might consider that.
Sorry for the late response and thank you so much for your advice; I am indeed considering participating in this training.
What kind of device is this? When you say high performance, what level of performance are you expecting? What kind of UM application will use this driver? Does it provide a standard interface that is expected to be used by an arbitrary application, or a custom interface that will be used only by your custom application?
This project involves an industrial control board based on the Texas Instruments AM6442, which features a PCIe high-speed interface operating as an EP. The Windows driver for this device will be used exclusively for our specific application. We expect minimal jitter in the device's response time to commands from the host PC — I am aware that this is challenging to guarantee on Windows, so we may consider solutions like real-time Windows API or add-ons such as Kithara, but only in the future.
Since this is an experimental project and command responses are more frequent than data read/write operations, we do not have a clear plan for high-performance DMA requirements. My boss keeps emphasizing "high performance," but there isn't a concrete definition of what that entails.
It sounds like your device falls into that last category, so spending any more time on DMA would be a waste of time. The entire point of DMA is to get the device to copy the memory itself instead of using the CPU to do the copy. If the device does not have the ability to do that, then there is nothing that you can do.
The AM6442 does come with a specification, but as mentioned earlier, its PCIe subsystem has no built-in DMA support. What confuses me is that the sample driver included in the device's SDK implements basic DMA using a common buffer (it is a simple implementation that creates a DMA adapter and a common buffer, then shares data through this buffer to achieve DMA data transfer). Even though the PCIe subsystem has no built-in DMA, the DMA in the example driver works very well and is significantly faster than copying data through a BAR. So isn't that DMA support?
My questions are:
Can the PC directly initiate DMA data transfers regardless of whether the device's PCIe subsystem has built-in DMA support, similar to how this sample driver operates?
If yes, and the DMA demonstrated in the sample driver truly involves DMA data copying, then it suggests "there is something that I can do". How can I determine what I can do to optimize DMA? For instance, how can I find out if I can implement Scatter/Gather DMA? If implementing SGDMA is not feasible, what other optimizations can I consider?
In virtually every case, when we say "DMA" in reference to PCI or PCIe, we mean DMA initiated by the device. The PCIe spec calls this "bus mastering".
The fact that this device can act as either a root complex or an endpoint tells me that it must understand bus mastering. You're using it as an endpoint. Section 12.2.2.4.5 of the Technical Reference Manual even mentions the DMA support as an endpoint.
I cannot imagine trying to work with a device that has a reference manual running more than 10,400 pages.
Thank you for your advice. I had noticed this section in the manual before, but since I started learning from the perspective of Windows driver development, I didn't interpret this information in a way that was useful for developing DMA in drivers. People always prefer direct descriptions, such as "this device supports SGDMA," and "this device's alignment requirement is 32-bit."
Your suggestions have prompted me to re-read it carefully.
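On the alignment point: as far as I understand, Windows expresses an alignment requirement as a mask (the alignment minus one), so 32-bit alignment is the value 3, which the WDK names FILE_LONG_ALIGNMENT, and WdfDeviceGetAlignmentRequirement returns a value of that form. A minimal sketch of how such a mask is applied, in plain C with hypothetical helper names (`is_aligned`, `align_up`):

```c
#include <assert.h>
#include <stdint.h>

/* Mask for 4-byte (32-bit) alignment; same value as FILE_LONG_ALIGNMENT. */
#define LONG_ALIGNMENT_MASK 0x3u

/* Nonzero if addr satisfies the alignment described by mask (alignment - 1). */
int is_aligned(uint64_t addr, uint32_t mask)
{
    /* A valid mask is one less than a power of two: 0, 1, 3, 7, ... */
    assert(((uint64_t)mask & ((uint64_t)mask + 1)) == 0);
    return (addr & mask) == 0;
}

/* Round addr up to the next address that satisfies mask. */
uint64_t align_up(uint64_t addr, uint32_t mask)
{
    assert(((uint64_t)mask & ((uint64_t)mask + 1)) == 0);
    return (addr + mask) & ~(uint64_t)mask;
}
```

The mask only describes a requirement that software has recorded for the device; the actual requirement still has to come from the hardware documentation.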
Section 12.2.2.4.5.1 PCIe Subsystem DMA Support The PCIe subsystem does not have DMA capabilities built into it. It has internal target and controller ports connected to the device-level interconnect.
An external DMA engine can make burst data read/writes on the target port and the controller port on PCIe subsystem can initiate reads/writes to memory on behalf of a remote PCIe device.
The PCIe subsystem does not specify the DMA protocol that is used for data transfers. The software implementations on the two ends of the PCIe link implement a data transfer protocol that is compatible with each other....
This explanation suggests that "no built-in DMA capability" likely means the PCIe subsystem relies on a DMA controller elsewhere in the device, rather than on a DMA controller inside the PCIe subsystem itself. Therefore, I need to look for answers in the DMA sections of the manual.
Is my understanding correct? If no, then where might the internal controller port of the PCIe subsystem be connected? Specifically, which DMA controller is actually responsible for the DMA data transfers between the PCIe device and the PC?
I overlooked the DMA descriptions specific to the device because my colleague (maybe he was wrong) previously told me that DMA over PCIe should only be described in the PCIe subsystem section, while other DMA description sections are only dedicated to communication for other parts of the device, for example, for the device host processors communicating with other peripheral hardware.
Before attending the OSR training, I believe it would be beneficial for me to take some courses on PCIe to better prepare myself.
That is the most complicated way I can imagine of saying that PCI is not like ISA and does not have any central DMA hardware. Instead, each device that wants to do DMA needs its own DMA hardware.
If you have an example that does DMA, then your device clearly supports it. Using a common buffer is a standard technique. If your device has strict requirements for memory alignment, or can't do scatter / gather, it may be your only choice.
Given that this is some kind of industrial control device, you will probably be more concerned about the latency of device interactions than about throughput, so the extra memory copy probably isn't of much concern.
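For what it's worth, getting past a 64 KB single-transfer limit with a common buffer usually just means looping. Below is a minimal sketch in plain C, not driver code: `send_in_chunks`, `dma_kick_fn`, and the `fake_device` stand-in are hypothetical names, and the callback is assumed to return 0 once the device has finished pulling the chunk. A real driver would wait for the device's completion interrupt at that point instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define COMMON_BUFFER_SIZE (64u * 1024u)  /* the sample's 64 KB limit */

/* Hypothetical callback: start one DMA of len bytes from the common buffer
 * and return 0 once the device reports completion. */
typedef int (*dma_kick_fn)(const uint8_t *common_buf, size_t len, void *ctx);

/*
 * Send an arbitrarily large payload through one fixed-size common buffer.
 * Each pass copies a chunk into the buffer (the "extra memory copy") and
 * asks the device to pull it. Returns the number of bytes sent.
 */
size_t send_in_chunks(const uint8_t *src, size_t total,
                      uint8_t *common_buf, size_t buf_size,
                      dma_kick_fn kick, void *ctx)
{
    size_t sent = 0;
    assert(buf_size > 0 && kick != NULL);

    while (sent < total) {
        size_t chunk = total - sent;
        if (chunk > buf_size)
            chunk = buf_size;

        memcpy(common_buf, src + sent, chunk);   /* stage the data */
        if (kick(common_buf, chunk, ctx) != 0)   /* device-side transfer */
            break;                               /* stop on first failure */
        sent += chunk;
    }
    return sent;
}

/* Stand-in for the device: appends each chunk to a fake device memory. */
typedef struct {
    uint8_t *mem;    /* fake device memory, large enough for the payload */
    size_t   used;   /* bytes received so far */
    size_t   kicks;  /* number of transfers started */
} fake_device;

int fake_kick(const uint8_t *common_buf, size_t len, void *ctx)
{
    fake_device *dev = (fake_device *)ctx;
    memcpy(dev->mem + dev->used, common_buf, len);
    dev->used += len;
    dev->kicks++;
    return 0;
}
```

With a 150,000-byte payload this makes three passes (two full 64 KB chunks and a remainder), which is the cost being weighed against the complexity of scatter/gather.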
Based on provided information, you (or your boss) should immediately look for a consultant to verify the hardware design against requirements and find a quick fix if possible. Otherwise you're heading into a wreck.
This thing has three GBE interfaces and a high-speed USB device port. Why did they decide to use PCIe to communicate with the PC?
If they insist on PCIe... bridge chips exist that may add bus-mastering DMA functionality.
The point of view here makes for confusing discussions. The AM6442 is itself a full computer, with a PCIe root complex that can host its own array of external devices. When acting in that capacity, it certainly does not support bus DMA. No PCIe root complexes did that until quite recently, and even then it is sparsely used.
However, its PCIe subsystem can also act as a device endpoint, where it gets plugged into someone else's root complex. I am assuming that it can support bus mastering when it acts as an endpoint, but I admit it is difficult to extract that from the documentation. MAYBE it is necessary to write software for the device that does those transfers. That requires more in-depth reading of the manual than I am willing to do.
It's a valid (and important) question. Do you have a manufacturer's rep who can answer questions?
Thank you all for your responses. I have carefully read, word by word, the sections in the manual concerning DMA support for the PCIe subsystem.
The PCIe subsystem does not have DMA capabilities built into it. It has internal target and controller ports connected to the device-level interconnect.
I did find a diagram (see attached pictures) showing that the target port and controller port in the PCIe subsystem are connected to CBASS0. According to the manual, CBASS0 is described as "a crossbar module to provide physical connections among initiators and targets."
Then I discovered that the device's Data Movement subsystem (the module within the device that provides DMA support) integrates support for CBASS and connects to other peripherals or subsystems of the device via CBASS (I assume this is where it connects to the PCIe subsystem). See the other attached picture.
This might explain why the Windows driver example provided with this device actually has DMA functionality and works correctly, even though its PCIe subsystem indicates no built-in DMA support: it provides DMA support to the PCIe subsystem through its CBASS. Do you think this hypothesis is reasonable?
Thank you for your advice!
This is indeed a very important question. The device has a technical forum, but my colleague informed me that questions posted there hardly got answered. Therefore, I plan to try contacting the hardware technical support directly through my boss to get the answers we need.
I think you've pinpointed the key diagram there. As it says, the DMSS module has the components that provide packet DMA and block copy DMA. That's what you need. PCIe is just the plumbing. The DMA functionality comes from DMSS.
Now, what about PCIe configuration space? Can the device provide the configuration space with all the required registers, including the bit that enables DMA and the configuration of interrupts?
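For reference, the bits in question live in the 16-bit Command register at offset 0x04 of the standard PCI configuration header. A minimal sketch in plain C that decodes them from a raw copy of configuration space follows; the function names are hypothetical, and how you obtain the bytes (a debugger dump, a bus-interface read) is left out. On Windows the PCI bus driver normally manages this register, so a function driver would usually read it for diagnosis rather than write it.

```c
#include <assert.h>
#include <stdint.h>

/* Standard PCI configuration header: 16-bit Command register at offset 0x04. */
#define PCI_COMMAND_OFFSET    0x04u
#define PCI_CMD_MEMORY_SPACE  (1u << 1)   /* device responds to BAR accesses */
#define PCI_CMD_BUS_MASTER    (1u << 2)   /* device may initiate transactions (DMA) */
#define PCI_CMD_INTX_DISABLE  (1u << 10)  /* legacy INTx interrupts masked */

/* Read a little-endian 16-bit field out of a raw copy of configuration space. */
uint16_t cfg_read16(const uint8_t *cfg, uint32_t offset)
{
    assert(cfg != 0);
    return (uint16_t)(cfg[offset] | ((uint16_t)cfg[offset + 1] << 8));
}

/* Nonzero if the Bus Master Enable bit is set. */
int bus_master_enabled(const uint8_t *cfg)
{
    return (cfg_read16(cfg, PCI_COMMAND_OFFSET) & PCI_CMD_BUS_MASTER) != 0;
}

/* Nonzero if the Memory Space Enable bit is set. */
int memory_space_enabled(const uint8_t *cfg)
{
    return (cfg_read16(cfg, PCI_COMMAND_OFFSET) & PCI_CMD_MEMORY_SPACE) != 0;
}

/* Nonzero if legacy INTx interrupts are disabled. */
int intx_disabled(const uint8_t *cfg)
{
    return (cfg_read16(cfg, PCI_COMMAND_OFFSET) & PCI_CMD_INTX_DISABLE) != 0;
}
```

If the sample driver's common-buffer DMA already works, the endpoint is presumably exposing this register and Bus Master Enable is being set.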
This is only the beginning of making a custom PCIe driver. Instead, they could talk with the host over Ethernet and work around all that mess.
Well, the thing specifically includes a reversible PCIe subsystem, so it was clearly designed to operate in that capacity.
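The next step is to look at the Linux PCI endpoint framework. It is the PCIe counterpart of the USB Gadget API ("gadget" is the word Linux uses for a machine operating as a USB device rather than a host): it covers a machine that is operating as a PCIe endpoint, rather than a controller.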
I think your far bigger issue is what sort of data you want to send to this device, and how you get this device to interpret it the way you want. Probably your driver is a minor part of this and simply needs to mediate between your custom app and whatever is going to run on this device.
The choice of PCIe over the network was mostly made based on convention and some consideration of performance (which I guess means jitter in response latency). Of course, we might decide to switch to Ethernet one day.
Developing with PCIe is really complex. Although the SDK provides an example of accessing a certain register in the PCIe configuration space via the bus interface to trigger a downstream device interrupt, it's incredibly difficult to find the other registers that need modification among the mass of hardware source code and thousands of pages of manuals, especially as a novice developing Windows drivers alone with poor technical support from the device manufacturer.
Well, no more complaints.
You mean I should look for APIs for running as a PCIe Endpoint (EP), rather than focusing on the spec for the DMA controller? Thank you for your advice. Over the past 2 days, I have been trying to search for the keywords "PCIe EP," "DMA," in the manual and PCI configuration space register description, but apart from the content mentioned earlier, I haven't made much progress. I suspect this might require more time.
You are right; I don't need to worry about this part of the work, someone else will be responsible for these. What I need to do is provide a data transfer channel that meets the requirements and is stable and high-performing!
If your boss has asked you to do this at all, he wants to use PCIe. His emphasis on performance backs that up.
Both Ethernet and PCIe are 'networks'. PCIe is much more complex, but those complexities are handled by the hardware and combine to provide fast, consistent, and reliable delivery of signals and data. PCIe has a limited range (a few feet), whereas Ethernet can operate over tens of miles (MPLS) but is comparatively slower and does not guarantee much.
Probably the driver design you want has a simple pair of IOCTLs: send and recv. And probably the sample driver you have looked at has 90% of what you need. A simple common-buffer DMA approach will be dramatically faster than using Ethernet, for example. And if your driver doesn't need to understand anything about the higher-level protocol used, then there isn't much more that the KM code can contribute.
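As a rough illustration of how small that interface can be, here is a sketch of a send/recv IOCTL pair in plain C. It reproduces the bit layout of the Windows CTL_CODE macro so that it compiles without the WDK; the names `make_ioctl`, `ioctl_sample_send`, and `ioctl_sample_recv` and the function numbers are hypothetical, and a real driver would simply use CTL_CODE with FILE_DEVICE_UNKNOWN, METHOD_BUFFERED, and FILE_ANY_ACCESS (or whichever method and access fit the design).

```c
#include <assert.h>
#include <stdint.h>

/* Values as defined in the Windows headers. */
#define DEV_TYPE_UNKNOWN  0x22u  /* FILE_DEVICE_UNKNOWN */
#define METHOD_BUFFERED_V 0u     /* METHOD_BUFFERED */
#define ANY_ACCESS_V      0u     /* FILE_ANY_ACCESS */

/* Same bit layout as the CTL_CODE macro. */
uint32_t make_ioctl(uint32_t dev_type, uint32_t function,
                    uint32_t method, uint32_t access)
{
    /* Function is a 12-bit field; method and access are 2 bits each. */
    assert(function < 0x1000u && method < 4u && access < 4u);
    return (dev_type << 16) | (access << 14) | (function << 2) | method;
}

/* Hypothetical codes; function numbers 0x800 and up are for vendor use. */
uint32_t ioctl_sample_send(void)
{
    return make_ioctl(DEV_TYPE_UNKNOWN, 0x800u, METHOD_BUFFERED_V, ANY_ACCESS_V);
}

uint32_t ioctl_sample_recv(void)
{
    return make_ioctl(DEV_TYPE_UNKNOWN, 0x801u, METHOD_BUFFERED_V, ANY_ACCESS_V);
}
```

The application would pass these codes to DeviceIoControl, and the driver would dispatch on them in its device I/O control handler.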