Interesting point you brought up, especially in the caching area!
As of now there is a large amount of effort going on in that area, mainly:
- Cache (on-chip) caching
- RAM caching
- Disk caching
The on-chip cache, be it primary or secondary, is expensive, hence in the kilobytes range! And being the oldest form of cache, it is well understood (X-way associativity, size, management of it, etc. etc.).
RAM cache, being cheap, is large, in the MB or GB order; possibly it could lay out my DNA.
Disk cache is in between, some MBs.
RAM and disk caching, being a bit more generously sized, are the targets of the current efforts. A wide variety of access patterns is being observed, and the control logic is becoming ever more sophisticated. If you have seen or heard of hiccup probability, then you know :). And I'm trying to use some of the arsenal I learnt a long time back; lucky indeed. I thought we were done with caches, but it seems like there is always "food for thought"...
My apologies for being off-topic, but this is one heck of an intellectual set of hackers :-).
-pro
-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of
xxxxx@3Dlabs.com
Sent: Friday, May 14, 2004 3:32 AM
To: Windows System Software Devs Interest List
Subject: RE: [ntdev] Object oriented programming in NDIS
hi, i am new to programming NDIS drivers... can you tell me if i can use OO
programming, what are the disadvantages/advantages of doing that. Also what
will be the fastest data structure to use for drivers, can we use
associative binary trees? Thanks for your help
You can use OO in drivers, but there are a few limitations, as well as
complications. For instance, it's not possible to have "global" objects. All
objects have to be created dynamically, at runtime.
You will also have to roll your own versions of some of the runtime library,
as it's not supported by MS for the driver libraries. For instance, you'll
need to create a pair of "new" & "delete" functions.
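For what it's worth, a minimal sketch of such a "new"/"delete" pair. This is a user-mode stand-in: a real driver would call ExAllocatePoolWithTag/ExFreePoolWithTag instead of malloc/free, and the error handling here is only illustrative.

```cpp
#include <cstdlib>
#include <new>

// User-mode stand-ins: in a real driver these would call
// ExAllocatePoolWithTag(NonPagedPool, size, tag) and
// ExFreePoolWithTag; malloc/free are used here only so the
// sketch can compile and run outside the kernel.
void* operator new(std::size_t size)
{
    void* p = std::malloc(size ? size : 1);
    if (p == nullptr)
        std::abort();   // kernel code can't throw; a driver handles failure
    return p;
}

void operator delete(void* p) noexcept
{
    std::free(p);
}

// Any class in the driver now allocates through the pair above,
// e.g. "new Packet(42)". Packet is just a made-up example type.
struct Packet {
    int id;
    explicit Packet(int i) : id(i) {}
};
```

With only this pair (plus their array forms, omitted here) defined, ordinary `new`/`delete` expressions work without any C++ runtime support from the build environment.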
Of course, as Guy suggested, you can use DriverStudio from NuMega to get
started.
Or you can use plain C.
We've got a large simulator written in C++ which is used to simulate our
next-generation graphics processor, and it interfaces to the plain C driver
that we have. There have been times when I've thought "This would be much
easier to do in C++" in the driver, but our driver is several dozen files,
and a lot of it is fairly complex, so it would take quite a lot of work to
re-engineer it to C++ all the way through, and if you don't do it properly,
there's not much point.
I'd say that the advantages of C are:
- It interfaces directly to the DDK; you don't need any "interface layers".
- I personally think it's easier for the programmer to realize what the code
does.
The advantages of C++ would be:
- Generic classes, for encapsulation and code reuse.
- Better type checking.
Of course you can use associative binary trees. It's generally more popular
to use hash tables or other methods that are more "directly accessible", but
it all depends on how much data you're trying to store/retrieve, what the
access patterns are, etc, etc. All algorithms for storing and retrieving
data are dependent on how you access the data, and how much data is being
stored.
Also, if you're going to access a lot of the data at short intervals, and
want the cache to help the performance of the driver, you may want to
consider something that isn't a linked list (which binary trees are a form
of). Linked lists are perfect for thrashing the cache, as each entry is
very likely to be on a different cache line. Consider the following code:
struct x
{
    struct x *next;
    struct y *data;
} *list;

struct y
{
    int key;
    /* ... some more data */
};

struct y *findKey(int key)
{
    struct x *p;
    for (p = list; p; p = p->next)
    {
        if (p->data->key == key)  /* data is a pointer, so ->, not . */
            return p->data;
    }
    return NULL;
}
Now, it's very likely that each allocation of struct x is on a different
cache line than the previous one. Also, if data is not allocated close to
its struct x, it's very likely not on the same cache line as the owning
struct x. So every time through the loop, we fetch TWO cache lines, in order
to get two integer-sized values. On a modern processor that means reading
128 or 256 bytes each time. And the only time the extra data fetched is of
any help is on the final iteration, where we find the correct key. All the
others are most likely just polluting the cache.
Of course, a slightly different (less generic) way of structuring the data,
as a single struct, would help a lot here. The above example is very likely
what a generic C++ implementation of a linked list looks like, while a C
implementation may well use a single struct for the data and the link(s),
and that would help a fair bit (half the memory bandwidth needed). Another
way would be to copy the key from the data into struct x, so that it's easy
to access without the extra dereference.
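A rough sketch of that last variant, with made-up names: the node carries a copy of the key, so the walk touches only one cache line per entry and only dereferences the data pointer on a match.

```cpp
#include <cstddef>

// Hypothetical payload type; only 'key' matters for the lookup.
struct data_t {
    int key;
    int payload;
};

// The key is duplicated into the list node itself, so walking the
// list never has to fetch *data until a match is found.
struct node_t {
    node_t *next;
    int     key;    // copy of data->key, kept in sync on insert
    data_t *data;
};

data_t *findKey(node_t *list, int key)
{
    for (node_t *p = list; p; p = p->next)
        if (p->key == key)   // no second cache line touched per miss
            return p->data;
    return nullptr;
}
```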
But a much better option would be to have a hash table where we can find the
object based on the key. If done correctly, computing where to look requires
no memory accesses at all; you go straight to the bucket holding the object.
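A toy illustration of that idea. The table size and multiplicative hash are arbitrary choices for the sketch, and collisions are handled only by the final key check (a real table would chain or probe).

```cpp
#include <cstddef>

enum { TABLE_SIZE = 256 };   // arbitrary power of two for this sketch

struct data_t {
    int key;
    int payload;
};

// One pointer slot per bucket; an insert on a colliding key simply
// overwrites in this sketch.
static data_t *table[TABLE_SIZE];

static unsigned hashKey(int key)
{
    // Toy multiplicative hash (a common Knuth-style constant,
    // not anything from the original post).
    return ((unsigned)key * 2654435761u) & (TABLE_SIZE - 1);
}

void insertKey(data_t *d)
{
    table[hashKey(d->key)] = d;
}

data_t *findKey(int key)
{
    // The hash is computed entirely in registers; the only memory
    // touched is one bucket slot, plus the object itself to verify.
    data_t *d = table[hashKey(key)];
    return (d && d->key == key) ? d : nullptr;
}
```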
It's important to remember that modern processors have a very high
throughput of instructions, but can easily be slowed to a crawl by
accessing memory "badly". The latency to read something that isn't in the
cache, on an Opteron/Athlon64, is about 45 ns. If the processor runs at
2 GHz and has 3 execution units, it can do 3 * 2 * 45 = 270 instructions in
that time. So even if you have to do a long instruction such as a divide or
multiply, it's a lot quicker than an uncached memory operation. Of course,
if the data is already in the L1 cache, that's a completely different story.
Questions? First check the Kernel Driver FAQ at
http://www.osronline.com/article.cfm?id=256
You are currently subscribed to ntdev as: xxxxx@garlic.com
To unsubscribe send a blank email to xxxxx@lists.osr.com