Comments
> How can I write simple and fast code (possibly assembler, based on movsd/w/b
> or MMX move instructions)
> for implementing accelerated DrvBitBlt and DrvCopyBits operations?
>
> That is, I need a universal procedure for blitting blocks in video memory
> using direct framebuffer access.
>
> This is heavily needed for my project -
> http://www.geocities.com/bearwindows/vbemp.htm
>
> It is almost done. But there is a problem: when working in 1024x768
> resolution and higher, my driver slows down screen move/copy/scroll operations
> (so far it uses framebuf.dll as the usermode part of my driver).
>
> As far as I know, framebuf.dll does only basic functions (some initialization,
> preparing and filling structures, etc.)
>
> The other work is mainly done by GDI via EngXXX functions, I think. It is slow.
>
>
You proceed on a false assumption. Yes, scrolling and blits are slow,
but they're not slow because the EngXxx callbacks are poorly optimized.
Rather, they're slow because you are reading and writing device memory.
The computer you are using now does fast blits because the copying is
being done by the graphics chip, while at the same time your processor
is moving on and doing something else. With your frame buffer driver,
you're moving all of those pixels around by hand, and you can't continue
on until the copy is done.
Scrolling a 1024x768 window up by one scanline requires 1,500 copies of
4k bytes each (because you have to read the scan from the graphics chip,
then write it back to the graphics chip). That's going to take at least
50ms, and probably more, because the "reading" operation in a graphics
chip is usually not well-optimized.
Starting with VBE 2.0, there is a VBE extension for doing bitblts
(AX=4F17). I don't know how many BIOSes implement that.
--
Tim Roberts, [email protected]
Providenza & Boekelheide, Inc.
I cannot use any kind of hardware acceleration because my universal driver is VESA/VBE oriented.
I also cannot rely on the VBE 2.0 call AX=4F17 because some adapters do not support it.
I need to find a really universal way to blit/copy.
***
Or maybe store the framebuffer in *system* memory (instead of VRAM) and blit from there to *video* memory?
Yes, that will be much faster than reading the video card's RAM, especially for cards on buses designed to write data to the card much faster than read from it.
Good luck!
Tim Green
http://www.displaylink.com/
> But if I use MMX move instructions, maybe I'll get *some* speed increase compared with using the EngXxx callbacks?
>
Again, you make the assumption that the EngXxx callbacks do not already
do this. Microsoft employs a large number of very smart people. You
are unlikely to do better in the general case without a significant
investment.
> I cannot use any kind of hardware acceleration because my universal driver is VESA/VBE oriented.
> I also cannot rely on the VBE 2.0 call AX=4F17 because some adapters do not support it.
>
So what? All that means is that you check first. There is no magic
spell here. If you want something faster than simple bit-banging, you
need acceleration. If AX=4F17 is available, that's exactly what it is
intended for. If it is not available, then you have no other options.
> I need to find a really universal way to blit/copy.
>
What you are doing now *IS* the universal way. It just so happens that
the universal way is slow for large blits.
> Or maybe store the framebuffer in *system* memory (instead of VRAM) and blit from there to *video* memory?
Possibly. The X.Org window system for Linux has an option to do its
generic frame buffer driver this way.
--
Tim Roberts, [email protected]
Providenza & Boekelheide, Inc.
I don't believe storing the framebuffer in system memory is a good idea either. The way to get performance out of video hardware is to achieve synergy with its design - you write your software for the card, not for the OS. And on the same bus and the same OS, every card is different.
I would put a logic analyzer in that memory interface and see in detail what's really happening. I might also try to run the board on a VMetro and poke around the bus interface, again, at hw level. Alternatively, I would implement the blit as two texture-mapped triangles to see if I can get any insight on where performance is being lost. But still, I find hardware level debugging an invaluable help in this kind of situation!
Hope this helps,
Alberto.
----- Original Message -----
From: Tim Green
To: Windows System Software Devs Interest List
Sent: Wednesday, September 26, 2007 1:07 AM
Subject: RE: [ntdev] VBEMP: Fast accelerated DrvBitBlt
> Or maybe store the framebuffer in *system* memory (instead of VRAM) and blit from there to *video* memory?
Yes, that will be much faster than reading the video card's RAM, especially for cards on buses designed to write data to the card much faster than read from it.
Good luck!
Tim Green
http://www.displaylink.com/
---
NTDEV is sponsored by OSR
For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars
To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer
Dword reads from *video* memory: ~8 MB/s
Dword writes to *video* memory: ~64 MB/s
Dword reads from *system* memory: ~400 MB/s
Dword writes to *system* memory: ~400 MB/s
Feel the difference.
I must find a way to minimize read operations from *video* memory, because they are about 50 times slower
than reads from system memory.
I tested my driver with an emulated framebuffer in *system* memory. It is fast enough when I'm not using *video* memory at all.
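For readers who want to reproduce figures of this kind, here is a minimal user-mode sketch in C. It is not the poster's code (the thread only says that plain dword accesses were timed). The names read_dwords, write_dwords, mb_per_sec and measure_read are hypothetical, clock() stands in for whatever timer a real driver would use, and 1 MB is assumed to mean 2^20 bytes. Pointing it at a mapped frame buffer instead of an ordinary array is left to the caller.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Read n dwords one at a time. The volatile qualifier keeps the
 * compiler from optimizing the loads away. Returns their sum. */
static uint32_t read_dwords(const volatile uint32_t *p, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += p[i];
    return sum;
}

/* Write the same dword value to n consecutive locations. */
static void write_dwords(volatile uint32_t *p, size_t n, uint32_t value)
{
    for (size_t i = 0; i < n; i++)
        p[i] = value;
}

/* Convert a byte count and an elapsed clock() interval to MB/s.
 * Assumption of this sketch: 1 MB = 2^20 bytes. */
static double mb_per_sec(size_t bytes, clock_t ticks)
{
    if (ticks <= 0)
        return 0.0; /* interval too short to measure */
    double seconds = (double)ticks / CLOCKS_PER_SEC;
    return ((double)bytes / (1024.0 * 1024.0)) / seconds;
}

/* Time `passes` full read sweeps over the buffer and return MB/s.
 * The accumulated sum is stored in *sink so the reads have a visible use. */
static double measure_read(const volatile uint32_t *p, size_t n,
                           int passes, uint32_t *sink)
{
    uint32_t acc = 0;
    clock_t start = clock();
    for (int i = 0; i < passes; i++)
        acc += read_dwords(p, n);
    clock_t end = clock();
    *sink = acc;
    return mb_per_sec(n * sizeof(uint32_t) * (size_t)passes, end - start);
}
```

A write measurement would follow the same pattern around write_dwords. Small buffers finish faster than clock() can resolve, so the buffer and pass count need to be large enough for the interval to be meaningful.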
> I'm a bit rusty on this, but my experience is that Bitblt is usually
> fast unless you implement Rops in software. My experience is also that
> intra-board blits are way, way faster than memory-to-board or
> board-to-memory blits.
>
> I don't believe storing the framebuffer in system memory is a good
> idea either. The way to get performance out of video hardware is to
> achieve synergy with its design - you write your software for the
> card, not for the OS. And on the same bus and the same OS, every card
> is different.
>
> I would put a logic analyzer in that memory interface and see in
> detail what's really happening. I might also try to run the board on a
> VMetro and poke around the bus interface, again, at hw level.
> Alternatively, I would implement the blit as two texture-mapped
> triangles to see if I can get any insight on where performance is
> being lost. But still, I find hardware level debugging an invaluable
> help in this kind of situation!
I think you have missed the point, Alberto. He's writing a generic
driver, frame buffer only. He doesn't have access to the graphics chip
-- all he has is a pointer to the pixels. The driver is supposed to
work with every graphics chip that supports VBE.
--
Tim Roberts, [email protected]
Providenza & Boekelheide, Inc.
>>driver, frame buffer only. He doesn't have access to the graphics chip
>>-- all he has is a pointer to the pixels. The driver is supposed to
>>work with every graphics chip that supports VBE.
That's right. My driver is VBE-only oriented by design. I don't want to use each card's specific acceleration functions. I want to make a really universal solution for every NT-based OS.
memory interface and try to see what's going on. I would also run it under a
VMetro. Also, for my curiosity, why in this age of GPUs people still do
frame buffer drivers ?
Alberto.
----- Original Message -----
From: <[email protected]>
To: "Windows System Software Devs Interest List" <[email protected]>
Sent: Thursday, September 27, 2007 1:41 PM
Subject: RE:[ntdev] VBEMP: Fast accelerated DrvBitBlt
>>>I think you have missed the point, Alberto. He's writing a generic
>>>driver, frame buffer only. He doesn't have access to the graphics chip
>>>-- all he has is a pointer to the pixels. The driver is supposed to
>>>work with every graphics chip that supports VBE.
> That's right. My driver is VBE-only oriented by design. I don't want to
> use each card's specific acceleration functions. I want to make a really
> universal solution for every NT-based OS.
First of all, read this: http://www.geocities.com/bearwindows/vbemp.htm
This driver is good for:
1) Legacy operating systems (NT3/NT4) + *NEW* video cards.
2) ReactOS operating system support. (www.reactos.org)
3) Contemporary operating systems (2K/XP/2K3) + *OLD* or unsupported video cards.
4) Embedded systems like XP Embedded and Windows PE, BartPE
5) A universal solution for office PC use (without Direct3D/video overlay of course, but with basic drawing functions)
6) Ideally, my driver is trying to compete with this product: http://scitechsoft.com/products/ent/snap_main.html (its development has been stopped since November 2006)
> From: [email protected]
> [mailto:[email protected]] On Behalf Of
> Alberto Moreira
> Sent: 28 September 2007 02:10
> To: Windows System Software Devs Interest List
> Subject: Re: RE:[ntdev] VBEMP: Fast accelerated DrvBitBlt
>
> Also, for my curiosity, why in this age of GPUs people still
> do frame buffer drivers ?
Virtual GPUs. A straightforward frame buffer can be used, with the heavy
lifting performed by punting GDI calls back to EngXXX.
In the XP Display Driver Model this is the system we use at DisplayLink,
and it seems to work well for UltraVNC, MaxiVista and Microsoft's Remote
Desktop application too.
Tim Green
Development Engineer
DisplayLink (UK) Limited
http://www.displaylink.com/
> (AX=4F17). I don't know how many BIOSes implement that.
> Tim Roberts, [email protected]
> Providenza & Boekelheide, Inc.
Where can I get info about this function?
Please, help!!!
Alberto.
----- Original Message -----
From: <[email protected]>
To: "Windows System Software Devs Interest List" <[email protected]>
Sent: Friday, September 28, 2007 12:52 AM
Subject: RE:[ntdev] VBEMP: Fast accelerated DrvBitBlt
> To Alberto and others:
> First of all, read this: http://www.geocities.com/bearwindows/vbemp.htm
> This driver is good for:
> 1) Legacy operating systems (NT3/NT4) + *NEW* video cards.
> 2) ReactOS operating system support. (www.reactos.org)
> 3) Contemporary operating systems (2K/XP/2K3) + *OLD* or unsupported
> video cards.
> 4) Embedded systems like XP Embedded and Windows PE, BartPE
> 5) A universal solution for office PC use (without Direct3D/video overlay of
> course, but with basic drawing functions)
> 6) Ideally, my driver is trying to compete with this product:
> http://scitechsoft.com/products/ent/snap_main.html (its development has
> been stopped since November 2006)
>> Starting with VBE 2.0, there is a VBE extension for doing bitblts
>> (AX=4F17). I don't know how many BIOSes implement that.
>>
> Where can I get info about this function?
> Please, help!!!
Google is a much more efficient search mechanism than this mailing
list. After all, that's how I found that function in the first place.
http://www.vesa.org/public/VBE/VBE-AF07.pdf
--
Tim Roberts, [email protected]
Providenza & Boekelheide, Inc.
I already have this document (VBE-AF07.pdf). It does not define the VESA functions. Where can I find detailed info on VESA function 4F17?
> To Tim Roberts
> I already have this document (VBE-AF07.pdf). It does not define the VESA functions. Where can I find detailed info on VESA function 4F17?
>
Well, now I can't find the reference again. I'll keep looking.
--
Tim Roberts, [email protected]
Providenza & Boekelheide, Inc.
I finally found a way to modify the standard framebuf.dll device driver source (NT4) to implement a buffering mechanism for the video framebuffer.
I think Microsoft implemented such a mechanism in Win32k.sys in Windows 2000/XP/2003 for the case where you go to Control Panel -> Display -> Settings tab -> Advanced button -> Troubleshooting tab and move the Hardware acceleration slider to the leftmost position (None). In this mode, acceleration is achieved by VRAM buffering.
First, we have these speeds (for example):
(I don't use any kind of video adapter acceleration, only assembler reads)
Dword reads from *video* memory: ~8 MB/s
Dword writes to *video* memory: ~64 MB/s
Dword reads from *system* memory: ~400 MB/s
Dword writes to *system* memory: ~400 MB/s
1) Normal operation: framebuffer in *video* memory (without write combining / USWC)
Read ~8 MB/s
Write ~64 MB/s
A-A-A!!!! Sloooooooooooooooooooooooow
2) Then I place the framebuffer in *system* memory
Read ~400 MB/s
Write ~400 MB/s
But that's not all! I must duplicate each write operation by copying the bitmap from *system* memory (i.e. the cache) to *video* memory.
So:
Read ~400 MB/s
Write ~400 MB/s (real write)
Write ~64 MB/s (duplicate bitmap onto screen)
It is much faster than 1). Good work.
With write combining / USWC enabled, the write speed should be somewhat higher (about 2 times faster).
Comments are welcome.
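To put rough numbers on the comparison between 1) and 2), the arithmetic can be sketched in C as below. This is a back-of-the-envelope model only: it applies the throughput figures quoted above to the 1024x768 scroll example from earlier in the thread (4 KB scanlines, i.e. 4 bytes per pixel). The type and function names are hypothetical, 1 MB is assumed to be 2^20 bytes, and the model ignores caching, write combining and bus contention.

```c
/* Throughput figures in MB/s, as quoted in the thread. */
typedef struct {
    double vram_read;
    double vram_write;
    double sys_read;
    double sys_write;
} Throughput;

/* Assumption of this sketch: 1 MB = 2^20 bytes. */
#define BYTES_PER_MB (1024.0 * 1024.0)

/* Bytes moved when every scanline of the surface is copied once. */
static double surface_bytes(unsigned width, unsigned height,
                            unsigned bytes_per_pixel)
{
    return (double)width * height * bytes_per_pixel;
}

/* Case 1: the frame buffer lives in video memory, so every byte is
 * read from VRAM and written back to VRAM. Result in milliseconds. */
static double scroll_ms_in_vram(double bytes, const Throughput *t)
{
    double seconds = bytes / (t->vram_read * BYTES_PER_MB)
                   + bytes / (t->vram_write * BYTES_PER_MB);
    return seconds * 1000.0;
}

/* Case 2: the copy is done in system memory (read + write there),
 * then the result is written once to VRAM. Result in milliseconds. */
static double scroll_ms_shadow(double bytes, const Throughput *t)
{
    double seconds = bytes / (t->sys_read * BYTES_PER_MB)
                   + bytes / (t->sys_write * BYTES_PER_MB)
                   + bytes / (t->vram_write * BYTES_PER_MB);
    return seconds * 1000.0;
}
```

With the figures above (8, 64, 400 and 400 MB/s) and a full 1024x768x4-byte scroll, this model gives roughly 422 ms for case 1 and roughly 62 ms for case 2. In case 2 the remaining cost is dominated by the single write pass to video memory.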
> I finally found a way to modify the standard framebuf.dll device driver source (NT4) to implement a buffering mechanism for the video framebuffer.
> ...
> First, we have these speeds (for example):
> (I don't use any kind of video adapter acceleration, only assembler reads)
>
> Dword reads from *video* memory: ~8 MB/s
> Dword writes to *video* memory: ~64 MB/s
> Dword reads from *system* memory: ~400 MB/s
> Dword writes to *system* memory: ~400 MB/s
> ...
> 2) Then I place the framebuffer in *system* memory
> ...
> But that's not all! I must duplicate each write operation by copying the bitmap from *system* memory (i.e. the cache) to *video* memory.
> ...
> It is much faster than 1). Good work.
>
> With write combining / USWC enabled, the write speed should be somewhat higher (about 2 times faster).
>
> Comments are welcome.
>
Yes, I believe this was the solution that was proposed to you a week or
so ago. This same solution is used by XFree86/X.Org on Linux; they call
it a "shadow buffer". It is the mechanism they use to rotate the screen
90 degrees at a time.
--
Tim Roberts, [email protected]
Providenza & Boekelheide, Inc.
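For reference, the shadow buffer idea discussed in this thread can be sketched in portable C as follows. This is an illustration under stated assumptions, not the framebuf.dll modification described above and not the X.Org implementation: the ShadowFb type and both function names are hypothetical, coordinates are given in bytes within a scanline, the rectangles are assumed to be already clipped to the surface, and memcpy stands in for whatever copy routine a real display driver would use to write to the mapped frame buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bookkeeping for a shadowed linear frame buffer. */
typedef struct {
    uint8_t *shadow; /* full copy of the frame in system memory */
    uint8_t *vram;   /* mapped linear frame buffer, only ever written */
    size_t   stride; /* bytes per scanline, identical for both buffers */
} ShadowFb;

/* Push a rectangle of the shadow copy out to video memory.
 * x_bytes and width_bytes are byte offsets within a scanline. */
static void shadow_flush(const ShadowFb *fb, size_t x_bytes, size_t y,
                         size_t width_bytes, size_t rows)
{
    for (size_t r = 0; r < rows; r++) {
        size_t off = (y + r) * fb->stride + x_bytes;
        memcpy(fb->vram + off, fb->shadow + off, width_bytes);
    }
}

/* Screen-to-screen copy (scroll, window move). All reads hit system
 * memory; video memory receives only the final write pass. */
static void shadow_copy_rect(ShadowFb *fb,
                             size_t src_x, size_t src_y,
                             size_t dst_x, size_t dst_y,
                             size_t width_bytes, size_t rows)
{
    if (dst_y <= src_y) {
        /* Moving up (or sideways): copy top-down so source rows are
         * read before they are overwritten. */
        for (size_t r = 0; r < rows; r++)
            memmove(fb->shadow + (dst_y + r) * fb->stride + dst_x,
                    fb->shadow + (src_y + r) * fb->stride + src_x,
                    width_bytes);
    } else {
        /* Moving down: copy bottom-up for the same reason. */
        for (size_t r = rows; r-- > 0; )
            memmove(fb->shadow + (dst_y + r) * fb->stride + dst_x,
                    fb->shadow + (src_y + r) * fb->stride + src_x,
                    width_bytes);
    }
    /* Only the destination rectangle changed, so only it is flushed. */
    shadow_flush(fb, dst_x, dst_y, width_bytes, rows);
}
```

Every other drawing operation would have to follow the same rule (draw into the shadow copy, then flush the touched rectangle), otherwise the two buffers drift apart. The price is one extra frame-sized allocation in system memory.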