VBEMP: Fast accelerated DrvBitBlt

How can I write simple and fast code (possibly in assembler, based on movsd/w/b
or MMX move instructions)
for implementing accelerated DrvBitBlt and DrvCopyBits operations?

That is, I need a universal procedure for blitting blocks in video memory
using direct framebuffer access.

I badly need this for my project:
http://www.geocities.com/bearwindows/vbemp.htm

It is almost done, but there is a problem: at 1024x768 and higher
resolutions, my driver slows down screen move/copy/scroll operations.
(So far it uses framebuf.dll as the user-mode part of my driver.)

As far as I know, framebuf.dll performs only basic functions (some initialization,
preparing and filling structures, etc.).

The rest of the work is mainly done by GDI via EngXXX functions, I think. It is slow
:(

Regards, bw.

OSR Online wrote:

> The rest of the work is mainly done by GDI via EngXXX functions, I think.
> It is slow :(

You are proceeding from a false assumption. Yes, scrolling and blits are slow,
but they’re not slow because the EngXxx callbacks are poorly optimized.
Rather, they’re slow because you are reading and writing device memory.

The computer you are using now does fast blits because the copying is
being done by the graphics chip, while at the same time your processor
is moving on and doing something else. With your frame buffer driver,
you’re moving all of those pixels around by hand, and you can’t continue
on until the copy is done.

Scrolling a 1024x768 window up by one scanline at 32 bits per pixel requires
about 1,500 copies of 4 KB each (because you have to read each scanline from
the graphics chip, then write it back to the graphics chip). That’s going to
take at least 50 ms, and probably more, because the “reading” operation in a
graphics chip is usually not well-optimized.
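To make this estimate concrete, here is a small C sketch of the cost of a VRAM-to-VRAM scroll. It assumes 32 bits per pixel (1024 pixels * 4 bytes = 4 KB per scanline), treats MB as 2^20 bytes, and plugs in the read/write figures measured later in this thread (~8 MB/s read, ~64 MB/s write). The function names are hypothetical, for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, for illustration only.
 * Bytes that must cross the bus in EACH direction when a full-screen
 * surface is scrolled up by `lines` scanlines with a plain VRAM-to-VRAM
 * copy: every scanline that stays visible is read from the card and then
 * written back to it. */
static size_t scroll_bytes(size_t width, size_t height,
                           size_t bytes_per_pixel, size_t lines)
{
    assert(lines <= height);
    return width * bytes_per_pixel * (height - lines);
}

/* Hypothetical helper, for illustration only.
 * Estimated scroll time in seconds for the given read and write bandwidths
 * (bytes per second). This simple model assumes the reads and writes do
 * not overlap (the CPU is doing the copy by hand), so the two times add. */
static double scroll_seconds(size_t width, size_t height,
                             size_t bytes_per_pixel, size_t lines,
                             double read_bps, double write_bps)
{
    double bytes = (double)scroll_bytes(width, height, bytes_per_pixel, lines);
    return bytes / read_bps + bytes / write_bps;
}
```

With `scroll_seconds(1024, 768, 4, 1, 8.0 * 1048576, 64.0 * 1048576)` the model gives roughly 0.42 s for a one-line scroll, about 0.37 s of which is reading. The write half alone (about 47 ms) is close to the “at least 50 ms” lower bound above.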

Starting with VBE 2.0, there is a VBE extension for doing bitblts
(AX=4F17). I don’t know how many BIOSes implement that.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

But if I use MMX move instructions, maybe I’ll get *some* speed increase compared with using the EngXxx callbacks?
I cannot use any kind of hardware acceleration because my universal driver is VESA/VBE oriented.
And I also cannot use the VBE 2.0 call AX=4F17, because some adapters do not support this call.
I need to find a truly universal way to blit/copy.

***

Or maybe store the framebuffer in *system* memory (instead of VRAM) and blit from there to *video* memory?

> Or maybe store the framebuffer in *system* memory (instead of VRAM) and blit from there to *video* memory?

Yes, that will be much faster than reading the video card’s RAM, especially for cards on buses designed to write data to the card much faster than read from it.

Good luck!
Tim Green
http://www.displaylink.com/

xxxxx@operamail.com wrote:

> But if I use MMX move instructions, maybe I’ll get *some* speed increase compared with using the EngXxx callbacks?

Again, you make the assumption that the EngXxx callbacks do not already
do this. Microsoft employs a large number of very smart people. You
are unlikely to do better in the general case without a significant
investment.

> I cannot use any kind of hardware acceleration because my universal driver is VESA/VBE oriented.
> And I also cannot use the VBE 2.0 call AX=4F17, because some adapters do not support this call.

So what? All that means is that you check first. There is no magic
spell here. If you want something faster than simple bit-banging, you
need acceleration. If AX=4F17 is available, that’s exactly what it is
intended for. If it is not available, then you have no other options.

> I need to find a truly universal way to blit/copy.

What you are doing now *IS* the universal way. It just so happens that
the universal way is slow for large blits.
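For reference, the universal way on a linear frame buffer boils down to a rectangle copy like the C sketch below. The function name is hypothetical, and `pitch` stands for the bytes per scanline (what GDI reports as the surface’s lDelta), assumed positive here. The only subtlety is overlap: when source and destination overlap, as in scrolling, rows must be walked in the direction that never overwrites a source row before it has been read. When `fb` points at VRAM, every memmove here reads video memory, which is exactly why this path is slow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, for illustration only.
 * Copy a w x h pixel rectangle from (sx, sy) to (dx, dy) inside a single
 * linear framebuffer with `pitch` bytes per scanline (assumed positive).
 * memmove already copes with overlap inside one scanline; the row order
 * below copes with vertical overlap. */
static void fb_copy_rect(uint8_t *fb, size_t pitch, size_t bytes_per_pixel,
                         size_t sx, size_t sy, size_t dx, size_t dy,
                         size_t w, size_t h)
{
    size_t row_bytes = w * bytes_per_pixel;

    assert(sx * bytes_per_pixel + row_bytes <= pitch);
    assert(dx * bytes_per_pixel + row_bytes <= pitch);

    if (dy <= sy) {
        /* Moving up (or purely sideways): go top to bottom, so each source
         * row is read before anything can overwrite it. */
        for (size_t y = 0; y < h; y++)
            memmove(fb + (dy + y) * pitch + dx * bytes_per_pixel,
                    fb + (sy + y) * pitch + sx * bytes_per_pixel, row_bytes);
    } else {
        /* Moving down: go bottom to top for the same reason. */
        for (size_t y = h; y-- > 0;)
            memmove(fb + (dy + y) * pitch + dx * bytes_per_pixel,
                    fb + (sy + y) * pitch + sx * bytes_per_pixel, row_bytes);
    }
}
```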

> Or maybe store the framebuffer in *system* memory (instead of VRAM) and blit from there to *video* memory?

Possibly. The X.Org window system for Linux has an option to do its
generic frame buffer driver this way.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

OK, guys, my method is slow. But WHY does my driver redraw and scroll *VERY* fast when I set the Hardware acceleration slider to “None” in Control Panel > Display > Settings > Advanced > Troubleshooting? What’s the trick? (This works in Windows 2000 and later.)

I’m a bit rusty on this, but my experience is that BitBlt is usually fast unless you implement ROPs in software. My experience is also that intra-board blits are way, way faster than memory-to-board or board-to-memory blits.
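The point about software ROPs matters even more on a pure frame buffer: any raster operation that depends on the destination forces the driver to read destination pixels, and here those reads come from slow video memory. A GDI ROP3 index is an 8-bit truth table over pattern (P), source (S) and destination (D), with P = 0xF0, S = 0xCC and D = 0xAA, so whether a ROP reads the destination can be tested with a couple of bit operations. The C sketch below uses hypothetical helper names and takes the ROP3 index byte, e.g. bits 16-23 of SRCCOPY = 0x00CC0020.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers, for illustration only.
 * The ROP3 index is a truth table: output bit number (P<<2 | S<<1 | D)
 * gives the result for that combination of pattern, source and
 * destination bits. That is why PATCOPY = 0xF0, SRCCOPY = 0xCC and
 * DSTINVERT = 0x55 (the complement of D = 0xAA). */

/* True if some output bit changes when only D changes, i.e. bit 2k
 * differs from bit 2k+1 for some k. */
static bool rop3_reads_dest(uint8_t rop3)
{
    return ((rop3 ^ (rop3 >> 1)) & 0x55) != 0;
}

/* Same test for S: compare bit pairs whose indices differ only in S. */
static bool rop3_reads_source(uint8_t rop3)
{
    return ((rop3 ^ (rop3 >> 2)) & 0x33) != 0;
}
```

For example, SRCCOPY (0xCC) and PATCOPY (0xF0) never touch the destination, while SRCINVERT (0x66, S XOR D) and DSTINVERT (0x55) do. A frame buffer driver could route the first group to a write-only path and pay for pixel reads only in the second.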

I don’t believe storing the framebuffer in system memory is a good idea either. The way to get performance out of video hardware is to achieve synergy with its design - you write your software for the card, not for the OS. And on the same bus and the same OS, every card is different.

I would put a logic analyzer in that memory interface and see in detail what’s really happening. I might also try to run the board on a VMetro and poke around the bus interface, again, at hw level. Alternatively, I would implement the blit as two texture-mapped triangles to see if I can get any insight on where performance is being lost. But still, I find hardware level debugging an invaluable help in this kind of situation!

Hope this helps,

Alberto.

----- Original Message -----
From: Tim Green
To: Windows System Software Devs Interest List
Sent: Wednesday, September 26, 2007 1:07 AM
Subject: RE: [ntdev] VBEMP: Fast accelerated DrvBitBlt

NTDEV is sponsored by OSR

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer

OK. In short, for example, on a Pentium III PC with an ATI card:
Reading dwords from *video* memory: ~8 MB/s
Writing dwords to *video* memory: ~64 MB/s
Reading dwords from *system* memory: ~400 MB/s
Writing dwords to *system* memory: ~400 MB/s

Feel the difference :)

I must find a way to minimize read operations from *video* memory, because they are 50 times SLOWER
than reads from system memory.
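One way to take VRAM reads out of the picture entirely is the system-memory copy suggested earlier in the thread. A minimal C sketch, assuming the shadow buffer and the visible framebuffer share the same pitch and height (the function name is hypothetical): the scroll itself is a memmove inside the shadow, so all reads hit fast system memory, and video memory is only ever written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, for illustration only.
 * Scroll the whole screen up by `lines` scanlines.
 *   shadow: system-memory copy of the screen (read and written freely)
 *   vram:   mapped framebuffer (only written here, never read)
 * Both surfaces are assumed to share the same pitch and height. */
static void shadow_scroll_up(uint8_t *shadow, uint8_t *vram,
                             size_t pitch, size_t height, size_t lines)
{
    assert(lines <= height);

    /* Move the surviving scanlines up inside system memory. The bottom
     * `lines` scanlines keep stale content until they are repainted. */
    memmove(shadow, shadow + lines * pitch, (height - lines) * pitch);

    /* Push the result to the card: a pure write stream, no VRAM reads. */
    memcpy(vram, shadow, height * pitch);
}
```

By the figures above, a scroll then costs one read/write pass in system memory (~400 MB/s) plus one write-only pass to the card (~64 MB/s), instead of a read pass at ~8 MB/s plus a write pass at ~64 MB/s. Pushing the whole surface every time is the simplest choice; pushing only the changed region is the obvious refinement.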

I tested my emulated driver entirely in *system* memory. It is fast enough when I’m not using *video* memory at all.

Alberto Moreira wrote:

I think you have missed the point, Alberto. He’s writing a generic
driver, frame buffer only. He doesn’t have access to the graphics chip
– all he has is a pointer to the pixels. The driver is supposed to
work with every graphics chip that supports VBE.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

> I think you have missed the point, Alberto. He’s writing a generic
> driver, frame buffer only. He doesn’t have access to the graphics chip
> – all he has is a pointer to the pixels. The driver is supposed to
> work with every graphics chip that supports VBE.

That’s right. My driver is VBE-only by design. I don’t want to use each card’s specific acceleration functions. I want to make a truly universal solution for every NT-based OS.

Yes, I misunderstood the issue. Still, I would put a logic analyzer at the
memory interface and try to see what’s going on. I would also run it under a
VMetro. Also, out of curiosity, why do people still write frame buffer
drivers in this age of GPUs?

Alberto.

----- Original Message -----
From:
To: “Windows System Software Devs Interest List”
Sent: Thursday, September 27, 2007 1:41 PM
Subject: RE:[ntdev] VBEMP: Fast accelerated DrvBitBlt

To Alberto and others:
First of all, read this: http://www.geocities.com/bearwindows/vbemp.htm
This driver is good for:

  1. Legacy operating systems (NT3/NT4) + *NEW* video cards.
  2. ReactOS operating system support (www.reactos.org).
  3. Contemporary operating systems (2K/XP/2K3) + *OLD* or unsupported video cards.
  4. Embedded systems like XP Embedded and Windows PE, BartPE.
  5. A universal solution for office PC use (without Direct3D/video overlay, of course, but with the basic drawing functions).
  6. Ideally, my driver aims to compete with this product: http://scitechsoft.com/products/ent/snap_main.html (its development has been halted since November 2006).

> -----Original Message-----

From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of
Alberto Moreira
Sent: 28 September 2007 02:10
To: Windows System Software Devs Interest List
Subject: Re: RE:[ntdev] VBEMP: Fast accelerated DrvBitBlt

> Also, out of curiosity, why do people still write frame buffer drivers
> in this age of GPUs?

Virtual GPUs. A straightforward frame buffer can be used, with the heavy
lifting performed by punting GDI calls back to EngXXX.

In the XP Display Driver Model this is the system we use at DisplayLink,
and it seems to work well for UltraVNC, MaxiVista and Microsoft’s Remote
Desktop application too.

Tim Green
Development Engineer
DisplayLink (UK) Limited
http://www.displaylink.com/

> Starting with VBE 2.0, there is a VBE extension for doing bitblts
> (AX=4F17). I don’t know how many BIOSes implement that.
> Tim Roberts, xxxxx@probo.com
> Providenza & Boekelheide, Inc.

Where can I get info about this function???
Please help!!!

Thanks! I didn’t know that this was still alive after all these years.

Alberto.

----- Original Message -----
From:
To: “Windows System Software Devs Interest List”
Sent: Friday, September 28, 2007 12:52 AM
Subject: RE:[ntdev] VBEMP: Fast accelerated DrvBitBlt

> 6) Ideally, my driver aims to compete with this product:
> http://scitechsoft.com/products/ent/snap_main.html (its development
> has been halted since November 2006)

xxxxx@operamail.com wrote:

> Starting with VBE 2.0, there is a VBE extension for doing bitblts
> (AX=4F17). I don’t know how many BIOSes implement that.
>
> Where can I get info about this function???
> Please help!!!

Google is a much more efficient search mechanism than this mailing
list. After all, that’s how I found that function in the first place.

http://www.vesa.org/public/VBE/VBE-AF07.pdf


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

To Tim Roberts:
I already have this document (VBE-AF07.pdf), but it does not define this VESA function. Where can I find detailed information about VESA function 4F17?

xxxxx@operamail.com wrote:

> To Tim Roberts:
> I already have this document (VBE-AF07.pdf), but it does not define this VESA function. Where can I find detailed information about VESA function 4F17?

Well, now I can’t find the reference again. I’ll keep looking.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Hi there!
I finally found a way to modify the standard framebuf.dll driver source (NT4) to implement a buffering mechanism for the video framebuffer.

I think Microsoft implemented such a mechanism in win32k.sys in Windows 2000/XP/2003; it is used when you go to Control Panel -> Display -> Settings tab -> Advanced button -> Troubleshooting tab and move the Hardware acceleration slider to the leftmost position (None). In this mode, acceleration is achieved by buffering VRAM.

First, we have these speeds (for example):
(I don’t use any kind of video adapter acceleration, only assembler reads)

Reading dwords from *video* memory: ~8 MB/s
Writing dwords to *video* memory: ~64 MB/s
Reading dwords from *system* memory: ~400 MB/s
Writing dwords to *system* memory: ~400 MB/s

  1) Normal operation: framebuffer in *video* memory (without write combining / USWC)

Read ~8 MB/s
Write ~64 MB/s

A-A-A!!! Sloooooooooooooooooooooooow :(

  2) Then I place the framebuffer in *system* memory

Read ~400 MB/s
Write ~400 MB/s

But that’s not ALL!!! I must duplicate every write by copying the bitmap from *system* memory (i.e. the cache) to *video* memory.

So:

Read ~400 MB/s
Write ~400 MB/s (real write)
Write ~64 MB/s (duplicating the bitmap onto the screen)

It is much faster than 1). Good work.

With write combining (USWC) enabled, write speed should be somewhat higher (about 2 times faster).
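The write-back step (duplicating the bitmap onto the screen) is where the ~64 MB/s limit remains, so a common refinement is to remember which part of the shadow buffer changed and push only that. A C sketch with hypothetical names, assuming the changed rectangles are known (in a GDI driver they would plausibly come from the destination rectangle of each drawing call, such as prclTrg in DrvBitBlt) and that the shadow and VRAM surfaces share one pitch. The dirty area is kept as a single bounding box, which is simple but can over-copy when two small changes are far apart.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical types and helpers, for illustration only.
 * Half-open rectangle [left, right) x [top, bottom), in pixels. */
typedef struct {
    size_t left, top, right, bottom;
} dirty_rect;

static int rect_empty(const dirty_rect *r)
{
    return r->right <= r->left || r->bottom <= r->top;
}

/* Grow the pending dirty area so that it also covers r. */
static void dirty_add(dirty_rect *acc, const dirty_rect *r)
{
    if (rect_empty(r))
        return;
    if (rect_empty(acc)) {
        *acc = *r;
        return;
    }
    if (r->left < acc->left)     acc->left = r->left;
    if (r->top < acc->top)       acc->top = r->top;
    if (r->right > acc->right)   acc->right = r->right;
    if (r->bottom > acc->bottom) acc->bottom = r->bottom;
}

/* Copy only the dirty area from the shadow buffer to VRAM, one partial
 * scanline at a time (write-only on the VRAM side), then mark it clean. */
static void dirty_flush(dirty_rect *acc, const uint8_t *shadow, uint8_t *vram,
                        size_t pitch, size_t bytes_per_pixel)
{
    if (rect_empty(acc))
        return;
    size_t offset = acc->left * bytes_per_pixel;
    size_t len = (acc->right - acc->left) * bytes_per_pixel;
    assert(offset + len <= pitch);
    for (size_t y = acc->top; y < acc->bottom; y++)
        memcpy(vram + y * pitch + offset, shadow + y * pitch + offset, len);
    acc->left = acc->top = acc->right = acc->bottom = 0;
}
```

When to call dirty_flush (after every drawing call, or on a timer) is a separate design decision: flushing less often batches more writes but delays what appears on screen.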

Comments are welcome.

xxxxx@operamail.com wrote:

> I finally found a way to modify the standard framebuf.dll driver source (NT4) to implement a buffering mechanism for the video framebuffer.

Yes, I believe this was the solution that was proposed to you a week or
so ago. This same solution is used by XFree86/X.Org on Linux; they call
it a “shadow buffer”. It is the mechanism they use to rotate the screen
90 degrees at a time.
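Rotation follows naturally from the shadow buffer: GDI keeps drawing into an unrotated system-memory surface, and only the copy to the card rotates. Below is a C sketch of a 90-degree clockwise push for 32-bit pixels, with a hypothetical function name and pitches given in pixels for brevity; it is not taken from the X.Org sources. It walks VRAM in address order, so the slow side sees a sequential write stream (which also suits write combining), and leaves the strided access on the cheap system-memory side.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only.
 * Copy a w x h shadow image to VRAM rotated 90 degrees clockwise; the VRAM
 * surface is therefore h pixels wide and w pixels tall. Shadow pixel (x, y)
 * lands at VRAM pixel (h - 1 - y, x). Pitches are in pixels. */
static void shadow_push_rot90(const uint32_t *shadow, size_t shadow_pitch,
                              uint32_t *vram, size_t vram_pitch,
                              size_t w, size_t h)
{
    assert(shadow_pitch >= w && vram_pitch >= h);
    for (size_t row = 0; row < w; row++)        /* VRAM row = shadow column */
        for (size_t col = 0; col < h; col++)    /* VRAM column = flipped shadow row */
            vram[row * vram_pitch + col] =
                shadow[(h - 1 - col) * shadow_pitch + row];
}
```

For example, a 3 x 2 shadow image {0, 1, 2 / 3, 4, 5} becomes the 2 x 3 VRAM image {3, 0 / 4, 1 / 5, 2}.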


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.