
VBEMP: Fast accelerated DrvBitBlt

bearwindows Member Posts: 9
How can I write simple, fast code (possibly assembler based on MOVSD/MOVSW/MOVSB
or MMX move instructions) to implement accelerated DrvBitBlt and DrvCopyBits
operations?

In other words, I need a universal procedure for blitting blocks within video
memory using direct framebuffer access.
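A minimal sketch of such a universal block move for a linear framebuffer, written in portable C rather than assembler. The `fb_surface` structure and `fb_copy_rect` function are hypothetical names; the routine assumes only a base pointer, a scanline stride and a pixel size. It chooses the row order so that overlapping source and destination rectangles (as in scrolling) are handled correctly, and leaves each row copy to `memmove`, which C runtimes typically implement with wide string or SIMD moves. It still reads the source rows from video memory, which, as the replies explain, is the real bottleneck.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical description of a linear framebuffer surface. */
typedef struct {
    uint8_t  *base;            /* start of pixel memory                      */
    ptrdiff_t stride;          /* bytes per scanline (may exceed width * bpp) */
    int       bytes_per_pixel;
} fb_surface;

/* Copy a w x h pixel rectangle from (sx, sy) to (dx, dy) on the same surface.
 * When the destination lies below the source, rows are copied bottom-up so
 * that source rows are not overwritten before they are read; memmove handles
 * horizontal overlap within a single row. */
static void fb_copy_rect(const fb_surface *fb, int sx, int sy,
                         int dx, int dy, int w, int h)
{
    size_t row_bytes = (size_t)w * (size_t)fb->bytes_per_pixel;
    int y_start = 0, y_end = h, step = 1;

    if (dy > sy) {             /* moving down: start with the last row */
        y_start = h - 1;
        y_end = -1;
        step = -1;
    }
    for (int y = y_start; y != y_end; y += step) {
        uint8_t *src = fb->base + (ptrdiff_t)(sy + y) * fb->stride
                                + (ptrdiff_t)sx * fb->bytes_per_pixel;
        uint8_t *dst = fb->base + (ptrdiff_t)(dy + y) * fb->stride
                                + (ptrdiff_t)dx * fb->bytes_per_pixel;
        memmove(dst, src, row_bytes);
    }
}
```

For example, scrolling the whole screen down by one line is `fb_copy_rect(&fb, 0, 0, 0, 1, width, height - 1)`.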

This is badly needed for my project:
http://www.geocities.com/bearwindows/vbemp.htm

It is almost done, but there is a problem: at 1024x768 and higher
resolutions my driver makes screen move/copy/scroll operations slow.
(So far it uses framebuf.dll as the display-driver DLL part of my driver.)

As far as I know, framebuf.dll performs only basic functions (some
initialization, preparing and filling structures, etc.).

The rest of the work is mainly done by GDI through the EngXxx functions, I
think. It is slow :(

Regards, bw.

Comments

  • Tim_Roberts Member - All Emails Posts: 12,714
    OSR Online wrote:
    > How can I write simple, fast code (possibly assembler based on MOVSD/MOVSW/MOVSB
    > or MMX move instructions) to implement accelerated DrvBitBlt and DrvCopyBits
    > operations?
    >
    > In other words, I need a universal procedure for blitting blocks within video
    > memory using direct framebuffer access.
    >
    > This is badly needed for my project:
    > http://www.geocities.com/bearwindows/vbemp.htm
    >
    > It is almost done, but there is a problem: at 1024x768 and higher
    > resolutions my driver makes screen move/copy/scroll operations slow.
    > (So far it uses framebuf.dll as the display-driver DLL part of my driver.)
    >
    > As far as I know, framebuf.dll performs only basic functions (some
    > initialization, preparing and filling structures, etc.).
    >
    > The rest of the work is mainly done by GDI through the EngXxx functions, I
    > think. It is slow :(
    >
    >

    You proceed on a false assumption. Yes, scrolling and blits are slow,
    but they're not slow because the EngXxx callbacks are poorly optimized.
    Rather, they're slow because you are reading and writing device memory.

    The computer you are using now does fast blits because the copying is
    being done by the graphics chip, while at the same time your processor
    is moving on and doing something else. With your frame buffer driver,
    you're moving all of those pixels around by hand, and you can't continue
    on until the copy is done.

    Scrolling a 1024x768 window up by one scanline requires about 1,500
    copies of 4 KB each (at 32 bits per pixel a scanline is 4 KB, and each of
    the 767 surviving scanlines has to be read from the graphics chip and then
    written back to it). That's going to take at least 50 ms, and probably
    more, because reading from a graphics chip is usually not well optimized.
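    The arithmetic above can be checked with a small C helper. It is a sketch that assumes a packed 32-bpp linear framebuffer and counts every surviving scanline once as a read and once as a write; `scroll_bus_bytes` is a hypothetical name, not part of any VBE or GDI interface.

```c
/* Bytes crossing the bus when a full-screen scroll by `lines` scanlines is
 * performed directly in video memory (assumption: packed linear framebuffer,
 * no hardware blitter). Each surviving scanline is read once and written once. */
static unsigned long scroll_bus_bytes(unsigned width, unsigned height,
                                      unsigned bytes_per_pixel, unsigned lines)
{
    unsigned long scanline = (unsigned long)width * bytes_per_pixel;
    unsigned long moved = (lines < height) ? (unsigned long)(height - lines) : 0UL;
    return 2UL * moved * scanline;      /* one read plus one write per line */
}
```

    For 1024x768 at 4 bytes per pixel and a one-line scroll this gives 6,283,264 bytes, i.e. 1,534 transfers of 4,096 bytes; doing that in 50 ms already implies about 126 MB/s (1 MB = 10^6 bytes) of sustained bus traffic.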

    Starting with VBE 2.0, there is a VBE extension for doing bitblts
    (AX=4F17). I don't know how many BIOSes implement that.

    --
    Tim Roberts, xxxxx@probo.com
    Providenza & Boekelheide, Inc.


  • bearwindows Member Posts: 9
    But if I use MMX move instructions, might I get *some* speed increase over the EngXxx callbacks?
    I cannot use any kind of hardware acceleration because my universal driver is VESA/VBE oriented.
    I also cannot use the VBE 2.0 call AX=4F17, because some adapters do not support it.
    I need to find a truly universal way to blit/copy.

    ***

    Or maybe I could store the framebuffer in *system* memory (instead of VRAM) and blit from there to *video* memory?
  • OSR_Community_User Member Posts: 110,218
    > Or maybe I could store the framebuffer in *system* memory (instead of VRAM) and blit from there to *video* memory?

    Yes, that will be much faster than reading the video card's RAM, especially for cards on buses designed to write data to the card much faster than read from it.

    Good luck!
    Tim Green
    http://www.displaylink.com/
  • Tim_Roberts Member - All Emails Posts: 12,714
    xxxxx@operamail.com wrote:
    > But if I use MMX move instructions, might I get *some* speed increase over the EngXxx callbacks?
    >

    Again, you make the assumption that the EngXxx callbacks do not already
    do this. Microsoft employs a large number of very smart people. You
    are unlikely to do better in the general case without a significant
    investment.

    > I cannot use any kind of hardware acceleration because my universal driver is VESA/VBE oriented.
    > I also cannot use the VBE 2.0 call AX=4F17, because some adapters do not support it.
    >

    So what? All that means is that you check first. There is no magic
    spell here. If you want something faster than simple bit-banging, you
    need acceleration. If AX=4F17 is available, that's exactly what it is
    intended for. If it is not available, then you have no other options.

    > I need to find a truly universal way to blit/copy.
    >

    What you are doing now *IS* the universal way. It just so happens that
    the universal way is slow for large blits.

    > Or maybe I could store the framebuffer in *system* memory (instead of VRAM) and blit from there to *video* memory?

    Possibly. The X.Org window system for Linux has an option to do its
    generic frame buffer driver this way.

    --
    Tim Roberts, xxxxx@probo.com
    Providenza & Boekelheide, Inc.


  • bearwindows Member Posts: 9
    OK, guys, my method is slow. But WHY does my driver redraw and scroll *VERY* fast when I set the Hardware acceleration slider to "None" in Control Panel > Display > Settings > Advanced > Troubleshooting? What's the trick? (This works in Windows 2000 and later.)
  • OSR_Community_User Member Posts: 110,218
    I'm a bit rusty on this, but my experience is that BitBlt is usually fast unless you implement ROPs in software. My experience is also that intra-board blits are way, way faster than memory-to-board or board-to-memory blits.

    I don't believe storing the framebuffer in system memory is a good idea either. The way to get performance out of video hardware is to achieve synergy with its design - you write your software for the card, not for the OS. And on the same bus and the same OS, every card is different.

    I would put a logic analyzer on that memory interface and see in detail what's really happening. I might also try to run the board on a VMetro and poke around the bus interface, again at the hardware level. Alternatively, I would implement the blit as two texture-mapped triangles to see if I can get any insight into where performance is being lost. But still, I find hardware-level debugging an invaluable help in this kind of situation!

    Hope this helps,


    Alberto.

    ----- Original Message -----
    From: Tim Green
    To: Windows System Software Devs Interest List
    Sent: Wednesday, September 26, 2007 1:07 AM
    Subject: RE: [ntdev] VBEMP: Fast accelerated DrvBitBlt




    > Or maybe I could store the framebuffer in *system* memory (instead of VRAM) and blit from there to *video* memory?

    Yes, that will be much faster than reading the video card's RAM, especially for cards on buses designed to write data to the card much faster than read from it.

    Good luck!
    Tim Green
    http://www.displaylink.com/



  • bearwindows Member Posts: 9
    OK. In short, for example, on a Pentium III PC with an ATI card:
    Reading dwords from *video* memory: ~8 MB/s
    Writing dwords to *video* memory: ~64 MB/s
    Reading dwords from *system* memory: ~400 MB/s
    Writing dwords to *system* memory: ~400 MB/s

    Feel the difference :)

    I must find a way to minimize read operations from *video* memory, because they are 50 times SLOWER
    than reads from system memory.

    I tested my emulated driver on *system* memory. It is fast enough when I'm not using *video* memory at all.
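    Plugging these figures into a rough cost model makes the point concrete. The sketch below assumes a 1024x768, 32-bpp screen scrolled by one line and treats 1 MB as 10^6 bytes; the structure and function names are hypothetical, and the model ignores caching, write combining and driver overhead. It compares scrolling directly in VRAM with scrolling in a system-memory copy followed by one write of the whole screen to VRAM.

```c
/* Hypothetical throughput figures, in bytes per second. */
typedef struct {
    double vram_read;
    double vram_write;
    double sys_read;
    double sys_write;
} bus_rates;

/* Scroll performed in place in VRAM: every moved byte is read from the card
 * and then written back to it. Returns milliseconds. */
static double direct_scroll_ms(double moved_bytes, const bus_rates *r)
{
    return 1000.0 * (moved_bytes / r->vram_read + moved_bytes / r->vram_write);
}

/* Scroll performed in a system-memory copy, followed by one write of the
 * whole visible screen to VRAM; nothing is read back from the card.
 * Returns milliseconds. */
static double shadow_scroll_ms(double moved_bytes, double screen_bytes,
                               const bus_rates *r)
{
    return 1000.0 * (moved_bytes / r->sys_read + moved_bytes / r->sys_write +
                     screen_bytes / r->vram_write);
}
```

    With rates of 8, 64, 400 and 400 MB/s, 767 x 4,096 moved bytes and 768 x 4,096 screen bytes, the direct scroll comes to roughly 442 ms and the system-memory variant to roughly 65 ms, almost 7x faster, entirely because the card is never read.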
  • Tim_Roberts Member - All Emails Posts: 12,714
    Alberto Moreira wrote:
    > I'm a bit rusty on this, but my experience is that Bitblt is usually
    > fast unless you implement Rops in software. My experience is also that
    > intra-board blits are way, way faster than memory-to-board or
    > board-to-memory blits.
    >
    > I don't believe storing the framebuffer in system memory is a good
    > idea either. The way to get performance out of video hardware is to
    > achieve synergy with its design - you write your software for the
    > card, not for the OS. And on the same bus and the same OS, every card
    > is different.
    >
    > I would put a logic analyzer in that memory interface and see in
    > detail what's really happening. I might also try to run the board on a
    > VMetro and poke around the bus interface, again, at hw level.
    > Alternatively, I would implement the blit as two texture-mapped
    > triangles to see if I can get any insight on where performance is
    > being lost. But still, I find hardware level debugging an invaluable
    > help in this kind of situation!

    I think you have missed the point, Alberto. He's writing a generic
    driver, frame buffer only. He doesn't have access to the graphics chip
    -- all he has is a pointer to the pixels. The driver is supposed to
    work with every graphics chip that supports VBE.

    --
    Tim Roberts, xxxxx@probo.com
    Providenza & Boekelheide, Inc.


  • bearwindows Member Posts: 9
    >>I think you have missed the point, Alberto. He's writing a generic
    >>driver, frame buffer only. He doesn't have access to the graphics chip
    >>-- all he has is a pointer to the pixels. The driver is supposed to
    >>work with every graphics chip that supports VBE.
    That's right. My driver is VBE-only by design. I don't want to use each card's specific acceleration functions. I want to make a truly universal solution for every NT-based OS.
  • OSR_Community_User Member Posts: 110,218
    Yes, I misunderstood the issue. Still, I would put a logic analyzer on the
    memory interface and try to see what's going on. I would also run it under a
    VMetro. Also, out of curiosity, why, in this age of GPUs, do people still
    write frame buffer drivers?

    Alberto.



    ----- Original Message -----
    From: <xxxxx@operamail.com>
    To: "Windows System Software Devs Interest List" <xxxxx@lists.osr.com>
    Sent: Thursday, September 27, 2007 1:41 PM
    Subject: RE:[ntdev] VBEMP: Fast accelerated DrvBitBlt


    >>>I think you have missed the point, Alberto. He's writing a generic
    >>>driver, frame buffer only. He doesn't have access to the graphics chip
    >>>-- all he has is a pointer to the pixels. The driver is supposed to
    >>>work with every graphics chip that supports VBE.
    > That's right. My driver is VBE-only by design. I don't want to
    > use each card's specific acceleration functions. I want to make a truly
    > universal solution for every NT-based OS.
  • bearwindows Member Posts: 9
    To Alberto and others:
    First of all, read this: http://www.geocities.com/bearwindows/vbemp.htm
    This driver is good for:
    1) Legacy operating systems (NT3/NT4) + *NEW* video cards.
    2) ReactOS support (www.reactos.org).
    3) Contemporary operating systems (2K/XP/2K3) + *OLD* or unsupported video cards.
    4) Embedded systems like XP Embedded and Windows PE/BartPE.
    5) A universal solution for office PC use (without Direct3D/video overlay, of course, but with basic drawing functions).
    6) Ideally, my driver is trying to compete with this product: http://scitechsoft.com/products/ent/snap_main.html (its development has been halted since November 2006).
  • OSR_Community_User Member Posts: 110,218
    > -----Original Message-----
    > From: xxxxx@lists.osr.com
    > [mailto:xxxxx@lists.osr.com] On Behalf Of
    > Alberto Moreira
    > Sent: 28 September 2007 02:10
    > To: Windows System Software Devs Interest List
    > Subject: Re: RE:[ntdev] VBEMP: Fast accelerated DrvBitBlt
    >
    > Also, for my curiosity, why in this age of GPUs people still
    > do frame buffer drivers ?

    Virtual GPUs. A straightforward frame buffer can be used, with the heavy
    lifting performed by punting GDI calls back to EngXXX.

    In the XP Display Driver Model this is the system we use at DisplayLink,
    and it seems to work well for UltraVNC, MaxiVista and Microsoft's Remote
    Desktop application too.

    Tim Green
    Development Engineer
    DisplayLink (UK) Limited
    http://www.displaylink.com/
  • bearwindows Member Posts: 9
    > Starting with VBE 2.0, there is a VBE extension for doing bitblts
    > (AX=4F17). I don't know how many BIOSes implement that.
    > Tim Roberts, xxxxx@probo.com
    > Providenza & Boekelheide, Inc.

    Where can I get information about this function???
    Please help!!!
  • OSR_Community_User Member Posts: 110,218
    Thanks! I didn't know that this was still alive after all these years.

    Alberto.


    ----- Original Message -----
    From: <xxxxx@operamail.com>
    To: "Windows System Software Devs Interest List" <xxxxx@lists.osr.com>
    Sent: Friday, September 28, 2007 12:52 AM
    Subject: RE:[ntdev] VBEMP: Fast accelerated DrvBitBlt


    > To Alberto and others:
    > First of all, read this: http://www.geocities.com/bearwindows/vbemp.htm
    > This driver is good for:
    > 1) Legacy operating systems (NT3/NT4) + *NEW* video cards.
    > 2) ReactOS support (www.reactos.org).
    > 3) Contemporary operating systems (2K/XP/2K3) + *OLD* or unsupported
    > video cards.
    > 4) Embedded systems like XP Embedded and Windows PE/BartPE.
    > 5) A universal solution for office PC use (without Direct3D/video overlay,
    > of course, but with basic drawing functions).
    > 6) Ideally, my driver is trying to compete with this product:
    > http://scitechsoft.com/products/ent/snap_main.html (its development
    > has been halted since November 2006).
  • Tim_Roberts Member - All Emails Posts: 12,714
    xxxxx@operamail.com wrote:
    >> Starting with VBE 2.0, there is a VBE extension for doing bitblts
    >> (AX=4F17). I don't know how many BIOSes implement that.
    >>
    > Where can I get information about this function???
    > Please help!!!

    Google is a much more efficient search mechanism than this mailing
    list. After all, that's how I found that function in the first place.

    http://www.vesa.org/public/VBE/VBE-AF07.pdf

    --
    Tim Roberts, xxxxx@probo.com
    Providenza & Boekelheide, Inc.


  • bearwindows Member Posts: 9
    To Tim Roberts:
    I already have this document (VBE-AF07.pdf), but the VESA functions are not defined in it. Where can I find detailed information on VESA function 4F17?
  • Tim_Roberts Member - All Emails Posts: 12,714
    xxxxx@operamail.com wrote:
    > To Tim Roberts:
    > I already have this document (VBE-AF07.pdf), but the VESA functions are not defined in it. Where can I find detailed information on VESA function 4F17?
    >

    Well, now I can't find the reference again. I'll keep looking.

    --
    Tim Roberts, xxxxx@probo.com
    Providenza & Boekelheide, Inc.


  • bearwindows Member Posts: 9
    Hi there!
    I finally found a way to modify the standard framebuf.dll display driver source (NT4) to implement a buffering mechanism for the video framebuffer.

    I think Microsoft implemented such a mechanism in win32k.sys in Windows 2000/XP/2003; it is used when you go to Control Panel -> Display -> Settings tab -> Advanced button -> Troubleshooting tab and move the Hardware acceleration slider to the leftmost position (None). In this mode, acceleration is achieved by VRAM buffering.

    First, we have these speeds (for example):
    (I don't use any kind of video adapter acceleration, only assembler reads.)

    Reading dwords from *video* memory: ~8 MB/s
    Writing dwords to *video* memory: ~64 MB/s
    Reading dwords from *system* memory: ~400 MB/s
    Writing dwords to *system* memory: ~400 MB/s

    1) Normal operation: framebuffer in *video* memory (without write combining / USWC)

    Read ~8 MB/s
    Write ~64 MB/s

    A-A-A!!!! Sloooooooooooooooooooooooow :(

    2) Then I place the framebuffer in *system* memory

    Read ~400 MB/s
    Write ~400 MB/s

    But that's not ALL!!! I must duplicate every write operation by copying the bitmap from *system* memory (i.e. the cache) to *video* memory.

    So:

    Read ~400 MB/s
    Write ~400 MB/s (real write)
    Write ~64 MB/s (copying the bitmap to the screen)

    It is much faster than 1). Good work.

    When write combining / USWC is enabled, the write speed should be somewhat higher (about 2 times faster).
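    A minimal sketch of the write-only flush in this scheme, assuming a shadow copy in ordinary system memory and a mapped linear framebuffer with the same stride. `shadow_fb`, `rect_t` and `shadow_flush` are hypothetical names, not framebuf.dll or win32k.sys code. All drawing, including any reads GDI needs, goes to the shadow copy; after each operation only the touched rectangle is copied to VRAM, so the card is never read.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical shadow-buffer pair: drawing happens only in `shadow`
 * (system memory); `vram` is the mapped framebuffer and is only written. */
typedef struct {
    uint8_t *shadow;
    uint8_t *vram;
    size_t   stride;            /* bytes per scanline, same for both */
    size_t   bytes_per_pixel;
    int      width, height;     /* visible size in pixels */
} shadow_fb;

typedef struct { int left, top, right, bottom; } rect_t;  /* right/bottom exclusive */

/* Clip the dirty rectangle to the screen, then copy those pixels from the
 * shadow copy to video memory one scanline at a time. */
static void shadow_flush(const shadow_fb *fb, rect_t r)
{
    if (r.left < 0) r.left = 0;
    if (r.top < 0) r.top = 0;
    if (r.right > fb->width) r.right = fb->width;
    if (r.bottom > fb->height) r.bottom = fb->height;
    if (r.left >= r.right || r.top >= r.bottom)
        return;                                   /* nothing visible to flush */

    size_t offset = (size_t)r.left * fb->bytes_per_pixel;
    size_t row_bytes = (size_t)(r.right - r.left) * fb->bytes_per_pixel;
    for (int y = r.top; y < r.bottom; y++) {
        size_t line = (size_t)y * fb->stride + offset;
        memcpy(fb->vram + line, fb->shadow + line, row_bytes);  /* write-only to VRAM */
    }
}
```

    In a real driver this would presumably run right after a DrvXxx hook has punted the drawing to the EngXxx routines on the shadow surface, with the operation's bounding rectangle as the dirty region. Coalescing several dirty rectangles before flushing would further reduce the number of slow VRAM writes.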

    Comments are welcome.
  • Tim_Roberts Member - All Emails Posts: 12,714
    xxxxx@operamail.com wrote:
    > I finally found a way to modify the standard framebuf.dll display driver source (NT4) to implement a buffering mechanism for the video framebuffer.
    > ...
    > First, we have these speeds (for example):
    > (I don't use any kind of video adapter acceleration, only assembler reads.)
    >
    > Reading dwords from *video* memory: ~8 MB/s
    > Writing dwords to *video* memory: ~64 MB/s
    > Reading dwords from *system* memory: ~400 MB/s
    > Writing dwords to *system* memory: ~400 MB/s
    > ...
    > 2) Then I place the framebuffer in *system* memory
    > ...
    > But that's not ALL!!! I must duplicate every write operation by copying the bitmap from *system* memory (i.e. the cache) to *video* memory.
    > ...
    > It is much faster than 1). Good work.
    >
    > When write combining / USWC is enabled, the write speed should be somewhat higher (about 2 times faster).
    >
    > Comments are welcome.
    >

    Yes, I believe this was the solution that was proposed to you a week or
    so ago. This same solution is used by XFree86/X.Org on Linux; they call
    it a "shadow buffer". It is the mechanism they use to rotate the screen
    90 degrees at a time.
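    To illustrate why a shadow buffer makes rotation cheap, here is a sketch of a flush that writes a 32-bpp shadow image to a framebuffer mounted 90 degrees clockwise. The function name and the packed-pixel layout are assumptions for the example, not taken from X.Org. The destination is walked scanline by scanline so that video memory is written sequentially, while the scattered reads stay in system memory.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy a w x h shadow image (row-major 32-bit pixels) to a framebuffer that
 * shows it rotated 90 degrees clockwise: the framebuffer is h pixels wide and
 * w pixels tall, and shadow pixel (x, y) lands at framebuffer (h - 1 - y, x). */
static void flush_rotated_cw(const uint32_t *shadow, int w, int h,
                             uint32_t *fb, size_t fb_stride_pixels)
{
    /* Outer loop over destination scanlines keeps VRAM writes sequential,
     * which suits write combining; the column-wise reads hit system memory. */
    for (int fy = 0; fy < w; fy++) {
        uint32_t *dst = fb + (size_t)fy * fb_stride_pixels;
        for (int fx = 0; fx < h; fx++)
            dst[fx] = shadow[(size_t)(h - 1 - fx) * (size_t)w + (size_t)fy];
    }
}
```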

    --
    Tim Roberts, xxxxx@probo.com
    Providenza & Boekelheide, Inc.

