Windows System Software -- Consulting, Training, Development -- Unique Expertise, Guaranteed Results

Windows memory performance as a QEMU guest VM.

Hi everyone. I am seeking someone who can assist, or at least advise, on what might be causing poor memory performance when running Windows 10 as a KVM guest under QEMU.

I have an AMD Threadripper platform running Linux, hosting both Windows and Linux VMs under QEMU with KVM acceleration. Linux guests achieve near-native memory performance, roughly 36 GB/s at any copy size. Windows guests, however, achieve native performance only until the memcpy block size exceeds exactly 1,054,735 bytes, at which point throughput drops to 12-14 GB/s.

I have tried various memcpy implementations (glibc, vcruntime, custom, apex); no matter what I do, I cannot improve on 14 GB/s.

I have been poring over the QEMU source for days now, working with a few other people, and we are unable to determine why Windows copy performance is so abysmal.

Comments

  • anton_bassov Posts: 4,800
    I just wonder if it may be somehow related to the fact that, unlike Linux, Windows allows pageable memory in the kernel....


    Anton Bassov
  • MBond Posts: 843
    What connection can you see between pageable kernel memory and memory copy performance under a hypervisor for large block sizes?

  • anton_bassov Posts: 4,800
    > What connection can you see between pageable kernel memory and memory copy
    > performance under a hypervisor for large block sizes?


    Since the Linux kernel does not use pageable kernel memory, it can afford a really simple memory-mapping mechanism: it maps physical memory into the kernel address space as virtual_kernel_address = base_offset + physical_address (assuming a 64-bit OS whose kernel address space is significantly larger than the available RAM). Under this linear mapping, virtually contiguous addresses are also physically contiguous, so if you copy a range that exceeds PAGE_SIZE it is still physically contiguous, no matter how large it may be.

    There is one area where the above equation does not hold, because mapping there is done on a per-page basis (the vmalloc() area), but every valid virtual address in that range is still backed by a page that is also mapped in the linear (kmalloc()) mapping.


    Taking the above into consideration, I just wonder whether the hypervisor may be doing some memory optimisations behind the scenes for a known Linux guest......



    Anton Bassov
  • Hi Geoffrey,

    Can you please post your QEMU command line and the benchmark code you are using?
    Also, posting the issue on qemu-devel might improve your chances of getting more help.

    Best regards,
    Yan.
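For context, a typical invocation for this kind of setup might look something like the following (illustrative only, not the OP's actual command line; the hugepage-backed memory options shown are the sort of thing that commonly affects guest memory throughput):

```shell
qemu-system-x86_64 \
  -machine q35,accel=kvm \
  -cpu host \
  -smp 16 \
  -m 32G \
  -object memory-backend-file,id=mem0,size=32G,mem-path=/dev/hugepages,share=on \
  -numa node,memdev=mem0 \
  -drive file=win10.qcow2,if=virtio
```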
  • MBond Posts: 843
    It seems unlikely that this kind of effect could reduce the performance of memory copies in Windows guests when blocks exceed ~1 MB.
