Windows memory performance as a QEMU guest VM.

Hi everyone. I am looking for someone who can assist, or at least advise, on what might be causing the memory performance issue we see when running Windows 10 as a KVM guest under QEMU.

I have an AMD Threadripper platform running Linux, with both Windows and Linux VMs running on it under QEMU with KVM acceleration. Linux guests achieve near-native memory performance, roughly 36GB/s at any copy size. Windows guests, however, only achieve native performance until the memcpy block size exceeds exactly 1,054,735 bytes, at which point throughput drops to 12-14GB/s.

Using various memcpy implementations (glibc, vcruntime, custom, apex), no matter what I do I cannot improve upon 14GB/s.
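
For anyone wanting to reproduce the numbers, the test is essentially a memcpy throughput sweep over block sizes. Below is a minimal sketch, not the exact benchmark we use; it times with POSIX clock_gettime() for brevity, while a Windows build would use QueryPerformanceCounter instead.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Copy `size` bytes `iters` times and return throughput in GB/s. */
static double bench_memcpy(size_t size, int iters)
{
    char *src = malloc(size);
    char *dst = malloc(size);
    memset(src, 1, size);   /* fault the pages in before timing */
    memset(dst, 1, size);

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iters; i++)
        memcpy(dst, src, size);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    /* Touch the destination so the copies cannot be optimised away. */
    volatile char sink = dst[size - 1];
    (void)sink;

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    free(src);
    free(dst);
    return (double)size * iters / secs / 1e9;
}

int main(void)
{
    /* Sweep block sizes around the ~1MB point where the drop appears. */
    for (size_t size = 128 * 1024; size <= 16 * 1024 * 1024; size *= 2)
        printf("%8zu KiB: %6.1f GB/s\n", size / 1024, bench_memcpy(size, 500));
    return 0;
}

Build with something like gcc -O2 and run the equivalent inside each guest to compare against the host.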

I have been poring through the QEMU source for days now, working with a few other people, and we are unable to determine why Windows copy performance is so abysmal.

I just wonder if it may be somehow related to the fact that, unlike Linux, Windows allows pageable memory in the kernel…

Anton Bassov

What connection can you see between pageable kernel memory and memory copy performance under a hypervisor for large block sizes?


> What connection can you see between pageable kernel memory and memory copy
> performance under a hypervisor for large block sizes?

Because the Linux kernel does not use pageable kernel memory, it can afford a really simple memory mapping mechanism - it maps physical memory into the kernel address space as virtual_kernel_address = base_offset + physical_address (I assume a 64-bit OS whose kernel address space is significantly larger than the amount of RAM available). Under this mapping, a range that is virtually contiguous is also physically contiguous. Therefore, if you copy a range that exceeds PAGE_SIZE, it is still physically contiguous, no matter how large it may be.

Although there is an area where the above equation does not hold, because mappings there are created on a per-page basis (i.e. the vmalloc() area), every valid virtual address in that range is still backed by a physical page that also has a mapping in the kmalloc() (direct-mapped) area.
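
To make that arithmetic concrete, here is a minimal sketch of the direct mapping, modelled on the kernel's __va()/__pa() helpers; the PAGE_OFFSET value is just the usual x86-64 direct-map base and is shown purely for illustration.

#include <stdint.h>

/* Illustration only: in the direct map, all of RAM sits at one fixed
   virtual offset, so translation is pure arithmetic and a virtually
   contiguous range is also physically contiguous. */
#define PAGE_OFFSET 0xffff888000000000ULL  /* typical x86-64 direct-map base */

static inline uint64_t example_virt_to_phys(uint64_t vaddr)
{
    return vaddr - PAGE_OFFSET;
}

static inline uint64_t example_phys_to_virt(uint64_t paddr)
{
    return paddr + PAGE_OFFSET;
}

The point is that no per-page translation structure has to be consulted on the copy path; contiguity falls out of the arithmetic.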

Taking the above into consideration, I just wonder if the hypervisor may be doing some memory optimisations for a known Linux guest behind the scenes…

Anton Bassov

Hi Geoffrey,

Can you please post your QEMU command line and the benchmark code you are using?
Also, posting the issue on qemu-devel might improve your chances of getting more help.

Best regards,
Yan.

It seems unlikely that this kind of effect could reduce the performance of memory copies in Windows guests when blocks exceed ~1MB.
