Re: [ntdev] Instructions and Clock cycles

Also remember cache effects. A core can sit idle for hundreds of cycles waiting for a read from remote memory in a NUMA system, regardless of how many micro-ops an instruction requires.

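Although much maligned, hyper-threading truly does improve overall performance when software with a bad memory layout (the default when using OO languages) executes: while one hardware thread is stalled on a cache miss, the other can keep the core's execution units busy.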
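
To make the layout point concrete, here is a minimal sketch in C. The type and function names (node, sum_list, sum_array) are made up for illustration and come from no real code base. Both functions compute the same sum, but the first follows pointers to wherever the allocator placed each node, while the second walks contiguous memory.

```c
#include <stddef.h>

/* Hypothetical node type: each element lives wherever the allocator put it. */
struct node {
    long value;
    struct node *next;
};

/* Pointer chasing: the address of the next load is not known until the
   current load completes, so a cache miss stalls the whole chain. */
long sum_list(const struct node *head)
{
    long total = 0;
    for (const struct node *n = head; n != NULL; n = n->next)
        total += n->value;
    return total;
}

/* Contiguous layout: the addresses are predictable, so hardware
   prefetching and out-of-order execution can overlap memory latency. */
long sum_array(const long *values, size_t count)
{
    long total = 0;
    for (size_t i = 0; i < count; i++)
        total += values[i];
    return total;
}
```

How large the gap is depends on the working-set size, the allocator, and the machine, so treat this as an illustration of the access patterns rather than a benchmark.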


From: Maxim S. Shatskih
Sent: Monday, October 20, 2014 7:00 PM
To: Windows System Software Devs Interest List

However, consider this. If I have an instruction that needs to read
from memory, first I need to figure out what address I need to read.
That might involve pulling values from two registers, adding them, and
adding a constant from the instruction.

…and this has been done by a separate set of silicon gates, in parallel with the main execution flow, since the 80286.
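
For reference, the address arithmetic under discussion can be modelled in a few lines of C. This is only a sketch of the calculation for an x86-style operand of the form [base + index*scale + disp]; the function name effective_address is hypothetical, and in hardware the work is done by the address-generation logic rather than by ordinary ALU instructions.

```c
#include <stdint.h>

/* Model of an x86-style effective address: base + index*scale + disp.
   scale is 1, 2, 4 or 8 in the real encoding; disp is a signed constant
   taken from the instruction itself. Arithmetic wraps modulo 2^64. */
uint64_t effective_address(uint64_t base, uint64_t index,
                           unsigned scale, int32_t disp)
{
    /* Sign-extend the displacement before adding it. */
    return base + index * scale + (uint64_t)(int64_t)disp;
}
```

An array access such as table[i].field in C typically compiles to a single memory operand of this shape, with no separate add or multiply instruction.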

Division is another interesting example. There is still no algorithm
for doing a division in one cycle.

Yes, though some CPUs (IIRC the DEC Alpha, though I could be wrong, and possibly modern x86 CPUs too) have a “flash multiplier” that can do MUL in 1 cycle.
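
To illustrate why division resists a single-cycle implementation, here is the textbook shift-and-subtract (restoring) algorithm as a C sketch. It is not a description of any particular CPU's divider; real hardware generally uses faster schemes that retire several quotient bits per step. The function name divide_u32 is hypothetical.

```c
#include <stdint.h>

/* Restoring division, one quotient bit per iteration.
   Each step needs the remainder produced by the previous step, so the
   32 iterations form a serial dependency chain.
   The caller must pass a nonzero divisor. */
uint32_t divide_u32(uint32_t dividend, uint32_t divisor, uint32_t *remainder)
{
    uint64_t rem = 0;   /* wide enough to hold the shifted remainder */
    uint32_t quot = 0;

    for (int bit = 31; bit >= 0; bit--) {
        /* Bring down the next dividend bit. */
        rem = (rem << 1) | ((dividend >> bit) & 1u);
        /* If the divisor fits, subtract it and set this quotient bit. */
        if (rem >= divisor) {
            rem -= divisor;
            quot |= (uint32_t)1 << bit;
        }
    }
    *remainder = (uint32_t)rem;
    return quot;
}
```

Multiplication, by contrast, can sum all of its partial products in a tree of adders because none of them depends on another, which is what makes a fast multiplier feasible.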


Maxim S. Shatskih
Microsoft MVP on File System And Storage
xxxxx@storagecraft.com
http://www.storagecraft.com


NTDEV is sponsored by OSR

Visit the list at: http://www.osronline.com/showlists.cfm?list=ntdev
