RE: Nasty new Intel processor bug

More details of this issue are now at https://meltdownattack.com/. There are two different bugs.

It looks like the first bug, known as Meltdown, will require a fix along the lines of keeping separate user mode and kernel mode page tables and changing the page table base register, CR3, any time the processor shifts between user and kernel mode, that is, on every system call and every interrupt. Changing CR3 flushes some cached processor state (like TLBs), and I’m reading that the overhead may be in the hundreds of clock cycles. The performance impact is very workload dependent, but for network I/O intensive systems it could be as much as 30%.
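To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It assumes two CR3 writes per user/kernel round trip (one on entry, one on exit) and counts only the direct cost of the writes, not the TLB refills that follow. The function name and the sample figures are made up for illustration; they are not measurements.

```python
def cr3_switch_overhead(transitions_per_sec, cycles_per_switch, clock_hz,
                        switches_per_transition=2):
    """Return the fraction of one core's cycles spent on CR3 writes.

    Every input is an assumption supplied by the caller; nothing is measured.
    """
    extra_cycles = (transitions_per_sec
                    * switches_per_transition
                    * cycles_per_switch)
    return extra_cycles / clock_hz


if __name__ == "__main__":
    # Hypothetical I/O-heavy load: 500,000 system calls plus interrupts
    # per second, 300 cycles per CR3 write, one 3 GHz core.
    busy = cr3_switch_overhead(500_000, 300, 3_000_000_000)
    # Hypothetical compute-bound load: 5,000 transitions per second.
    idle = cr3_switch_overhead(5_000, 300, 3_000_000_000)
    print(f"I/O heavy: {busy:.1%}, compute bound: {idle:.1%}")
```

With those assumed inputs the I/O-heavy case comes out around 10% of a core and the compute-bound case around 0.1%, which is at least consistent with the impact being so workload dependent.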

The basic problem is that when the processor speculatively executes down a path that is ultimately not taken, it fails to properly apply memory protection. The reports say that by exploiting this, it’s possible to indirectly determine the contents of kernel memory from a user process. As all physical memory is mapped into kernel addresses, all memory in all processes is potentially readable by the attacking process. The attack can also cross Docker and at least some hypervisor boundaries.
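As I understand the write-ups, the "indirect" part works through the cache: the speculated load's result is thrown away, but the cache line it touched stays resident, and the attacker finds that line by timing. Below is a toy model of that idea in Python. It only tracks which lines are cached; it does no real speculation or timing, cannot read anything it was not handed, and every name in it is hypothetical.

```python
PAGE = 4096       # one probe slot per possible byte value, a page apart
LINE = 64         # assumed cache line size in bytes


class ToyCache:
    """Tracks which cache lines are resident; stands in for real timing."""

    def __init__(self):
        self.lines = set()

    def flush(self):
        self.lines.clear()

    def touch(self, addr):
        self.lines.add(addr // LINE)

    def is_fast(self, addr):
        # A real attacker would time a load here; the model just looks it up.
        return addr // LINE in self.lines


def transient_access(cache, secret_byte):
    # Stand-in for the speculated load: its result is discarded, but the
    # probe-array line indexed by the secret value remains cached.
    cache.touch(secret_byte * PAGE)


def recover_byte(cache):
    # Scan all 256 probe slots and report the one that is "fast".
    for value in range(256):
        if cache.is_fast(value * PAGE):
            return value
    return None


def leak(secret):
    cache = ToyCache()
    out = bytearray()
    for b in secret:
        cache.flush()
        transient_access(cache, b)
        guess = recover_byte(cache)
        if guess is None:
            raise RuntimeError("no cached probe line found")
        out.append(guess)
    return bytes(out)
```

Calling `leak(b"kernel")` hands back the same bytes, one cache footprint at a time. On real hardware the `is_fast` step is a noisy latency measurement, which is why the published attacks need repetition and statistics.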

A second, related bug impacts Intel, AMD, and some ARM processors.

It sounds like the Amazon and Azure clouds are implementing fixes: they have announced required VM reboots in the coming week, I’m guessing to patch the hypervisors.

From what I read, the kernel mode page table should still be able to have the user space mapped, so it’s not immediately clear there will be any impact (other than performance) on METHOD_NEITHER buffer access. It does seem possible that drivers that map kernel memory into a user process’s address space may be impacted, although we will have to see what mitigation Microsoft ends up implementing.

If arbitrary processes can cause read accesses to kernel addresses, there is the potential for a user process to cause accesses to device-mapped memory. Some devices will malfunction if their registers are read arbitrarily, creating a potential DoS vulnerability and/or data corruption.

Jan

xxxxx@pmatrix.com wrote:

> More details of this issue are now at https://meltdownattack.com/. There are two different bugs.

It’s really hard for me to call this a bug.  These people didn’t go
“read” kernel memory, in any traditional sense.  They used detailed
knowledge of the architecture and cache behavior, and some very clever
statistical analysis of the resulting timing, to intuit the value of
individual bits in otherwise inaccessible memory.  This strikes me as
the same kind of attack as the people who “cracked” RSA by measuring the
CPU’s power consumption during the computation and intuiting the key
from the path taken through the code.

Is it a hole?  Yes.  Is it a hole that anyone could possibly have
anticipated beforehand?  No way.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

> Changing CR3 flushes some cached processor state (like TLBs),

IIRC, this does not apply to pages that are marked ‘Global’ in their PTEs; after all, that is the very purpose of the ‘Global’ bit.

> It looks like the first bug, known as Meltdown, will require a fix along the lines of keeping separate user mode and kernel mode page tables, and changing the page table base register, CR3, anytime the processor shifts between user and kernel mode, like on every system call and every interrupt.

Well, in order to make it work they will have to mark all kernel pages as non-global, plus, as you have mentioned already, keep separate kernel and user page tables.
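For anyone who does not have the PTE layout in their head, here is a small Python sketch of the Global-bit point. The bit positions (P = bit 0, U/S = bit 2, G = bit 8) are the architectural x86 ones; the function names are mine, and the model deliberately ignores PCID and assumes global pages only matter when CR4.PGE is set.

```python
# x86 page-table entry flag bits (architectural bit positions).
PTE_PRESENT = 1 << 0   # P: the mapping is valid
PTE_USER = 1 << 2      # U/S: accessible from user mode
PTE_GLOBAL = 1 << 8    # G: translation is global


def survives_cr3_write(pte, cr4_pge=True):
    """True if a cached translation for this PTE is kept across a CR3 write.

    Simplified model: a present, global mapping survives when CR4.PGE is
    enabled; everything else is flushed. PCID is not modelled.
    """
    return bool(cr4_pge and (pte & PTE_PRESENT) and (pte & PTE_GLOBAL))


def make_non_global(pte):
    """Clear the G bit, as a split-page-table fix would need for kernel pages."""
    return pte & ~PTE_GLOBAL


if __name__ == "__main__":
    kernel_pte = 0x1000 | PTE_PRESENT | PTE_GLOBAL   # hypothetical entry
    print(survives_cr3_write(kernel_pte))
    print(survives_cr3_write(make_non_global(kernel_pte)))
```

The first call reports that the translation is kept and the second that it is flushed, which is exactly the cost being discussed: once kernel pages stop being global, every CR3 write throws their TLB entries away too.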

On the one hand, the performance penalty looks dramatic if they go this way, at least at first glance. However, if you think of the kernel as just a special process with its own page directory and page tables, then you arrive at a solution that is more or less reminiscent of, say, Linux running on top of L4, i.e. something that is known to show acceptable performance.

> A second related bug, impacts Intel, AMD, and some ARM processors.

This is what all the media keep reiterating again and again, but I am somewhat surprised by this combination. As long as we are speaking about Intel and AMD, we can assume that the bug is inherent to the x86_64 architecture itself (which means this is not really a bug but an architectural flaw, but anyway). However, when I see ARM in this “company”, I get pretty confused. After all, ARM is a totally different architecture. How come the same hardware vulnerability is sort of “portable” across different architectures?

Anton Bassov