iTLB multihit
iTLB multihit is an erratum where some processors may incur a machine check error, possibly resulting in an unrecoverable CPU lockup, when an instruction fetch hits multiple entries in the instruction TLB. This can occur when the page size is changed along with either the physical address or cache type. A malicious guest running on a virtualized system can exploit this erratum to perform a denial of service attack.
Affected processors
Variations of this erratum are present on most Intel Core and Xeon processor models. The erratum is not present on:
- non-Intel processors
- Some Atoms (Airmont, Bonnell, Goldmont, GoldmontPlus, Saltwell, Silvermont)
- Intel processors that have the PSCHANGE_MC_NO bit set in the IA32_ARCH_CAPABILITIES MSR.
Related CVEs
The following CVE entry is related to this issue:
CVE-2018-12207 | Machine Check Error Avoidance on Page Size Change |
Problem
Privileged software, including the OS and virtual machine managers (VMMs), is in charge of memory management. A key component of memory management is the control of the page tables. Modern processors use virtual memory, a technique that creates the illusion of a very large memory for processors. This virtual space is split into pages of a given size. Page tables translate virtual addresses to physical addresses.
To reduce latency when performing a virtual to physical address translation, processors include a structure, called TLB, that caches recent translations. There are separate TLBs for instruction (iTLB) and data (dTLB).
Under this erratum, instructions are fetched from a linear address translated using a 4 KB translation cached in the iTLB. Privileged software then modifies the paging structure so that the same linear address uses a large page size (2 MB, 4 MB, 1 GB) with a different physical address or memory type. After the page structure modification but before the software invalidates any iTLB entries for the linear address, a code fetch on the same linear address may cause a machine-check error, which can result in a system hang or shutdown.
Attack scenarios
Attacks against the iTLB multihit erratum can be mounted from malicious guests in a virtualized system.
iTLB multihit system information
The Linux kernel provides a sysfs interface to enumerate the current iTLB multihit status of the system: whether the system is vulnerable and which mitigations are active. The relevant sysfs file is:
/sys/devices/system/cpu/vulnerabilities/itlb_multihit
The possible values in this file are:
Not affected | The processor is not vulnerable. |
KVM: Mitigation: Split huge pages | Software changes mitigate this issue. |
KVM: Mitigation: VMX unsupported | KVM is not vulnerable because Virtual Machine Extensions (VMX) is not supported. |
KVM: Mitigation: VMX disabled | KVM is not vulnerable because Virtual Machine Extensions (VMX) is disabled. |
KVM: Vulnerable | The processor is vulnerable, but no mitigation is enabled. |
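The status strings can be checked programmatically. The following is a minimal sketch, not part of the kernel documentation: it assumes the file lives at /sys/devices/system/cpu/vulnerabilities/itlb_multihit and that the unmitigated and unaffected states are reported as "KVM: Vulnerable" and "Not affected". The helper names `describe_status` and `read_status` are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Assumed location of the status file in the kernel's vulnerabilities directory.
SYSFS_FILE = Path("/sys/devices/system/cpu/vulnerabilities/itlb_multihit")

# Status string -> meaning, following the list of possible values.
DESCRIPTIONS = {
    "Not affected": "The processor is not vulnerable.",
    "KVM: Mitigation: Split huge pages": "Software changes mitigate this issue.",
    "KVM: Mitigation: VMX unsupported": "KVM is not vulnerable because VMX is not supported.",
    "KVM: Mitigation: VMX disabled": "KVM is not vulnerable because VMX is disabled.",
    "KVM: Vulnerable": "The processor is vulnerable, but no mitigation is enabled.",
}


def describe_status(status: str) -> str:
    """Map a raw status line to its description; unknown strings are reported as such."""
    status = status.strip()
    return DESCRIPTIONS.get(status, f"Unrecognized status: {status}")


def read_status(path: Path = SYSFS_FILE) -> Optional[str]:
    """Return the status line, or None if the file is missing (older kernel, non-Linux host)."""
    try:
        return path.read_text().strip()
    except OSError:
        return None
```

On a Linux host, `describe_status(read_status() or "")` prints a readable summary; on systems without the file, `read_status()` returns `None` rather than raising.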
Enumeration of the erratum
A new bit, PSCHANGE_MC_NO, has been allocated in the IA32_ARCH_CAPABILITIES MSR and is set on CPUs that are mitigated against this issue.
IA32_ARCH_CAPABILITIES MSR | Not present | Possibly vulnerable, check model |
IA32_ARCH_CAPABILITIES[PSCHANGE_MC_NO] | ‘0’ | Likely vulnerable, check model |
IA32_ARCH_CAPABILITIES[PSCHANGE_MC_NO] | ‘1’ | Not vulnerable |
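The enumeration table can be expressed as a small decision function. This is an illustrative sketch only: it assumes PSCHANGE_MC_NO is bit 6 of IA32_ARCH_CAPABILITIES (the position used by the Linux kernel headers), and it takes the MSR value as an argument instead of reading it, since reading MSRs requires root and the msr driver. The name `classify` is hypothetical.

```python
from typing import Optional

# Assumption: PSCHANGE_MC_NO is bit 6 of IA32_ARCH_CAPABILITIES.
PSCHANGE_MC_NO_BIT = 6


def classify(arch_capabilities: Optional[int]) -> str:
    """Classify a CPU from its IA32_ARCH_CAPABILITIES value.

    Pass None when the MSR is not present on the processor.
    """
    if arch_capabilities is None:
        return "Possibly vulnerable, check model"
    if arch_capabilities & (1 << PSCHANGE_MC_NO_BIT):
        return "Not vulnerable"
    return "Likely vulnerable, check model"
```

As the table says, a clear bit or a missing MSR is not conclusive on its own; the processor model still has to be checked against the list of unaffected parts.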
Mitigation mechanism
This erratum can be mitigated by restricting the use of large page sizes to non-executable pages. This forces all iTLB entries to be 4K, and removes the possibility of multiple hits.
In order to mitigate the vulnerability, KVM initially marks all huge pages as non-executable. If the guest attempts to execute in one of those pages, the page is broken down into 4K pages, which are then marked executable.
If EPT is disabled or not available on the host, KVM is in control of TLB flushes and the problematic situation cannot happen. However, the shadow EPT paging mechanism used by nested virtualization is vulnerable, because the nested guest can trigger multiple iTLB hits by modifying its own (non-nested) page tables. For simplicity, KVM will make large pages non-executable in all shadow paging modes.
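The split-on-execute behaviour can be illustrated with a toy model. This is a conceptual sketch only and does not reflect KVM's actual data structures: the class `ToyMmu` and its methods are hypothetical, and only 2 MB huge pages are modelled.

```python
PAGE_4K = 4 * 1024
PAGE_2M = 2 * 1024 * 1024


class ToyMmu:
    """Toy page table: maps a page base address to (page size, executable)."""

    def __init__(self):
        self.mappings = {}

    def map_huge(self, base: int) -> None:
        """Install a 2 MB mapping, initially non-executable as in the mitigation."""
        if base % PAGE_2M:
            raise ValueError("huge page base must be 2 MB aligned")
        self.mappings[base] = (PAGE_2M, False)

    def exec_fetch(self, addr: int) -> int:
        """Simulate an instruction fetch and return the page size that served it."""
        huge_base = addr - addr % PAGE_2M
        if self.mappings.get(huge_base) == (PAGE_2M, False):
            # Execute fault on a non-executable huge page: break it into
            # 4K pages and mark those executable.
            del self.mappings[huge_base]
            for offset in range(0, PAGE_2M, PAGE_4K):
                self.mappings[huge_base + offset] = (PAGE_4K, True)
        small_base = addr - addr % PAGE_4K
        size, executable = self.mappings[small_base]  # KeyError if unmapped
        if not executable:
            raise PermissionError("fetch from non-executable page")
        return size

    def executable_sizes(self) -> set:
        """Sizes of all executable mappings; the mitigation keeps this at {4K}."""
        return {size for size, executable in self.mappings.values() if executable}
```

The property of interest is that `executable_sizes()` never contains a large page size, so an instruction fetch can only ever be served by a 4K translation.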
Mitigation control on the kernel command line and KVM module parameter
The KVM hypervisor mitigation mechanism for marking huge pages as non-executable can be controlled with the module parameter "nx_huge_pages=". The kernel command line allows the iTLB multihit mitigations to be controlled at boot time with the option "kvm.nx_huge_pages=".
The valid arguments for these options are:
force | Mitigation is enabled. In this case, the mitigation implements non-executable huge pages in Linux kernel KVM module. All huge pages in the EPT are marked as non-executable. If a guest attempts to execute in one of those pages, the page is broken down into 4K pages, which are then marked executable. |
off | Mitigation is disabled. |
auto | Enable mitigation only if the platform is affected and the kernel was not booted with the "mitigations=off" command line parameter. This is the default option. |
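The three arguments reduce to a simple rule, sketched below. The function name and its boolean inputs are hypothetical; the sketch restates the table and is not how the kernel implements the parameter.

```python
def nx_huge_pages_enabled(setting: str, platform_affected: bool,
                          mitigations_off: bool = False) -> bool:
    """Return True if the non-executable huge page mitigation would be active.

    setting           value of kvm.nx_huge_pages: "force", "off" or "auto"
    platform_affected whether the CPU has the iTLB multihit erratum
    mitigations_off   whether the kernel was booted with mitigations=off
    """
    if setting == "force":
        return True   # always on, regardless of mitigations=off
    if setting == "off":
        return False
    if setting == "auto":
        return platform_affected and not mitigations_off
    raise ValueError(f"invalid nx_huge_pages value: {setting}")
```

For example, `nx_huge_pages_enabled("auto", True, mitigations_off=True)` is `False`, while `"force"` stays enabled under the same conditions.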
Mitigation selection guide
1. No virtualization in use
The system is protected by the kernel unconditionally and no further action is required.
2. Virtualization with trusted guests
If the guest comes from a trusted source, you may assume that the guest will not attempt to maliciously exploit this erratum and no further action is required.
3. Virtualization with untrusted guests
If the guest comes from an untrusted source, the host kernel needs to apply the iTLB multihit mitigation via the kernel command line or the kvm module parameter.
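The three cases of the selection guide can be summarised as a decision function. This is only a restatement of the guide in code; the function name and the wording of the returned strings are hypothetical.

```python
def recommended_action(virtualization_in_use: bool, guests_trusted: bool = False) -> str:
    """Return the action suggested by the mitigation selection guide."""
    if not virtualization_in_use:
        # Case 1: the kernel protects the system unconditionally.
        return "none: the kernel protects the system unconditionally"
    if guests_trusted:
        # Case 2: trusted guests are assumed not to exploit the erratum.
        return "none: guests are trusted"
    # Case 3: untrusted guests require the mitigation on the host.
    return "apply the mitigation on the host via kvm.nx_huge_pages"
```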
How to disable mitigations for CPU vulnerabilities
Inspect kernel parameters for detailed information.
mitigations=
    [X86,PPC,S390,ARM64] Control optional mitigations for CPU vulnerabilities. This is a set of curated, arch-independent options, each of which is an aggregation of existing arch-specific options.

    off
        Disable all optional CPU mitigations. This improves system performance, but it may also expose users to several CPU vulnerabilities.
        Equivalent to:
            nopti [X86,PPC]
            kpti=0 [ARM64]
            nospectre_v1 [X86,PPC]
            nobp=0 [S390]
            nospectre_v2 [X86,PPC,S390,ARM64]
            spectre_v2_user=off [X86]
            spec_store_bypass_disable=off [X86,PPC]
            ssbd=force-off [ARM64]
            l1tf=off [X86]
            mds=off [X86]
            tsx_async_abort=off [X86]
            kvm.nx_huge_pages=off [X86]
        Exceptions:
            This does not have any effect on kvm.nx_huge_pages when kvm.nx_huge_pages=force.

    auto (default)
        Mitigate all CPU vulnerabilities, but leave SMT enabled, even if it's vulnerable. This is for users who don't want to be surprised by SMT getting disabled across kernel upgrades, or who have other ways of avoiding SMT-based attacks.
        Equivalent to: (default behavior)

    auto,nosmt
        Mitigate all CPU vulnerabilities, disabling SMT if needed. This is for users who always want to be fully mitigated, even if it means losing SMT.
        Equivalent to:
            l1tf=flush,nosmt [X86]
            mds=full,nosmt [X86]
            tsx_async_abort=full,nosmt [X86]
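To see which of these modes a given command line selects, a short sketch like the following may help. It assumes that a missing option means the default "auto" and that a later occurrence of the option overrides an earlier one; the function name `mitigations_mode` is hypothetical. On a running system the command line can be read from /proc/cmdline.

```python
def mitigations_mode(cmdline: str) -> str:
    """Return the value of mitigations= on a kernel command line.

    Assumptions: the default is "auto" when the option is absent, and the
    last occurrence wins when the option is given more than once.
    """
    mode = "auto"
    for token in cmdline.split():
        if token.startswith("mitigations="):
            mode = token.split("=", 1)[1]
    return mode
```

For example, `mitigations_mode(open("/proc/cmdline").read())` reports the mode of the currently booted kernel.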
Update GRUB configuration.
Sourcing file `/etc/default/grub'
Sourcing file `/etc/default/grub.d/init-select.cfg'
Generating grub configuration file .
Found linux image: /boot/vmlinuz-5.4.0-14-generic
Found initrd image: /boot/initrd.img-5.4.0-14-generic
Found linux image: /boot/vmlinuz-5.4.0-9-generic
Found initrd image: /boot/initrd.img-5.4.0-9-generic
Found memtest86+ image: /memtest86+.elf
Found memtest86+ image: /memtest86+.bin
done
Inspect applied mitigations.
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   39 bits physical, 48 bits virtual
CPU(s):                          4
On-line CPU(s) list:             0-3
Thread(s) per core:              1
Core(s) per socket:              4
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           60
Model name:                      Intel(R) Core(TM) i5-4570S CPU @ 2.90GHz
Stepping:                        3
CPU MHz:                         1200.788
CPU max MHz:                     3600.0000
CPU min MHz:                     800.0000
BogoMIPS:                        5786.81
Virtualization:                  VT-x
L1d cache:                       128 KiB
L1i cache:                       128 KiB
L2 cache:                        1 MiB
L3 cache:                        6 MiB
NUMA node0 CPU(s):               0-3
Vulnerability Itlb multihit:     KVM: Vulnerable
Vulnerability L1tf:              Mitigation; PTE Inversion; VMX vulnerable, SMT disabled
Vulnerability Mds:               Vulnerable; SMT disabled
Vulnerability Meltdown:          Vulnerable
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1:        Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers
Vulnerability Spectre v2:        Vulnerable, IBPB: disabled, STIBP: disabled
Vulnerability Tsx async abort:   Not affected
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt dtherm ida arat pln pts md_clear flush_l1d
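The vulnerability lines of such a report can be extracted with a few lines of code. The sketch below assumes the usual lscpu layout of one "Vulnerability <name>: <status>" field per line; the function name `parse_vulnerabilities` is hypothetical.

```python
def parse_vulnerabilities(lscpu_output: str) -> dict:
    """Collect 'Vulnerability <name>: <status>' lines into a name -> status dict."""
    prefix = "Vulnerability "
    result = {}
    for line in lscpu_output.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            # Split on the first colon only; the status may contain colons itself.
            name, _, status = line[len(prefix):].partition(":")
            result[name.strip()] = status.strip()
    return result


sample = """\
Vulnerability Itlb multihit:     KVM: Vulnerable
Vulnerability Meltdown:          Vulnerable
Vulnerability Tsx async abort:   Not affected
"""
statuses = parse_vulnerabilities(sample)
```

With the sample lines taken from the report, `statuses["Itlb multihit"]` is `"KVM: Vulnerable"`, which is what the kernel reports once the mitigations have been switched off.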
5. Kernel parameter tuning
5.1. Updating the bootloader and disabling unneeded patches
By default, the Linux kernel enables quite a few security mitigations, which can significantly reduce processor performance. You can disable them by editing the bootloader parameters. Take GRUB as an example:
sudo nano /etc/default/grub # Edit the settings manually or with grub-customizer:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash rootfstype=btrfs lpj=3499912 raid=noautodetect mitigations=off preempt=none nowatchdog audit=0 page_alloc.shuffle=1 split_lock_detect=off pci=pcie_bus_perf"
sudo grub-mkconfig -o /boot/grub/grub.cfg # Regenerate the bootloader configuration. This can also be done in grub-customizer: add the parameters, apply them, then save on tabs 2 and 1.
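Editing the GRUB_CMDLINE_LINUX_DEFAULT line by hand is error-prone when a parameter is already present. The following sketch shows one way to add or replace a single parameter in that line; the function name `set_kernel_param` is hypothetical, and the code only transforms a string, so writing the result back to /etc/default/grub and regenerating the configuration is left to the reader.

```python
def set_kernel_param(grub_line: str, param: str) -> str:
    """Add `param` to a GRUB_CMDLINE_LINUX_DEFAULT line, replacing any old value.

    `param` is either a bare flag ("nowatchdog") or a key=value pair
    ("mitigations=off").
    """
    key, sep, raw = grub_line.strip().partition("=")
    if key != "GRUB_CMDLINE_LINUX_DEFAULT" or not sep:
        raise ValueError("not a GRUB_CMDLINE_LINUX_DEFAULT line")
    tokens = raw.strip().strip("\"'").split()
    name = param.split("=", 1)[0]
    # Drop any existing occurrence of the same parameter, then append the new one.
    tokens = [t for t in tokens if t.split("=", 1)[0] != name]
    tokens.append(param)
    joined = " ".join(tokens)
    return f'GRUB_CMDLINE_LINUX_DEFAULT="{joined}"'
```

For instance, applying `set_kernel_param(line, "mitigations=off")` to a line that already contains `mitigations=auto` yields a line with a single `mitigations=off` entry at the end.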
5.1.1. Explanations
lpj= - A value unique to each system (loops per jiffy). The kernel normally determines it automatically at boot, which takes some time, so it is better to set it manually. You can find your lpj value with the following command: sudo dmesg | grep "lpj="
mitigations=off - Disables all optional kernel mitigations for CPU vulnerabilities (including those for Spectre and Meltdown).
raid=noautodetect - Disables RAID autodetection at boot. If you use RAID, do NOT set this parameter.
rootfstype=btrfs - Specify here the name of the file system your root partition is formatted with.
nowatchdog - Disables the watchdog timers. This may get rid of stutter in online games.
page_alloc.shuffle=1 - Randomizes the page allocator's free lists. May improve performance when RAM is used together with very fast storage devices (NVMe, Optane).
split_lock_detect=off - Disables split lock detection. A single instruction with a split lock can occupy the memory bus for roughly 1,000 cycles, which can cause brief system stalls.
pci=pcie_bus_perf - Sets each device's Max Payload Size (MPS) to the largest value allowed by its parent PCI Express bus, and its Max Read Request Size (MRRS) to the largest supported value. This can give better throughput.
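The lpj value mentioned in the list above can also be pulled out of the kernel log programmatically. This is a minimal sketch: the function name `extract_lpj` is hypothetical, and the sample log line is illustrative rather than taken from a real system.

```python
import re
from typing import Optional


def extract_lpj(dmesg_text: str) -> Optional[int]:
    """Return the first lpj=<number> value found in kernel log text, or None."""
    match = re.search(r"\blpj=(\d+)", dmesg_text)
    return int(match.group(1)) if match else None


# Illustrative log line; the numbers are placeholders, not measured values.
sample_line = ("[    0.000000] Calibrating delay loop (skipped), value calculated "
               "using timer frequency.. 6999.82 BogoMIPS (lpj=3499912)")
lpj = extract_lpj(sample_line)
```

The resulting number is what goes after `lpj=` on the kernel command line. If the option is already set on the command line, the log may echo that value back instead of a freshly calibrated one.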