CVE-2017-1000405-Huge_Dirtycow
前言
搭建调试环境
CONFIG_TRANSPARENT_HUGEPAGE
检查/sys/kernel/mm/transparent_hugepage/enabled 是否开启
透明大内存页(THP:Transparent Huge Pages)
通常linux的内存页为4kb大小,为了满足系统和程序的特殊需求,linux允许2M和1G大内存页。常规的虚拟地址翻译下图所示,PGD,PMD均作为页表目录,当启用大内存页时,PMD不再表示页表目录而是和PTE合并共同代表页表项(PTE)。
THP内存页主要被用于匿名anonymous,shmem,tmpfs三种内存映射,即:
(1)anonymous:通过mmap映射到内存是一个匿名文件及不对应任何实际磁盘文件。
(2)shmem:共享内存。
(3)tmpfs:是一种虚拟文件系统,存在于内存,因此访问速度会很快,当使用tmpfs类型挂载文件系统释放,它会自动创建
开启THG:
echo always > /sys/kernel/mm/transparent_hugepage/enabled
零页(zero page)
当程序申请匿名内存页时,linux系统为了节省时间和空间并不会真的申请一块物理内存,而是将所有申请统一映射到一块预先申请好值为零的物理内存页,当程序发生写入操作时,才会真正申请内存页,这一块预先申请好值为零的页即为零页(zero page),且零页是只读的。
漏洞分析
针对CVE-2016-5195漏洞,对THP的补丁如下:
@@ -783,6 +783,12 @@ struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr,
assert_spin_locked(pmd_lockptr(mm, pmd));
+ /*
+ * When we COW a devmap PMD entry, we split it into PTEs, so we should
+ * not be in this function with `flags & FOLL_COW` set.
+ */
+ WARN_ONCE(flags & FOLL_COW, "mm: In follow_devmap_pmd with FOLL_COW set");
+
if (flags & FOLL_WRITE && !pmd_write(*pmd))
return NULL;
@@ -1128,6 +1134,16 @@ int do_huge_pmd_wp_page(struct vm_fault *vmf, pmd_t orig_pmd)
return ret;
}
+/*
+ * FOLL_FORCE can write to even unwritable pmd's, but only
+ * after we've gone through a COW cycle and they are dirty.
+ */
+static inline bool can_follow_write_pmd(pmd_t pmd, unsigned int flags)
+{
+ return pmd_write(pmd) ||
+ ((flags & FOLL_FORCE) && (flags & FOLL_COW) && pmd_dirty(pmd));
+}
+
struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
unsigned long addr,
pmd_t *pmd,
@@ -1138,7 +1154,7 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
assert_spin_locked(pmd_lockptr(mm, pmd));
- if (flags & FOLL_WRITE && !pmd_write(*pmd))
+ if (flags & FOLL_WRITE && !can_follow_write_pmd(*pmd, flags))
goto out;
补丁链接:https://github.com/torvalds/linux/commit/8310d48b125d19fcd9521d83b8293e63eb1646aa
和普通的内存页修补相同。按照补丁以及源代码的流程分析,只要满足(flag& FOLL_FORCE) && (flags & FOLL_COW) && pmd_dirty(pmd) , 并且满足之前pte_present(pte)判断(即页面存在),就可以对页面进行写操作。
__get_user_pages(retry)
-> follow_page_mask
-> follow_page_pte
漏洞成因在于:
一般情况下,要页面标记为dirty是要经过COW过程,之后得到写权限操作的是COW页面,但获取可读THP内存页时,可以获得一个标记为dirty的页面,并且是未COW的,造成操作的是原页面。调用链为:
follow_page_mask -> page = follow_trans_huge_pmd(vma, address, pmd, flags);->touch_pmd->pmd_mkdirty
1119 struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
1120 unsigned long addr,
1121 pmd_t *pmd,
1122 unsigned int flags)
1123 {
1124 struct mm_struct *mm = vma->vm_mm;
1125 struct page *page = NULL;
1126
1127 assert_spin_locked(pmd_lockptr(mm, pmd));
1128
1129 if (flags & FOLL_WRITE && !pmd_write(*pmd))
1130 goto out;
1131
1132 /* Avoid dumping huge zero page */
1133 if ((flags & FOLL_DUMP) && is_huge_zero_pmd(*pmd))
1134 return ERR_PTR(-EFAULT);
1135
1136 /* Full NUMA hinting faults to serialise migration in fault paths */
1137 if ((flags & FOLL_NUMA) && pmd_protnone(*pmd))
1138 goto out;
1139
1140 page = pmd_page(*pmd); //获得页面
1141 VM_BUG_ON_PAGE(!PageHead(page) && !is_zone_device_page(page), page);
1142 if (flags & FOLL_TOUCH)
1143 touch_pmd(vma, addr, pmd); //标志为dirty
……
747 static void touch_pmd(struct vm_area_struct *vma, unsigned long addr,
748 pmd_t *pmd)
749 {
750 pmd_t _pmd;
751
752 /*
753 * We should set the dirty bit only for FOLL_WRITE but for now
754 * the dirty bit in the pmd is meaningless. And if the dirty
755 * bit will become meaningful and we'll only set it with
756 * FOLL_WRITE, an atomic set_bit will be required on the pmd to
757 * set the young bit, instead of the current set_pmd_at.
758 */
759 _pmd = pmd_mkyoung(pmd_mkdirty(*pmd)); //标志为dirty
760 if (pmdp_set_access_flags(vma, addr & HPAGE_PMD_MASK,
761 pmd, _pmd, 1))
762 update_mmu_cache_pmd(vma, addr, pmd);
763 }
此时pte_pretend为1,不会再次触发缺页错误
,所以通过(flags & FOLL_WRITE && !can_follow_write_pmd(*pmd, flags))判断后,操作的是原页面。
漏洞利用
(1)调用follow_page_mask请求获取可写(FOLL_WRITE)THP内存页,发生缺页中断,返回值为NULL,调用faultin_page从磁盘中调入内存页,返回值为0。
(2)随着goto entry再次调用follow_page_mask,请求可写(FOLL_WRITE)内存页,由于内存页没有可写权限,返回值为NULL,调用fault_page复制只读内存页获得FOLL_COW标志,返回值为0。
前面两步和Dirtycow相同,为了获取FOLL_COW标志
,而DirtyCow的前两步是为了将flags中的FOLL_WRITE位置0。
(3)随着goto entry 再次调用follow由于cond_resched会主动放权,引起系统调度其他程序,另一个程序B使用madvise(MADV_DONTNEED)换出内存页,同时程序B读内存页,那么则会最终调用touch_pmd,将内存页标记为脏的。
(4)程序再次被调度执行,调用follow_page_mask请求获取可写(FOLL_WRITE)内存页,此时满足FOLL_COW和脏的,因此程序获得可写内存页。
(5)后续进行写入操作,只要设置合理THP内存页可以写前面提到的零页(zero pages),其他共享零页的进程读取修改后的零页数据进行相关操作就会发生crash,触发漏洞。
补丁:
只有在有写请求的情况下,才将页面标为dirty,例如对于申请只读的零物理内存页时,FOLL_WRITE就不满足,阻止了漏洞利用。
参考链接
poc 地址:https://github.com/bindecy/HugeDirtyCowPOC
分析文章:
https://medium.com/bindecy/huge-dirty-cow-cve-2017-1000405-110eca132de0
https://www.freebuf.com/column/203162.html
https://www.anquanke.com/post/id/89096
补丁链接: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-4.9.y&id=7031ae2ab37d3df53c4a4e9903329a5d38c745ec
补丁commit:a8f97366452ed491d13cf1e44241bc0b5740b1f0
poc 地址: https://raw.githubusercontent.com/bindecy/HugeDirtyCowPOC/master/main.c