




THE UNIVERSITY of NORTH CAROLINA at CHAPEL HILL
**COMP 790: OS Implementation**

Lecture Goal

• Understand the hardware tools available on a modern x86 processor for manipulating and protecting memory

• Lab 2: You will program this hardware

• Apologies: Material can be a bit dry, but important

- Plus, slides will be good reference

• But, cool tech tricks:

- How does thread-local storage (TLS) work?

- An actual (and tough) Microsoft interview question

**Undergrad Review**

• What is:

- Virtual memory?

- Segmentation?

- Paging?



**Two System Goals**

1) Provide an abstraction of contiguous, isolated virtual memory to a program

2) Prevent illegal operations

- Prevent access to other application or OS memory

- Detect failures early (e.g., segfault on address 0)

- More recently, prevent exploits that try to execute program data



• Real mode – walks and talks like a really old x86 chip

- State at boot

- 20-bit address space, direct physical memory access

• 1 MB of usable memory

- Segmentation available (no paging)

• Protected mode – Standard 32-bit x86 mode

- Segmentation and paging

- Privilege levels (separate user and kernel)
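The 20-bit real-mode address space can be illustrated with a small sketch. In real mode the CPU forms a physical address by shifting the 16-bit segment value left by 4 and adding the 16-bit offset. The helper name `real_mode_phys` is made up for this example, and the 20-bit mask assumes 8086-style wraparound (A20 line disabled).

```c
#include <stdint.h>

/* Hypothetical helper: real-mode address formation.
 * physical = segment * 16 + offset, truncated to 20 bits.
 * The mask assumes the A20 line is disabled (8086-style wraparound). */
uint32_t real_mode_phys(uint16_t segment, uint16_t offset)
{
    uint32_t linear = ((uint32_t)segment << 4) + offset;
    return linear & 0xFFFFF; /* 20 bits: 1 MB of addressable memory */
}
```

For instance, segment 0x07C0 with offset 0 names physical address 0x7C00, the conventional load address of a PC boot sector.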





Translation Overview

[Figure: the virtual address 0xdeadbeef is passed through segmentation]

Protected/Long mode only

Segmentation cannot be disabled!

But can be a no-op (aka flat mode)




Programming model

Segments for: code, data, stack, "extra"

A program can have up to 6 total segments
Segments identified by registers: cs, ds, ss, es, fs, gs

Prefix all memory accesses with desired segment:

mov eax, ds:0x80 (load offset 0x80 from data into eax)
jmp cs:0xab8 (jump execution to code offset 0xab8)
mov ss:0x40, ecx (move ecx to stack offset 0x40)



Programming, cont.

• This is cumbersome, so infer code, data and stack segments by instruction type:

— Control-flow instructions use code segment (jump, call)

— Stack management (push/pop) uses stack

— Most loads/stores use data segment

• Extra segments (es, fs, gs) must be used explicitly


Segment management

For safety (without paging), only the OS should define segments. Why?

Two segment tables the OS creates in memory:
Global – any process can use these segments
Local – segment definitions for a specific process
How does the hardware know where they are?
Dedicated registers: gdtr and ldtr
Privileged instructions: lgdt, lldt


Segment registers

Table Index (13 bits) | Global or Local Table? (1 bit) | Ring (2 bits)

• Set by the OS on fork, context switch, etc.
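The three fields of a segment register can be pulled apart with shifts and masks. This is only a sketch: the struct and function names are invented for illustration, and the bit positions follow the layout above (index in the top 13 bits, table bit next, ring in the low 2 bits).

```c
#include <stdint.h>

/* Hypothetical decoded view of a 16-bit segment selector. */
struct selector_fields {
    unsigned index; /* bits 15..3: entry number in the GDT or LDT */
    unsigned local; /* bit 2: 0 = global table, 1 = local table   */
    unsigned ring;  /* bits 1..0: privilege level                 */
};

struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f;
    f.index = sel >> 3;
    f.local = (sel >> 2) & 0x1;
    f.ring  = sel & 0x3;
    return f;
}
```

Under this layout, selector 0x8 decodes to entry 1 of the global table at ring 0, and 0x1b decodes to entry 3 of the global table at ring 3.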




Sample Problem:
(Old) JOS Bootloader

Suppose my kernel is compiled to be in upper 256 MB of a 32-bit address space (i.e., 0xf0100000)

Common to put OS kernel at top of address space

Bootloader starts in real mode (only 1MB of addressable physical memory)

Bootloader loads kernel at 0x00100000

Can't address 0xf0100000





Segmentation to the Rescue!

• kern/entry.S:

— What is this code doing?

mygdt:

SEG\_NULL

SEG(STA\_X | STA\_R, -KERNBASE, 0xffffffff) # code seg
SEG(STA\_W, -KERNBASE, 0xffffffff) # data seg
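One way to see what these segments do: segmentation adds the segment base to every offset, and 32-bit addition wraps around. A base of -KERNBASE therefore subtracts KERNBASE from each address the kernel uses. The sketch below assumes KERNBASE is 0xF0000000 (the start of the upper 256 MB); the function names are made up for this example.

```c
#include <stdint.h>

/* Assumed value: start of the upper 256 MB of a 32-bit address space. */
#define KERNBASE 0xF0000000u

/* Segmentation adds the base to the offset; unsigned arithmetic wraps
 * modulo 2^32, just as the hardware adder does. */
uint32_t seg_translate(uint32_t base, uint32_t offset)
{
    return base + offset;
}

/* Hypothetical helper: where a kernel link address lands while the
 * boot-time GDT (base = -KERNBASE) is in effect. */
uint32_t boot_linear(uint32_t link_addr)
{
    return seg_translate(0u - KERNBASE, link_addr);
}
```

With these definitions, the link address 0xf0100000 translates to 0x00100000, which is where the bootloader placed the kernel.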





Flat segmentation

The above trick is used for booting. We eventually want to use paging.

How can we make segmentation a no-op?

From kern/pmap.c:

// 0x8 - kernel code segment

[GD\_KT >> 3] = SEG (STA\_X | STA\_R, 0x0, 0xffffffff, 0),

• STA\_X | STA\_R: Execute and Read permission

• 0x0: Offset (0x00000000)

• 0: Ring 0
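For reference, here is a rough sketch of how such a flat segment ends up as an 8-byte descriptor in the table. This is not the JOS SEG macro; `make_descriptor` is a hypothetical helper that packs the fields according to the x86 descriptor layout, taking the 20-bit limit, the access byte, and the flags nibble as separate arguments.

```c
#include <stdint.h>

/* Hypothetical helper: pack an x86 segment descriptor.
 * limit20: 20-bit limit (in 4 KB units when the G flag is set)
 * access:  P | DPL | S | type      (e.g. 0x9A = present, ring 0, code, exec/read)
 * flags:   G | D/B | L | AVL       (e.g. 0xC  = 4 KB granularity, 32-bit) */
uint64_t make_descriptor(uint32_t base, uint32_t limit20,
                         uint8_t access, uint8_t flags)
{
    uint64_t d = 0;
    d |= (uint64_t)(limit20 & 0xFFFF);            /* limit bits 15..0  */
    d |= (uint64_t)(base & 0xFFFFFF) << 16;       /* base bits 23..0   */
    d |= (uint64_t)access << 40;                  /* access byte       */
    d |= (uint64_t)((limit20 >> 16) & 0xF) << 48; /* limit bits 19..16 */
    d |= (uint64_t)(flags & 0xF) << 52;           /* flags nibble      */
    d |= (uint64_t)(base >> 24) << 56;            /* base bits 31..24  */
    return d;
}
```

A flat ring-0 code segment (base 0, limit 0xFFFFF pages, access 0x9A, flags 0xC) packs to 0x00CF9A000000FFFF. Because the base is 0 and the limit covers all 4 GB, every offset passes through segmentation unchanged.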




Paging Model

• 32 (or 64) bit address space.

• Arbitrary mapping of linear to physical pages

• Pages are most commonly 4 KB

– Newer processors also support page sizes of 2 MB and 1 GB
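With 4 KB pages on 32-bit x86, a linear address splits into a 10-bit page directory index, a 10-bit page table index, and a 12-bit offset within the page. The macros below are a minimal sketch of that split. The names resemble ones used in JOS, but treat them as illustrative here.

```c
#include <stdint.h>

/* Sketch of the 10/10/12 split of a 32-bit linear address (4 KB pages). */
#define PDX(la)   (((uint32_t)(la) >> 22) & 0x3FF) /* page directory index */
#define PTX(la)   (((uint32_t)(la) >> 12) & 0x3FF) /* page table index     */
#define PGOFF(la) ((uint32_t)(la) & 0xFFF)         /* offset within page   */

/* Rebuild an address from its three parts. */
uint32_t pgaddr(uint32_t pdx, uint32_t ptx, uint32_t off)
{
    return (pdx << 22) | (ptx << 12) | off;
}
```

For 0xdeadbeef, the directory index is 0x37a, the table index is 0x2db, and the page offset is 0xeef.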












Page flags

• 3 for OS to use however it likes

• 4 reserved by Intel, just in case

• 3 for OS to CPU metadata

– User vs. kernel page,

– Write permission,

– Present bit (so we can swap out pages)

• 2 for CPU to OS metadata

– Dirty (page was written), Accessed (page was read or written)
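The flag bits can be sketched as a permission check plus the bookkeeping the CPU does on each access. The bit positions are the standard 32-bit x86 ones; the function names are invented for this example, and the check assumes CR0.WP is set so that kernel writes also honor the write bit.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_P 0x001u /* present                      */
#define PTE_W 0x002u /* writable                     */
#define PTE_U 0x004u /* user-accessible              */
#define PTE_A 0x020u /* accessed: set by the CPU     */
#define PTE_D 0x040u /* dirty: set by the CPU        */

/* Hypothetical check, assuming CR0.WP = 1 (kernel writes honor PTE_W). */
bool access_allowed(uint32_t pte, bool user, bool write)
{
    if (!(pte & PTE_P))
        return false;            /* not present: page fault */
    if (user && !(pte & PTE_U))
        return false;            /* kernel-only page        */
    if (write && !(pte & PTE_W))
        return false;            /* read-only page          */
    return true;
}

/* What the CPU records in the entry after a permitted access. */
uint32_t mark_access(uint32_t pte, bool write)
{
    pte |= PTE_A;
    if (write)
        pte |= PTE_D;
    return pte;
}
```

The OS can later clear the accessed and dirty bits and re-read them to learn which pages were touched, which is useful input for swapping decisions.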







**TLB Entries**

• The CPU caches address translations in the TLB

- Translation Lookaside Buffer

• Page Traversal is Slow

• Table Lookup is Fast

[Figure: a page traversal starting at cr3 next to a TLB lookup table with the entries 0xf0231000 → 0x1000, 0x00b31000 → 0x1f000, 0xb0002000 → 0xc1000]








Physical Address Extension (PAE)

Period with 32-bit machines + >4 GB RAM (2000s)

Essentially, an early deployment of a 64-bit page table format

Any given process can only address 4GB

Including OS!

Page tables themselves can address >4GB of physical pages





Nested page tables

Paging tough for early Virtual Machine implementations

- Can't trust a guest OS to correctly modify pages

So, add another layer of paging between host-physical and guest-physical
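The extra layer amounts to composing two translations: guest-virtual to guest-physical (controlled by the guest OS), then guest-physical to host-physical (controlled by the hypervisor). The sketch below flattens each layer into a single-level array of frame numbers to keep it short; real hardware walks multi-level tables at both layers. All names and sizes are assumptions for illustration.

```c
#include <stdint.h>

#define PAGE_SHIFT 12
#define NPAGES     8          /* toy address space: 8 pages per layer */
#define UNMAPPED   UINT32_MAX /* marks a missing mapping or a fault   */

/* One flattened layer: table[page number] = frame number or UNMAPPED. */
uint32_t translate(const uint32_t table[NPAGES], uint32_t addr)
{
    uint32_t page = addr >> PAGE_SHIFT;
    if (page >= NPAGES || table[page] == UNMAPPED)
        return UNMAPPED;
    return (table[page] << PAGE_SHIFT) | (addr & 0xFFF);
}

/* Guest-virtual -> guest-physical -> host-physical. */
uint32_t nested_translate(const uint32_t guest[NPAGES],
                          const uint32_t host[NPAGES], uint32_t gva)
{
    uint32_t gpa = translate(guest, gva);
    if (gpa == UNMAPPED)
        return UNMAPPED;      /* fault handled by the guest OS   */
    return translate(host, gpa); /* may fault to the hypervisor */
}
```

The guest can write anything it likes into its own table, but it can only ever reach host frames that the hypervisor has placed in the second table.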








TLS implementation

• Map a few pages per thread into a segment

• Use an "extra" segmentation register

- Usually gs

- Windows TEB in fs

• Any thread accesses first byte of TLS like this:

mov eax, gs:(0x0)
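The effect of `gs:(0x0)` can be modeled in plain C: every thread runs the same instruction with the same offset, but each thread's gs base points at a different block of memory. This is a simulation under assumed names (`thread_ctx`, `tls_load32`, `tls_store32`), not how a real runtime sets up TLS.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical per-thread state: the base address loaded into gs. */
struct thread_ctx {
    uint8_t *gs_base;
};

/* Models "mov eax, gs:(offset)": address = gs base + offset. */
uint32_t tls_load32(const struct thread_ctx *t, uint32_t offset)
{
    uint32_t v;
    memcpy(&v, t->gs_base + offset, sizeof v);
    return v;
}

/* Models "mov gs:(offset), eax". */
void tls_store32(const struct thread_ctx *t, uint32_t offset, uint32_t v)
{
    memcpy(t->gs_base + offset, &v, sizeof v);
}
```

On a context switch the OS only has to change the gs base; the compiled code, with its fixed offsets, then sees the new thread's variables.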




Viva segmentation!

• My undergrad OS course treated segmentation as a historical artifact

- Yet still widely (ab)used

- Also used for sandboxing in vx32, Native Client

- Used to implement early versions of VMware

• Counterpoint: TLS hack is just compensating for lack of general-purpose registers

• Either way, all but fs and gs are deprecated in x64



**Solution sketch**

• A 4MB address space will only use the low 22 bits of each address.

- So the first level translation will always hit entry 0

• Map the page table's physical address at entry 0

- First translation will "loop" back to the page table

- Then use page table normally for 4MB space

• Assumes correct programs will not read address 0

- Getting null pointers early is nice

- Challenge: Refine the solution to still get null pointer exceptions
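The loop-back trick can be checked with a small simulation of a two-level walk. The sketch assumes the goal is to serve a 4MB address space from a single page that acts as both page directory and page table. Physical memory is modeled as a small array of frames, and all names (`phys`, `walk`, `setup_single_page_table`, `map_page`) are made up for illustration.

```c
#include <stdint.h>

#define NENTRIES 1024
#define NFRAMES  16           /* toy physical memory: 16 frames of 4 KB */
#define PTE_P    0x1u
#define FAULT    UINT32_MAX

/* Simulated physical memory, each frame viewed as 1024 32-bit entries. */
static uint32_t phys[NFRAMES][NENTRIES];

/* Ordinary two-level walk; cr3 and entries hold (frame << 12) | flags. */
uint32_t walk(uint32_t cr3, uint32_t va)
{
    uint32_t pde = phys[cr3 >> 12][va >> 22];
    if (!(pde & PTE_P) || (pde >> 12) >= NFRAMES)
        return FAULT;
    uint32_t pte = phys[pde >> 12][(va >> 12) & 0x3FF];
    if (!(pte & PTE_P))
        return FAULT;
    return (pte & ~0xFFFu) | (va & 0xFFF);
}

/* Entry 0 points back at the table, so the first-level lookup for any
 * address below 4MB lands on the same page again. */
void setup_single_page_table(uint32_t table_frame)
{
    phys[table_frame][0] = (table_frame << 12) | PTE_P;
}

/* Map one page below 4MB; only the table index of va is used. */
void map_page(uint32_t table_frame, uint32_t va, uint32_t frame)
{
    phys[table_frame][(va >> 12) & 0x3FF] = (frame << 12) | PTE_P;
}
```

With the table in frame 2 and virtual page 1 mapped to frame 5, address 0x1234 translates to 0x5234. Address 0x10 translates to 0x2010, inside the page table itself, which is why the sketch assumes correct programs never touch page 0.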

