The XamOS — Let’s Build an OS!!!
The 7th step of OS development — “virtual_memory_paging”
Welcome back, Readers!!!
If you’re new to OS development and my article series, I recommend starting with the first article. Otherwise it may be hard to follow what’s going on. 🤔
Today’s flow:
1️⃣▶️ A Short Introduction to Virtual Memory
▶️ How virtual memory acts
▶️ How the operating system sees memory
▶️ Virtual Memory Through Segmentation?
2️⃣▶️ Paging
▶️ Why Paging?
▶️ Paging in x86
▶️ Paging and the Kernel
▶️ Virtual Memory Through Paging
01. Short Introduction to Virtual Memory
The concept of virtual memory is to create logical regions of memory locations and back them up with real, physical memory using a set of dynamically maintained mapping tables. The following are some of the reasons why this is done:
Each process will have its own logical memory area.
Processes cannot interfere with each other.
Sparse memory layout is handled efficiently.
Programs and data can be shared.
Efficiency can be gained by creating a memory hierarchy.
How virtual memory acts
Each application wants the memory to be seen as a series of sequentially numbered bytes, free of allocations to other programs or the operating system. Both data and instructions are stored in this memory. It could be a fixed set of locations for global variables, available throughout the computation; it could be dynamic, either temporary locations available only for the run of a function or allocated and referenced by using a pointer, carefully passed around to interested parties who need to refer to the data. The MMU is a piece of hardware that intervenes and translates each such address to a physical address in actual memory.
Virtual Memory VS Physical Memory
How the operating system sees memory
The linker-loader is responsible for laying out the program and its data in memory. A common layout, from the bottom up, is: blank space, the program in a read-only text section, then static and pre-initialized data. Above this is the heap, which contains dynamically allocated data; a C program accesses it through malloc, and a C++ application can use new to build data structures (objects) there. The heap isn’t fixed in size; it can grow up to the limit of memory.
The top of memory is not physical memory exhaustion but virtual address exhaustion. With virtual memory you never run out of “memory” as such, since it is virtual — you run out of addresses. A 32-bit machine can address about 4 billion bytes (4 GB). Beyond that, you’re no longer using “memory” in the sense of this section: it might be a file in a file system (which could be larger than 4 GB), or it might be a network stream. But no matter how much physical memory you have, you can’t address any more of it once the virtual address space is used up.
The heap can’t grow all the way to the top of memory, because the stack is already there, growing downward. When the combined demands of the heap and the stack use up all the memory locations in between, virtual memory is exhausted. Much of a program’s data is kept on the stack: all local variables live there; basically, anything you haven’t malloc’d and haven’t made global (file scope) is on the stack.
There’s another reason why the heap can’t reach the top of memory. Some operating systems limit the amount of virtual memory available to user programs and place the operating system at the top of the address space. Linux does this: the user is only granted 3 GB of virtual space, and the kernel sits between 3 GB and 4 GB virtual. This is done so that the kernel is mapped into every user program. The top 1 GB cannot be read or written while the user program is running, but the entire 4 GB is available when the kernel is running, allowing the kernel to read and write both its own data and that of the user program. Not all operating systems do this; in OS X, the operating system and user applications have separate address spaces.
Virtual memory can be implemented in the x86 architecture in two ways:
01. Segmentation
02. Paging
The most popular and flexible approach is paging, and we’ll go through how to use it in this post. Some segmentation is still required to allow code to run at different privilege levels.
A large part of what an operating system does is manage memory. This is dealt with via paging and page frame allocation. The second and third articles cover segmentation and paging.
Virtual Memory Through Segmentation?
You could avoid paging entirely and rely solely on segmentation for virtual memory. Each user-mode process would get its own segment, with a correctly configured base address and limit, so that no process can see another process’ memory. One difficulty is that the physical memory for a process must be contiguous (or at least it is very convenient if it is). Either we need to know in advance how much memory the program will use (unlikely), or we must relocate memory segments to places where they can grow when the limit is reached (expensive, and causes fragmentation — it can result in “out of memory” even though enough memory is free). Paging alleviates both of these problems. It’s worth noting that segmentation is almost entirely absent in x86_64 (the 64-bit version of the x86 architecture).
02. Paging
Segmentation translates a logical address into a linear address. Paging translates these linear addresses onto the physical address space, and also determines access rights and how the memory should be cached.
Why Paging?
Paging is the most common technique for enabling virtual memory in x86 processors. Virtual memory through paging gives each process the impression that its available memory range is 0x00000000–0xFFFFFFFF, even though the real memory capacity is usually much smaller. It also means that when a process accesses a byte of memory, it uses a virtual (linear) address instead of a physical one. The code in the user process is unaffected (apart from execution delays). The MMU and the page table translate the linear address to a physical address. If the virtual address isn’t mapped to a physical address, the CPU raises a page fault interrupt.
Paging is optional, and some operating systems do not use it. However, paging is the best technique for making particular portions of memory available only to code executing at a specific privilege level (so that we can have processes running at different privilege levels).
Paging in x86
In x86, paging consists of a page directory (PDT) that can contain references to up to 1024 page tables (PT), each of which can point to up to 1024 page frames (PF) in physical memory. Each page frame is 4096 bytes. A virtual (linear) address is split in three: the topmost ten bits specify the index of a page directory entry (PDE) in the current PDT, the next ten bits specify the index of a page table entry (PTE) within the page table pointed to by that PDE, and the lowest 12 bits specify the offset within the page frame to be addressed.
All page directories, page tables, and page frames must be aligned on 4096-byte addresses. Because the lowest 12 bits of such an address must be zero, the highest 20 bits of a 32-bit address are enough to address a PDT, PT, or PF. The PDE and PTE have very similar formats: 32 bits (4 bytes), where the highest 20 bits point to a PT or PF and the lowest 12 bits control access privileges and other configuration. Since 4 bytes times 1024 equals 4096 bytes, a page directory and a page table each fit in a page frame. The figure below depicts the translation of linear addresses to physical addresses.
While most pages are 4096 bytes, 4 MB pages are also available. In that case, a PDE points directly to a 4 MB page frame, which must be aligned on a 4 MB address boundary. The address translation is almost identical to the figure, except that the page table step is skipped. 4 MB and 4 KB pages can be mixed.
The register cr3 holds the 20 bits that refer to the current PDT; the lowest 12 bits of cr3 are used for configuration. Among the PDE and PTE flag bits, the most interesting are U/S, which determines which privilege levels (PL0 or PL3) can access the page, and R/W, which determines whether the page’s memory is read-write or read-only.
Identity Paging
Identity paging is the simplest kind of paging, where each virtual address is mapped to the identical physical address. This can be done at compile time by creating a page directory in which each entry points to its corresponding 4 MB frame. In NASM this can be done with macros and directives (%rep, times, and dd). It can, of course, also be done at run-time with ordinary assembly instructions.
Enabling Paging
Paging is enabled by first writing the address of a page directory to cr3 and then setting bit 31 (the PG “paging-enable” bit) of cr0 to 1. To use 4 MB pages, also set the PSE bit (Page Size Extensions, bit 4) of cr4. This looks like the assembly code below:
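The snippet referenced here appears to be missing from the post; a minimal sketch of what it would look like, following the littleosbook approach (assuming eax already holds the physical address of the page directory):

```nasm
; eax holds the physical address of the page directory
mov cr3, eax        ; point cr3 at the page directory

mov ebx, cr4
or  ebx, 0x00000010 ; set PSE (bit 4) to enable 4 MB pages
mov cr4, ebx

mov ebx, cr0
or  ebx, 0x80000000 ; set PG (bit 31) to enable paging
mov cr0, ebx
```

This is privileged code and must run in the kernel at PL0; from the instruction after mov cr0 onward, every address the CPU uses goes through the page directory.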
Don’t forget
It’s crucial to remember that all addresses within the page directory, the page tables, and cr3 must be physical addresses of the structures, not virtual ones. This becomes more relevant in later parts, where the paging structures are updated dynamically. The invlpg instruction is useful when updating a PDT or PT: it invalidates the Translation Lookaside Buffer (TLB) entry for a virtual address. The TLB is a cache of translated addresses, mapping virtual addresses to their physical counterparts. Invalidation is only necessary when changing a PDE or PTE that was previously mapped to something else; if the PDE or PTE had been marked as not present (bit 0 set to 0), executing invlpg is unnecessary. Changing the value of cr3 invalidates all entries in the TLB.
The following is an example of invalidating a TLB entry:
; invalidate any TLB references to virtual address 0
invlpg [0]
Paging and the Kernel
This section discusses how paging affects the OS kernel. I recommend running your OS with identity paging before attempting a more advanced paging setup, since it can be hard to debug a faulty page table set up in assembly code.
Reasons to Avoid Kernel Identity Mapping
There will be problems linking the user-mode process code if the kernel is placed at the beginning of the virtual address space — that is, if the virtual address range (0x00000000, “size of kernel”) maps to the kernel’s location in memory. Normally, the linker assumes during linking that the code will be loaded at memory location 0x00000000, so when resolving absolute references, 0x00000000 is the base address for calculating exact positions. But if the kernel occupies the virtual address range (0x00000000, “size of kernel”), the user-mode process cannot be loaded at virtual address 0x00000000 and must be loaded elsewhere, which makes the linker’s assumption incorrect. This can be worked around with a linker script that tells the linker to assume a different starting address, but that is a very inconvenient solution for the users of the operating system.
This also implies that we want the kernel to be in the address space of the user-mode process.
This is a useful feature since we don’t have to alter any paging structures to have access to the kernel’s code and data during system calls. To prevent a user process from reading or writing kernel memory, the kernel pages will need privilege level 0 access.
The Kernel’s Virtual Address
Ideally, the kernel should be placed at a very high virtual memory address, such as 0xC0000000 (3 GB). The user-mode process is then unlikely to conflict with the kernel, since it would have to be more than 3 GB in size. A kernel that uses virtual addresses of 3 GB and above is called a higher-half kernel. The address 0xC0000000 is just an example; the kernel can be placed at any address higher than 0 with the same result. Choosing the right address depends on how much virtual memory should be available to the kernel (it is easiest if all memory above the kernel’s virtual address belongs to the kernel) and how much should be available to the process.
The kernel will need to swap out some pages if the user-mode process is greater than 3 GB. This article does not cover page swapping.
Placing the Kernel at 0xC0000000
To begin with, it is better to place the kernel at 0xC0100000 than at 0xC0000000, because this makes it possible to map (0x00000000, 0x00100000) to (0xC0000000, 0xC0100000). That way, the whole range (0x00000000, “size of kernel”) is mapped to (0xC0000000, 0xC0000000 + “size of kernel”).
Placing the kernel at 0xC0100000 isn’t hard, but it does require some thought. This is once again a linking problem. Because relocation is used in the linker script (see the section “Linking the kernel”), the linker will assume, when resolving all absolute references in the kernel, that the kernel is loaded at physical memory position 0x00100000, not 0x00000000. However, we want the references resolved using 0xC0100000 as the base address, since otherwise a kernel jump would land right in the user-mode process code (remember that the user-mode process is loaded at virtual address 0x00000000).
However, we can’t simply tell the linker to assume that the kernel starts (is loaded) at 0xC0100000, because we want the kernel to be loaded at the physical address 0x00100000. The kernel is loaded at 1 MB because it can’t be loaded at 0x00000000 — there is BIOS and GRUB code below 1 MB. Furthermore, we cannot assume that we can load the kernel at 0xC0100000, since the machine might not have 3 GB of physical memory.
This can be solved in the linker script by using both relocation (. = 0xC0100000) and the AT instruction. Relocation specifies that non-relative memory references should use the relocation address as the base in address calculations. AT specifies where the kernel should be loaded into memory. Relocation is done at link time by GNU ld, while the load address specified by AT is handled by GRUB when loading the kernel, and is part of the ELF format.
Higher-half Linker Script
To accomplish this, we must modify link.ld. You may use code like the following:
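The script itself seems to be missing from this post; here is a sketch of a higher-half link.ld in the style of the littleosbook (the entry symbol loader and the exact section list are assumptions carried over from earlier articles in the series):

```
ENTRY(loader)            /* entry symbol, assumed from earlier articles */

SECTIONS {
    . = 0xC0100000;      /* code is linked as if located at 3 GB + 1 MB... */

    .text ALIGN(0x1000) : AT(ADDR(.text) - 0xC0000000) {
        *(.text)         /* ...but loaded at 1 MB physical */
    }

    .rodata ALIGN(0x1000) : AT(ADDR(.rodata) - 0xC0000000) {
        *(.rodata*)
    }

    .data ALIGN(0x1000) : AT(ADDR(.data) - 0xC0000000) {
        *(.data)
    }

    .bss ALIGN(0x1000) : AT(ADDR(.bss) - 0xC0000000) {
        *(COMMON)
        *(.bss)
    }
}
```

The key pattern is that every section's virtual address (ADDR) is in the higher half, while its load address (AT) is that same address minus 0xC0000000, i.e. the physical location GRUB loads it to.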
Entering the Higher Half
When GRUB jumps to the kernel code, there is no paging table set up yet. As a result, all references to 0xC0100000 + X won’t be mapped to the correct physical location, resulting in a general protection exception (GPE) at best, and a crash at worst (if the machine has more than 3 GB of memory). Therefore, assembly code that doesn’t use relative jumps or relative memory addressing must be used to do the following tasks:
• Set up a page table.
• Add identity mapping for the first 4 MB of the virtual address space.
• Add a mapping for 0xC0100000 to 0x00100000.
If we omit the identity mapping for the first 4 MB, the CPU will issue a page fault immediately after paging is enabled, when it tries to fetch the next instruction from memory. Once the table has been created, a jump can be made to a label to make eip point to a virtual address in the higher half:
; assembly code executing at around 0x00100000
; enable paging for both the actual location of the kernel
; and its higher-half virtual location

lea ebx, [higher_half] ; load the address of the label into ebx
jmp ebx                ; jump to the label

higher_half:
; code here executes in the higher-half kernel
; eip is larger than 0xC0000000
; can continue kernel initialisation, calling C code, etc.
Because the register eip now points to a memory address just above 0xC0100000, all code can run as if it were located at 0xC0100000, in the higher half. The entry mapping the first 4 MB of virtual memory to the first 4 MB of physical memory can now be removed from the page table, and its TLB entry invalidated with invlpg.
Running in the Higher Half
There are a couple of extra details to deal with when using a higher-half kernel. We must be careful with memory-mapped I/O that uses specific memory addresses. For example, the frame buffer is located at physical address 0x000B8000, but since there is no longer an entry in the page table for 0x000B8000, the address 0xC00B8000 must be used instead, since the virtual address 0xC0000000 maps to the physical address 0x00000000. Similarly, any explicit references to addresses within the multiboot structure must be updated to the new virtual addresses.
Mapping the kernel in with 4 MB pages is simple, but it wastes memory (unless you have a really big kernel). Setting up a higher-half kernel mapped in as 4 KB pages saves memory, but is harder to do. Memory for the page directory and one page table can be reserved in the data section, but the mappings from virtual to physical addresses must be set up at run-time. The size of the kernel can be determined by exporting labels from the linker script, which we’ll need to do later anyway when writing the page frame allocator.
Virtual Memory Through Paging
Paging enables two things that are good for virtual memory. First, it allows fine-grained access control to memory: pages can be marked read-only, read-write, accessible only from PL0, and so on. Second, it creates the illusion of contiguous memory. User-mode processes and the kernel can access memory as if it were contiguous, and the contiguous memory can be extended without moving data around in memory. We can also allow user-mode programs access to all memory below 3 GB, but unless they actually use it, we don’t have to assign page frames to the pages. This allows processes to have code near 0x00000000 and a stack just below 0xC0000000, while still requiring only two actual pages.
References
- The little book about OS development (Erik Helin, Adam Renberg)
- My GitHub repository: The XamOS
- A useful Wikipedia article on paging
- The OSDev Wiki has a page on paging
- A tutorial for making a higher-half kernel
- Gustavo Duarte’s article on how a kernel manages memory
- Details on the linker command language on Steve Chamberlain’s website
- More details on the ELF format can be found in this presentation
I hope to get back to you with chapter eight as soon as possible. Till then,
Stay Safe!!! 👋
-Nipuni Perera-