pg=1 ( segmentation and paging in linux) -Part 1

[These notes come from my own study of this topic as of the post date; things may or may not have changed since. Refer to the latest kernel source code and manuals if you really care about it (or if you think I have misunderstood something 🙂).]

Here I will explain a few bits about segmentation and paging. It is not possible to cover them in much detail, as there is a lot to discuss on this topic. That is more or less the job of a book author 🙂

The ‘web’ gives plenty of details on these topics, but they may not match what you are experiencing or expecting. The main reason is the freedom left to the ‘architecture’ code and the ‘kernel’ code: a particular OS can choose what to use and how to implement it, which may not be the way the processor vendors describe it. That said, x86 processors use “segmentation” and “paging” to convert a “logical address” into a “physical address”.

Years back, when I started poking around in this area, I was confused by these terms, so let me make them clear before I move on.

*) Logical Address
*) Linear Address (Virtual Address)
*) Physical Address

The hardware has 2 engines (the segmentation unit and the paging unit) to perform these translations.

The ‘logical address’ is the input to the segmentation unit, and its output is the linear address. That linear address is the input to the “paging unit”, whose output is finally the physical address.

There are mainly 2 CPU modes to consider for segmentation:

1) 16 bit real mode

Here the segment register is 16 bits wide and, multiplied by 16, points to the physical memory at the start of the segment. Each segment is limited to a 64k chunk because the offset is also 16 bits wide. To address 1MB of memory in real mode, vendors increased the number of address pins by 4, to 20. The calculation goes like this:

segment selector * 16 + offset == Physical Address (the selector:offset pair is the logical address)

Ex: Segment selector: 0Ah, offset = 0Fh will derive to AFh (0Ah * 16 = A0h, then A0h + 0Fh = AFh)
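To double-check the arithmetic above, here is a small C sketch of the real-mode calculation. This is my own illustration, not kernel code, and the function name real_mode_phys is made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode translation: physical = segment * 16 + offset.
 * real_mode_phys is a hypothetical helper name used only here. */
uint32_t real_mode_phys(uint16_t segment, uint16_t offset)
{
    uint32_t phys = ((uint32_t)segment << 4) + offset;

    /* FFFF:FFFF is the largest value the formula can produce. */
    assert(phys <= 0x10FFEFu);
    return phys;
}
```

Calling real_mode_phys(0x0A, 0x0F) gives 0xAF, matching the example in the text.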

[terminal]
/* Simple and small GDT entries for booting only */

#define GDT_ENTRY_BOOT_CS       2
#define __BOOT_CS               (GDT_ENTRY_BOOT_CS * 8)

#define GDT_ENTRY_BOOT_DS       (GDT_ENTRY_BOOT_CS + 1)
#define __BOOT_DS               (GDT_ENTRY_BOOT_DS * 8)

#define GDT_ENTRY_BOOT_TSS      (GDT_ENTRY_BOOT_CS + 2)
#define __BOOT_TSS              (GDT_ENTRY_BOOT_TSS * 8)
[/terminal]

2) 32 bit protected mode:

A logical address consists of a segment selector and an offset.

The segment selector is also known as the ‘segment identifier’.

The segment selector is a 16 bit field and the offset is a 32 bit field.

[terminal]
+---------------------+
| index | TI | RPL    |
+---------------------+
  15-3  : 2  : 1-0

[/terminal]

From code:
[terminal]
/* Bottom two bits of selector give the ring privilege level */
#define SEGMENT_RPL_MASK        0x3
/* Bit 2 is table indicator (LDT/GDT) */
#define SEGMENT_TI_MASK         0x4

/* User mode is privilege level 3 */
#define USER_RPL                0x3
/* LDT segment has TI set, GDT has it cleared */
#define SEGMENT_LDT             0x4
#define SEGMENT_GDT             0x0
[/terminal]

Where:

index

Identifies the Segment Descriptor entry contained in the GDT or in the LDT.

TI

Table Indicator: specifies whether the Segment Descriptor is included in the GDT (TI = 0) or in the LDT (TI = 1).

RPL (mainly of interest for the cs register)

Requestor Privilege Level: specifies the Current Privilege Level of the CPU when the corresponding Segment Selector is loaded into the cs register; it may also be used to selectively weaken the processor privilege level when accessing data segments (see the Intel documentation for details).
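The three fields can be pulled out of a selector with a shift and the two masks from the kernel snippet above. A minimal sketch follows; struct selector_fields and decode_selector are names I made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SEGMENT_RPL_MASK 0x3 /* bottom two bits: privilege level */
#define SEGMENT_TI_MASK  0x4 /* bit 2: table indicator */

/* Hypothetical helper type: the three fields of a 16-bit selector. */
struct selector_fields {
    unsigned index; /* bits 15-3: descriptor index */
    unsigned ti;    /* bit 2: 0 = GDT, 1 = LDT */
    unsigned rpl;   /* bits 1-0: requestor privilege level */
};

struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f;

    f.index = sel >> 3;
    f.ti    = (sel & SEGMENT_TI_MASK) ? 1u : 0u;
    f.rpl   = sel & SEGMENT_RPL_MASK;

    /* The three fields must reassemble into the original selector. */
    assert(((f.index << 3) | (f.ti << 2) | f.rpl) == (unsigned)sel);
    return f;
}
```

For example, the selector 0x73 decodes to index 14, TI 0 (GDT) and RPL 3.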

Now, if you read the Intel/AMD manuals you can see that there are segment registers (cs, ds, es, ss, fs, gs) which are loaded with segment selectors from time to time. A segment selector points to a segment descriptor, which in turn describes a segment. Segment descriptors have the structure shown below.

cs : The code segment register; this segment contains the program instructions (it has a 2 bit field that represents the current privilege level; Linux uses 0/ring0 and 3/ring3)

ds : The data segment register; this points to global and static data
ss : The stack segment register, which points to the program’s stack

[segment descriptor diagram/format]

Segment descriptor initializers use the macro below:

[terminal]
#define GDT_ENTRY(flags, base, limit)                   \
((((base)  & _AC(0xff000000,ULL)) << (56-24)) | \
(((flags) & _AC(0x0000f0ff,ULL)) << 40) |      \
(((limit) & _AC(0x000f0000,ULL)) << (48-16)) | \
(((base)  & _AC(0x00ffffff,ULL)) << 16) |      \
(((limit) & _AC(0x0000ffff,ULL))))
[/terminal]

[terminal]
+-------+
| gdtr  | ----> [GDT] ---> [64 bit segment descriptor]
+-------+

+-------+
| ldtr  | ----> [LDT] ---> [64 bit segment descriptor]
+-------+
[/terminal]

In 32 bit protected mode, the segment descriptor address can be retrieved by (‘index’ * 8 + gdt/ldt address).
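The lookup above is plain arithmetic, so it fits in a few lines of C. This is only a sketch: descriptor_address is a made-up name, and table_base and table_limit stand for the address and size (in bytes, minus one) of whichever table the selector refers to.

```c
#include <assert.h>
#include <stdint.h>

/* Address of the 8-byte descriptor that a selector refers to.
 * table_base and table_limit are assumed inputs describing the GDT or LDT. */
uint32_t descriptor_address(uint32_t table_base, uint16_t table_limit,
                            uint16_t selector)
{
    uint32_t byte_offset = (uint32_t)(selector >> 3) * 8u;

    /* The whole 8-byte descriptor must lie inside the table. */
    assert(byte_offset + 7u <= table_limit);
    return table_base + byte_offset;
}
```

Note that index * 8 is simply the selector with its low three bits (TI and RPL) cleared, so for a table of 32 entries (limit 0xff) at 0x20000, the selector 0x60 lands on 0x20060.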

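The index field is 13 bits wide, so a GDT can hold at most 2^13 = 8192 descriptors. Since the first one is not usable, that leaves 2^13 - 1 = 8191.

The first entry in the GDT (index 0) is the null descriptor; using a null segment selector to access memory causes a processor exception.

The gdtr register holds the address and size of the GDT, while ldtr holds a segment selector that refers to the LDT descriptor in the GDT.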

In Linux the main segment descriptors are:

CS, DS, TSS and LDT

The TSS descriptor has its ‘S’ flag cleared, which means it is a system segment. This descriptor is always in the GDT. The processor registers are saved in the TSS. The type is 9 or 11 (available or busy).

The LDT descriptor also has its ‘S’ flag cleared; it lives in the GDT and points to the LDT segment. Its type field is ‘2’.

Now, with all of the above in mind, deriving a linear address from a logical address is NOT a tough thing. The ‘base’ of the segment is read from the descriptor as mentioned above, and the ‘offset’ is added to it to land on the final ‘linear address’.

The index field of the segment selector is used together with the GDT/LDT address (stored in gdtr/ldtr) to locate the segment descriptor. The untouched part of the logical address (the offset) is then used to calculate the output, the linear address. But Linux tries to avoid segmentation; changing the segment registers from time to time is a bit difficult.
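Putting the pieces together, the whole logical-to-linear step can be sketched against a toy descriptor table. Everything here is a simplification of my own: struct toy_descriptor keeps only base and limit (in bytes), the TI bit is ignored so there is a single table, and logical_to_linear is a made-up name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified descriptor: only base and byte-granular limit. */
struct toy_descriptor {
    uint32_t base;
    uint32_t limit;
};

/* Returns 0 and stores the linear address on success; returns -1 when the
 * selector is null, outside the table, or the offset exceeds the limit. */
int logical_to_linear(const struct toy_descriptor *table, unsigned entries,
                      uint16_t selector, uint32_t offset, uint32_t *linear)
{
    unsigned index = selector >> 3;

    assert(table != NULL && linear != NULL);
    if (index == 0 || index >= entries)
        return -1;
    if (offset > table[index].limit)
        return -1;

    *linear = table[index].base + offset;
    return 0;
}
```

With a flat segment (base 0, limit 0xffffffff) the linear address always equals the offset, which is why a base of 0 lets the kernel mostly ignore segmentation.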

We mainly have to think about the segments below:

[terminal]
User code segment
User Data segment
Kernel code segment
Kernel Data segment
[/terminal]

In Macro representation:

[terminal]

* __KERNEL_CS (Kernel code segment, base=0, limit=4GB, DPL=0)
* __KERNEL_DS (Kernel data segment, base=0, limit=4GB, DPL=0)
* __USER_CS   (User code segment,   base=0, limit=4GB, DPL=3)
* __USER_DS   (User data segment,   base=0, limit=4GB, DPL=3)

arch/x86/include/asm/segment.h

#define __KERNEL_CS     (GDT_ENTRY_KERNEL_CS*8)
#define __KERNEL_DS     (GDT_ENTRY_KERNEL_DS*8)
#define __USER_DS       (GDT_ENTRY_DEFAULT_USER_DS*8+3)
#define __USER_CS       (GDT_ENTRY_DEFAULT_USER_CS*8+3)

[/terminal]
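To get a feel for the actual numbers, the macros can be evaluated with concrete GDT slots. The sketch below assumes slots 12 to 15, taken from the 32-bit GDT layout comment quoted in this post; other kernel versions and 64-bit builds may use different slots. selector_is_user is a made-up helper.

```c
#include <assert.h>

/* GDT slot numbers assumed from the 32-bit layout comment in this post. */
#define GDT_ENTRY_KERNEL_CS        12
#define GDT_ENTRY_KERNEL_DS        13
#define GDT_ENTRY_DEFAULT_USER_CS  14
#define GDT_ENTRY_DEFAULT_USER_DS  15

#define __KERNEL_CS     (GDT_ENTRY_KERNEL_CS*8)
#define __KERNEL_DS     (GDT_ENTRY_KERNEL_DS*8)
#define __USER_DS       (GDT_ENTRY_DEFAULT_USER_DS*8+3)
#define __USER_CS       (GDT_ENTRY_DEFAULT_USER_CS*8+3)

/* Hypothetical helper: nonzero when a cs value carries RPL 3 (user mode). */
int selector_is_user(unsigned cs)
{
    assert(cs <= 0xffff); /* selectors are 16 bits wide */
    return (cs & 0x3) == 0x3;
}
```

Under that assumption the selectors come out as 0x60, 0x68, 0x73 and 0x7b. The "*8" moves the slot number into the index field, and the "+3" sets the RPL to 3 for the user segments.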

This is how Linux does it.

When a Linux process is executing in user mode, the segment registers point to the user {code, data} segments, and in kernel mode they point to the kernel code and data segments.

Intel provides a non-programmable (hidden) register alongside each segment register to make segmentation cheaper. As long as the same segment selector stays loaded, the hidden register keeps being used to produce the linear address. The normal lookup (referring to the GDT/LDT) is triggered again only when the segment register is reloaded, for example on a change in the mode of operation. That is, the hidden register is loaded with the segment descriptor whenever the visible segment register is loaded with a segment selector, so that further “GDT” or “LDT” references can be avoided, which gains speed. I am stopping on “segmentation” here, otherwise I will keep writing and you won’t get anywhere. 🙂
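The caching idea can be mimicked in a toy model: the table is read once when the segment register is loaded, and every later translation uses the cached copy. This is only an illustration of the principle, not how the hardware is built, and all the names (toy_descriptor, toy_segreg, load_segreg, linear_from_segreg, table_reads) are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified descriptor: only base and limit. */
struct toy_descriptor {
    uint32_t base;
    uint32_t limit;
};

/* A segment register: the visible selector plus the hidden cached copy. */
struct toy_segreg {
    uint16_t selector;
    struct toy_descriptor cached;
};

unsigned table_reads; /* counts simulated GDT/LDT memory accesses */

/* Loading the register is the only point where the table is consulted. */
void load_segreg(struct toy_segreg *reg, const struct toy_descriptor *table,
                 unsigned entries, uint16_t selector)
{
    unsigned index = selector >> 3;

    assert(index < entries);
    reg->selector = selector;
    reg->cached = table[index];
    table_reads++;
}

/* Translation uses only the cached descriptor, never the table. */
uint32_t linear_from_segreg(const struct toy_segreg *reg, uint32_t offset)
{
    return reg->cached.base + offset;
}
```

After one load_segreg call, any number of linear_from_segreg calls leave table_reads at 1.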

 

However, please see some code snippets below from the Linux kernel source, and a bit more about the GDT and LDT.

IN LINUX:

[terminal]
#define GDT_ENTRY_INIT(flags, base, limit) { { { \
.a = ((limit) & 0xffff) | (((base) & 0xffff) << 16), \
.b = (((base) & 0xff0000) >> 16) | (((flags) & 0xf0ff) << 8) | \
((limit) & 0xf0000) | ((base) & 0xff000000), \
} } }

X86 GDT layout :

/*
 * The layout of the per-CPU GDT under Linux:
 *
 *   0 - null
 *   1 - reserved
 *   2 - reserved
 *   3 - reserved
 *
 *   4 - unused                 <==== new cacheline
 *   5 - unused
 *
 *  ------- start of TLS (Thread-Local Storage) segments:
 *
 *   6 - TLS segment #1                 [ glibc's TLS segment ]
 *   7 - TLS segment #2                 [ Wine's %fs Win32 segment ]
 *   8 - TLS segment #3
 *   9 - reserved
 *  10 - reserved
 *  11 - reserved
 *
 *  ------- start of kernel segments:
 *
 *  12 - kernel code segment            <==== new cacheline
 *  13 - kernel data segment
 *  14 - default user CS
 *  15 - default user DS
 *  16 - TSS
 *  17 - LDT
 *  18 - PNPBIOS support (16->32 gate)
 *  19 - PNPBIOS support
 *  20 - PNPBIOS support
 *  21 - PNPBIOS support
 *  22 - PNPBIOS support
 *  23 - APM BIOS support
 *  24 - APM BIOS support
 *  25 - APM BIOS support
 *
 *  26 - ESPFIX small SS
 *  27 - per-cpu                        [ offset to per-cpu data area ]
 *  28 - stack_canary-20                [ for stack protector ]
 *  29 - unused
 *  30 - unused
 *  31 - TSS for double fault handler
 */

[/terminal]
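To see what GDT_ENTRY_INIT actually produces, here is a sketch that fills in one descriptor and prints nothing, just checks the two halves. struct desc_struct below is a cut-down stand-in of my own for the kernel type (only the a and b halves), and flat_kernel_code and desc_as_u64 are made-up helpers. The flags value 0xc09a is assumed to mean a present ring-0 code segment with 4 KB granularity and 32-bit default size.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's descriptor type (assumption):
 * just the low (a) and high (b) 32-bit halves. */
struct desc_struct {
    union {
        struct {
            uint32_t a;
            uint32_t b;
        };
    };
};

#define GDT_ENTRY_INIT(flags, base, limit) { { { \
    .a = ((limit) & 0xffff) | (((base) & 0xffff) << 16), \
    .b = (((base) & 0xff0000) >> 16) | (((flags) & 0xf0ff) << 8) | \
         ((limit) & 0xf0000) | ((base) & 0xff000000), \
} } }

/* Hypothetical example: a flat segment with base 0 and a 20-bit limit. */
struct desc_struct flat_kernel_code(void)
{
    struct desc_struct d = GDT_ENTRY_INIT(0xc09a, 0, 0xfffff);

    /* Low half: limit[15:0]; high half: flags plus limit[19:16]. */
    assert(d.a == 0x0000ffff && d.b == 0x00cf9a00);
    return d;
}

/* The two halves viewed as one 64-bit descriptor. */
uint64_t desc_as_u64(struct desc_struct d)
{
    return ((uint64_t)d.b << 32) | d.a;
}
```

Glued together, the halves give 0x00cf9a000000ffff, the same bit layout that the 64-bit GDT_ENTRY macro builds in one go.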

The GDT is per-CPU data on Linux: cpu_gdt_table

Each processor has its own TSS, so the GDT entries can differ between CPUs. Also, the TLS and LDT entries can differ depending on the process running on the CPU. The PnP and APM entries exist because BIOS code makes use of segments, so when the kernel invokes BIOS functions it may use custom code and data segments.

A separate TSS (entry 31) is also used for ‘double fault’ exceptions.

LDT:

Mainly 5 entries. 2 of them are mainly used for call gates.

So, once you have the linear address, it is the paging unit’s responsibility to translate it to a physical address.

 

[Please continue reading here]

Copyright secured by Digiprove © 2020 Humble Chirammal
