Interpret indirect addressing mode example in assembly

Today one of my friends asked me to interpret the assembly instruction below for him, hence this post 🙂

 

The instruction has the following format:

Segmentation and paging in Linux - Part 1

[These notes are taken from my study of this topic as of the post date; things may or may not have changed. Refer to the latest kernel source code and manuals if you really care about it (or if you think I have a misunderstanding on this topic 🙂).]

 

Here, I will explain some bits about segmentation and paging. It is not possible to explain these things in much detail, as there is a lot to discuss on this topic; that is more or less the responsibility of a book author 🙂

The ‘web’ gives details on these topics, but they may not match what you are experiencing or expecting. The main reason is the set of choices left to the ‘architecture’ code and the ‘kernel’ code: a particular OS can choose how to implement things, which may not be the way the processor vendors explain them. That said, x86 processors use “segmentation” and “paging” to convert a “logical address” into a “physical address”.

Years back, when I started to poke around in this area, I was confused by these terms, so let me make them clear before I move on.

*) Logical address
*) Linear address (virtual address)
*) Physical address

The hardware has two engines (the segmentation unit and the paging unit) to perform these translations.

A ‘logical address’ is the input to the segmentation unit, whose output is a linear address. The linear address is in turn the input to the “paging unit”, which finally produces the physical address.

There are mainly two CPU modes to consider for segmentation:

1) 16 bit real mode

Here the segment register holds a 16-bit value that points to the start of the segment in physical memory. Because the offset is also 16 bits, each segment is limited to a 64 KB chunk. To address at least 1 MB of memory in real mode, vendors added 4 more address pins (20 address lines in total). The calculation goes like this:

segment * 16 + offset (the logical address) = physical address

Example: segment 0Ah and offset 0Fh give 0Ah * 16 + 0Fh = A0h + 0Fh = AFh.

2) 32 bit protected mode:

A logical address consists of a segment selector and an offset.

The segment selector is also known as the ‘segment identifier’.

The segment selector is a 16-bit field and the offset is a 32-bit field.

+--------------------+
| index | TI | RPL   |
+--------------------+
 15-3     2    1-0

 

Where:

index

Identifies the Segment Descriptor entry contained in the GDT or in the LDT.

TI

Table Indicator : specifies whether the Segment Descriptor is included in the GDT (TI = 0) or in the LDT (TI = 1).

RPL, mainly used for CSD.

Requestor Privilege Level : specifies the Current Privilege Level of the CPU when the corresponding Segment Selector is loaded into the cs register; it also may be used to selectively weaken the processor privilege level when accessing data segments (see Intel documentation for details).

Now, if you read the Intel/AMD manuals, you can see that there are segment registers (cs, ss, ds, es, fs, gs) which are filled with segment selectors from time to time. A segment selector points to a segment descriptor, which describes a segment. Segment descriptors have the structure shown below.

cs: the code segment register, which points to the segment containing program instructions. It includes a 2-bit field holding the current privilege level (CPL) of the CPU: 0 for ring 0 and 3 for ring 3.

ds: the data segment register, which points to global and static data.
ss: the stack segment register, which points to the program’s stack.

[segment descriptor diagram/format]

Segment descriptor initializers use the macro below:

+-------+
| gdtr  | ----> [GDT] ---> [64-bit segment descriptor]
+-------+

+-------+
| ldtr  | ----> [LDT] ---> [64-bit segment descriptor]
+-------+

In 32-bit protected mode, the address of the segment descriptor is (‘index’ * 8 + GDT/LDT base address), since each descriptor is 8 bytes long.

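The index field is 13 bits wide, so the GDT can hold at most 2^13 = 8192 descriptors; since the first one is the null descriptor, at most 2^13 - 1 = 8191 are usable.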

The first entry in the GDT is always set to 0 (the null descriptor). This ensures that a logical address with a null segment selector is considered invalid and causes a processor exception.

The GDTR register holds the base address and limit of the GDT, while the LDTR register holds the segment selector of the currently used LDT.

In Linux, the main segment descriptors are:

CS, DS, TSS and LDT

The TSS descriptor has its ‘S’ flag cleared, which means it describes a system segment. This descriptor always lives in the GDT. The TSS is where the processor registers are saved. Its Type field is 9 or 11.

The LDT descriptor also has its ‘S’ flag cleared; it lives in the GDT and points to the segment containing the LDT. Its Type field is 2.

Now, with the above in mind, deriving a linear address from a logical address is NOT a tough thing: the ‘BASE’ of the segment is taken from the segment descriptor located as described above, and the ‘offset’ is added to it, which lands you at the final ‘linear address’.

The index field of the segment selector (stored in the segment register) is used together with the GDT or LDT base address to locate the segment descriptor. The offset part of the logical address is left untouched and is added to the segment base to produce the output, i.e. the linear address. But Linux tries to avoid segmentation as much as possible, since changing segment registers from time to time is a bit cumbersome.

We mainly have to think about the following segments:

User code segment
User Data segment
Kernel code segment
Kernel Data segment

In Macro representation:

This is how Linux does it.

When a Linux process is executing in user mode, the segment registers point to the user code and data segments; in kernel mode, they point to the kernel code and data segments.

To make segmentation faster, Intel processors have an additional hidden (nonprogrammable) register for each segment register. Whenever a segment selector is loaded into a segment register, the corresponding segment descriptor is loaded from memory into the matching hidden register. From then on, translations of logical addresses that use that segment can rely on the cached descriptor without consulting the GDT or LDT; the tables are accessed again only when the contents of the segment register change. Avoiding those GDT/LDT references is where the speed comes from. I am stopping the discussion of “segmentation” here, otherwise I will keep writing and you won't get anywhere. 🙂

 

However, please see some code snippets from the Linux kernel source below, and a bit more about the GDT and LDT.

IN LINUX:

x86 GDT layout:

On Linux, the GDT is per-CPU data (cpu_gdt_table).

 

Each processor has its own TSS, so the GDT entries differ between CPUs. The TLS and LDT entries can also differ depending on the process running on the CPU. The PnP and APM entries are used by BIOS code, so the BIOS can run with its own custom code and data segments.

A TSS is also used for handling ‘double fault’ exceptions.

LDT:

It mainly has 5 entries; 2 of them are mainly used for call gates.

So, once you have the linear address, it is the paging unit's responsibility to translate it into a physical address.

 
