A peek at ELF

Having learnt that an object file has a .text section for code, a .data section for initialized data, a .bss section for uninitialized data, and so on, it is still unclear how these map onto a real ELF file. How does an ELF file represent this information? With objdump, we can peek at the contents of these sections.

objdump -s obj/boot/boot.out 

The result looks like

obj/boot/boot.out:     file format elf32-i386

Contents of section .text:
 7c00 fafc31c0 8ed88ec0 8ed0e464 a80275fa  ..1........d..u.
 7c10 b0d1e664 e464a802 75fab0df e6600f01  ...d.d..u....`..
...

Contents of section .eh_frame:
 7d80 14000000 00000000 017a5200 017c0801  .........zR..|..
 7d90 1b0c0404 88010000 1c000000 1c000000  ................
 
...

Looking at this output directly can mislead a beginner: it seems as if the section contents are interleaved with headers in the file, but they are not. Below I map the raw bytes of the file to this interpreted view. In fact, the headers sit at the beginning and the end of the file; all the code and data from the different sections or segments are bunched together in the middle.

Tools

  • readelf (greadelf for mac, installed by brew install binutils)
  • objdump (gobjdump for mac)
  • wxHexEditor

File layout

ELF header
program header
...
program header
code & data
section header
...
section header

Example

Take the object file boot.out as an example. You can use wxHexEditor to look at its raw bytes.

readelf interprets these raw bytes for us.

readelf --all boot.out

The result looks like

ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 
...
Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
...

ELF header

The first part of the file is the ELF header. Its most notable feature is the magic number 7f 45 4c 46, where 45 4c 46 is ELF in ASCII.

These four bytes open the 16-byte e_ident field defined by struct ElfN_Ehdr, which is what readelf prints on its Magic line. For more detail, please refer to Wikipedia.
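To make the e_ident layout concrete, here is a minimal Python sketch that decodes the 16 bytes from the Magic line above. The function name parse_e_ident and the returned dictionary are my own choices for illustration; the byte positions (class at index 4, data encoding at index 5, version at index 6) follow the elf(5) man page.

```python
ELF_MAGIC = b"\x7fELF"

def parse_e_ident(header: bytes) -> dict:
    """Decode the 16-byte e_ident array at the start of an ELF file."""
    if len(header) < 16:
        raise ValueError("need at least 16 bytes")
    if header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    ei_class, ei_data, ei_version = header[4], header[5], header[6]
    return {
        "class": {1: "ELF32", 2: "ELF64"}.get(ei_class, "unknown"),
        "data": {1: "little-endian", 2: "big-endian"}.get(ei_data, "unknown"),
        "version": ei_version,
    }

# The 16 bytes readelf prints on its Magic line for boot.out.
magic = bytes.fromhex("7f454c46010101000000000000000000")
info = parse_e_ident(magic)
print(info)
```

For boot.out this reports a 32-bit, little-endian file, which agrees with the elf32-i386 format that objdump printed.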

Program header and section header

The data and code of a program can be divided into pieces. In the linker’s view, they are divided into sections, and the metadata of these sections is stored in the section header table (from the file layout above we can see it’s located at the end of the file). In the kernel’s (executor’s) view, data and code are divided into segments, and their metadata is stored in the program header table.
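The ELF header itself records where both tables live. As a rough sketch, the 52-byte ELF32 header can be unpacked with Python's struct module as follows. The helper name table_locations is hypothetical, and the sample header is synthetic: only e_shoff = 4760, e_shentsize = 40 and e_shstrndx = 6 come from the readelf output for boot.out, while every other value is a placeholder I made up so that the example runs without the file.

```python
import struct

# ELF32 header, little-endian: e_ident[16], e_type, e_machine, e_version,
# e_entry, e_phoff, e_shoff, e_flags, e_ehsize, e_phentsize, e_phnum,
# e_shentsize, e_shnum, e_shstrndx (52 bytes in total).
ELF32_EHDR = struct.Struct("<16sHHIIIIIHHHHHH")

def table_locations(header: bytes) -> dict:
    """Return (offset, entry size, entry count) for both header tables."""
    (_ident, _type, _machine, _version, _entry, e_phoff, e_shoff, _flags,
     _ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum,
     e_shstrndx) = ELF32_EHDR.unpack_from(header)
    return {
        "program_headers": (e_phoff, e_phentsize, e_phnum),
        "section_headers": (e_shoff, e_shentsize, e_shnum),
        "shstrndx": e_shstrndx,
    }

# Synthetic header. Placeholders (assumed, not read from boot.out):
# e_type, e_machine, e_entry, e_phoff, e_ehsize, e_phentsize, e_phnum, e_shnum.
sample = ELF32_EHDR.pack(
    bytes.fromhex("7f454c46010101000000000000000000"),
    2, 3, 1, 0x7C00, 52, 4760, 0, 52, 32, 1, 40, 9, 6)

loc = table_locations(sample)
print(loc["section_headers"], loc["shstrndx"])
```

On a real file you would pass the first 52 bytes read from disk instead of the synthetic sample.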

Section header string table

Well, its name is a bit long. It is simply a section of the file, and it is one of the sources from which objdump learns the name of each section. First let’s find it in the raw bytes of the file.

readelf --all boot.out

and look at this

ELF Header:
  ...
  Start of section headers:          4760 (bytes into file)
  ...
  Size of section headers:           40 (bytes)
  ...
  Section header string table index: 6
  ...

It says the section headers start at byte 4760, and using this information readelf can go on to interpret the section headers for us (continuing the earlier output)

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  ...
  [ 6] .shstrtab         STRTAB          00000000 001254 000043 00      0   0  1
  ...

Actually we can find the .shstrtab entry ourselves. From the ELF header we know that the section header table starts at byte 4760, that each entry is 40 bytes, and that the index of .shstrtab is 6, so the .shstrtab entry in the table starts at $4760 + 6 \times 40 = 5000$.
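The same arithmetic as a tiny Python sketch, which also prints the offset in hex since that is what a hex editor usually displays. The function name section_header_offset is just an illustrative choice.

```python
def section_header_offset(e_shoff: int, e_shentsize: int, index: int) -> int:
    """File offset of entry `index` in the section header table."""
    return e_shoff + index * e_shentsize

# Values taken from the ELF header printed by readelf for boot.out.
start = section_header_offset(4760, 40, 6)
end = start + 40  # one entry is e_shentsize bytes long

print(start, end)            # 5000 5040
print(hex(start), hex(end))  # 0x1388 0x13b0
```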

So bytes $[5000, 5040)$ of boot.out hold one struct Elf32_Shdr (ref).

typedef struct {
  uint32_t   sh_name;
  uint32_t   sh_type;
  uint32_t   sh_flags;
  Elf32_Addr sh_addr;
  Elf32_Off  sh_offset;
  uint32_t   sh_size;
  uint32_t   sh_link;
  uint32_t   sh_info;
  uint32_t   sh_addralign;
  uint32_t   sh_entsize;
} Elf32_Shdr;

The first field, sh_name, starts at offset 5000 and has the value 0x11, which should somehow be interpreted as the name .shstrtab. How? The official explanation is

sh_name This member specifies the name of the section. Its value is an index into the section header string table section, giving the location of a null-terminated string.

So there’s a section, .shstrtab, that records the names of all sections. Where is it? The clue is given by this very table entry. Let’s look at the version interpreted by readelf.

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  ...
  [ 6] .shstrtab         STRTAB          00000000 001254 000043 00      0   0  1
  ...

Offset is 0x1254, size is 0x43, so we will look into [0x1254, 0x1297).
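For a 32-bit file like boot.out, each section header entry is ten 32-bit fields, 40 bytes in total. The sketch below decodes one such entry and derives the byte range of the section it describes. The entry here is rebuilt from the readelf row rather than read from the file, and the names ELF32_SHDR, FIELDS and parse_shdr are my own.

```python
import struct

# Elf32_Shdr: ten little-endian 32-bit fields, 40 bytes in total.
ELF32_SHDR = struct.Struct("<10I")
FIELDS = ("sh_name", "sh_type", "sh_flags", "sh_addr", "sh_offset",
          "sh_size", "sh_link", "sh_info", "sh_addralign", "sh_entsize")

def parse_shdr(raw: bytes) -> dict:
    """Decode one 40-byte section header table entry."""
    return dict(zip(FIELDS, ELF32_SHDR.unpack(raw)))

# Entry rebuilt from the readelf row for .shstrtab. sh_name = 0x11 is the
# value seen at offset 5000, and 3 is the numeric code for STRTAB.
entry = ELF32_SHDR.pack(0x11, 3, 0, 0, 0x1254, 0x43, 0, 0, 1, 0)

shdr = parse_shdr(entry)
start = shdr["sh_offset"]
end = start + shdr["sh_size"]
print(hex(start), hex(end))  # 0x1254 0x1297
```

With the real file, the 40 bytes would come from seeking to offset 5000 and reading one entry.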

From the ASCII column on the right of the hex editor, we can see that we have found the right place. Then what does 0x11 mean? This index is a bit confusing: it is simply a byte offset into that sequence of chars. $0x1254 + 0x11 = 0x1265$, so let’s look at the string that starts at 0x1265.

That’s correct: the string there is .shstrtab, which is what we were looking for.
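The lookup can be sketched in a few lines of Python: sh_name is a byte offset into the string table, and the name runs up to the next null byte. The table below is synthetic, laid out by hand so that offset 0x11 lands on .shstrtab; it is not a dump of boot.out, and name_at is a hypothetical helper.

```python
def name_at(strtab: bytes, index: int) -> str:
    """Return the null-terminated string starting at `index`."""
    end = strtab.index(b"\x00", index)
    return strtab[index:end].decode("ascii")

# Synthetic string table (an assumption for illustration). It starts with
# a null byte, so index 0 is the empty name used by the NULL section.
strtab = b"\x00.symtab\x00.strtab\x00.shstrtab\x00.text\x00"

print(name_at(strtab, 0x11))  # .shstrtab
print(hex(0x1254 + 0x11))     # 0x1265, the file offset of that string
```

Every section header resolves its name the same way, each with its own sh_name offset into this one table.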

Conclusion

This post looks into some aspects of the ELF file format. An example illustrates how objdump -s can find the locations and names of the sections in the file.

References

  • https://en.wikipedia.org/wiki/Executable_and_Linkable_Format
  • http://man7.org/linux/man-pages/man5/elf.5.html
  • http://www.opensecuritytraining.info/LifeOfBinaries.html
Written on December 19, 2015