GSoC2008 Stefan Journal

From Syslinux Wiki
Jump to: navigation, search

Journal: Stefan Bucur, Dynamic Module Loading Using the ELF Format

Week 1

19 May 2008

Journal created. Spoke to hpa and erwan_taf regarding project timeline.

Week 2

Week 3

10 June 2008

  • I finally understood entirely the way modules are loaded in the memory. This is one of the steps involved in creating a memory image of an ELF object:
    1. Map the module file in the memory, and read/verify its header. (Done)
    2. Load the relevant segments in the memory, at a relocatable location. (Done)
    3. Traverse the dynamic information segment and store pointers to relevant sections: symbol table, hash table, string table, GOT and PLT, and relocation tables. (Almost done)
    4. Traverse relocation tables and search for undefined symbols and fix absolute addresses. (Yet to be done)
  • I still don't understand entirely the difference between a REL and RELA table (or more precisely the need to have two different tables instead of a single one that covers all the cases). As far as I see from the sample shared object I built, the linker generates REL tables. Should I support also RELA?
  • The explanations from the ELF specs regarding base addresses are quite cumbersome, but thanks to hpa's clarifications, I derived a clearer algorithm in this matter. So below are the steps I took in order to load ELF sections into memory:
    1. Traverse each loadable segment and get its lowest virtual address, highest virtual address (lowest vaddr + segment memory size), and alignment requirements.
    2. Get the minimum from the lowest vaddrs, and the maximum from the highest vaddrs. Also get the maximum value of alignment requirements (enforced at least sizeof(void*)).
    3. Truncate the minimum to the lower nearest multiple of the max. alignment, and the maximum to the higher nearest multiple of the max. alignment. In this way the original interval will be covered with the smallest number of blocks of max. alignment size (memory pages, in our case).
    4. Allocate the required aligned memory to cover all the blocks, and compute the difference between the start address of the memory and the truncated minimum virtual address. This difference is the base address.
    5. Make sure to initialize with zero the allocated memory, so that .bss data would be zero.
    6. Traverse again the segments in the ELF file and copy them at the base address + their starting virtual address in the memory.
  • The algorithm is implemented in commit 4e14b7d9.

11 June 2008

  • Finally found that the x86 architecture does not use Elf32_Rela structures, so I'll only need to deal with Elf32_Rel structures, which, according to ELF arcitecture specific documentation, has an implicit addend in the location pointed by the structure.
  • I fixed some errors in the Linux linked list adaptation, that would only show when those macros would be actually used somewhere in the code - commit 0faff07b.
  • I found some inadvertences in the ELF specifications on the Internet regarding the way relocations are calculated. I am quite surprised about the poor documentation regarding ELF format specifications and especially the GNU extensions. The ELF binaries are the foundation of a Linux system, and I think they should have been very well documented. For instance, I cannot find an official description of the GNU hash algorithm, meant to replace the classic System V implementation (currently supported by my code).
  • I implemented the searching of symbols in a given module through the hash table section. I also implemented the searching of symbols in all the modules currently loaded in the system, in a way that gives priority to the normal, GLOBAL symbols (aka 'strong' symbols), and uses WEAK symbols only when they are the only symbols with the given name. If there are more than one weak symbol, I choose the one contained in the last module loaded in the system.
  • Those functions are implemented in commit b56dd67e.

12 June 2008

  • Discovered that the uniqueness of global symbols is required only when performing compile-time linking. Dynamic linking allows duplicate exported symbols. I fixed my code so that my implementation now takes the first global symbol it encounters, in the reverse chronological order of module insertions (that is, the latest module prevails).
  • Implemented all the relocations used by the dynamic linker. The R_386_COPY one was a tricky thing to do - thanks to uClibc, I found the proper way to implement it.
  • Milestone: Created two modules that shared global variables and external functions and modified the test program to load them, invoke the functions and observe how the shared variables get altered. (Regarding milestones, I will fill up the project planning page as soon as I have the time). Commit available here - fe3b5e93.
  • I needed a way to find out how to interpret the GNU specific hash table, so that symbol lookup would be more efficient. I discovered that the glibc source code is quite disorganized, at least in the portions implementing the ELF specs. To the naked eye, everything looked like many hacks that somehow, magically, manage to work together (please pardon my jokes here :), I really appreciate the work of the guys at GNU, as their glibc, whatever its source code may look like, works great and it's amazing how far they went with a project started a couple of decades ago). Anyway, the GNU hash implementation was hid somewhere in this source file (or at least part of it). No LXR database already built, the Eclipse IDE C Indexer failed to properly index the whole source code... Mission impossible!
  • The solution came from the uClibc project, which I found to have implemented the GNU hash lookup algorithm here. It wasn't a pure implementation, but at least it was much more clear about the steps required to correctly and efficiently use the hash structure. The next days I will implement the GNU hash lookup in its pure form. Maybe it will serve to other projects being in the same situation as I've been.
  • The next days I won't be fully available for the project, as I need to learn for a Database Design exam. Wish me luck! :)

15 June 2008

  • I have implemented and tested the GNU hash search algorithm, adapted from the uClibc implementation. Now the symbol search routines firstly check to see whether there is a GNU hash table available for the module, search the symbol with that table, and if the symbol is not found, falls back on the SysV standard hash (as there are some symbols the GNU hash does not contain - see this discussion). This is commit fb95a0dc.

16 June 2008

  • I have started replacing the mmap()-based module loading implementation, with one based on FILE streams, without backward seeking support. This is a required step as the code will be integrated into the SYSLINUX environment, where only simple operations would be possible, and also the backend of the stream may consist of a network connection, where seeking is not possible.
  • The approach was to define wrapper functions for the following operations:
    • Stream opening - the image file is open and an internal offset variable is set to zero.
    • Stream closing - the image file is closed.
    • Reading - a given amount of bytes would be read from the stream into a specified buffer, and the current offset would be increased accordingly.
    • Skipping - a given amount of bytes would be read into a temporary buffer, that is discarded before returning. The current offset is advanced, however.
    • Seeking at an absolute offset - this operation permits only forward seeking, and is implemented using the Skip operation.
  • So loading an ELF module would imply the following I/O operations:
    1. Open the associated FILE stream.
    2. Read the ELF header in a buffer (my implementation allocates the buffer on the stack and delivers a pointer to it to any sub-step requiring it).
    3. Read the entire Program Header Table in a buffer (allocated on the heap, with malloc()). The alternative of reading each PHT entry in a local variable doesn't work, as the PHT needs to be traversed twice (first to determine the entire memory size occupied by the module, then to copy each segment to its place in the allocated memory), and we cannot go back in the file to read it again.
    4. Read each loadable segment into the appropriate memory location.

Week 4

17 June 2008

  • I have discovered that due to alignment constraints, the starting of the first segment in the file is marked before the current offset in the file, so the memory image would contain some padding at the beginning that otherwise would contain the ELF header and PHT (useless to the runtime, however). To solve this issue, I start the reading of the first segment at the current offset and shrink the amount of data read with the same value. This way, the initial padding in the memory mapping would contain just zeros.
  • The stream-based module loading is present in commit 08d0c8a5.
  • I have also implemented dependency information as a module graph. The nodes in the graph are the module structures, and the links are special dependency structures that form linked lists and contain pointers to other modules. Thus, each module has a dependency list with the modules that depends on (the required modules), and another list with the modules that depend of it. Thus, each link in the graph exists through two dependency structures: one structure in the required list of a module, and another one in the dependants list of the other module.
  • The graph is populated when a new module is added and relocation is performed. For each relocated entry that involves a foreign symbol, a dependency link is added between the relocated module and the origin of the symbol.
  • In order a module to be unloadable, its dependables list must be empty, that is, no other module should reference data from this particular one.
  • The dependency implementation is present in commit b909bc58.

18 June 2008

  • Before starting integrating my work into SYSLINUX, I took some time to read more advanced documentation about git and rebase my elflink development branch with the latest version of master. Git in a Nutshell proved to be some great material.
  • As the rebase operation modifies the commit history, in order to keep valid the references to commits in this journal, I have created a separate branch, elflink-old whose only role is to keep those commits from not being wiped by the git garbage collector.

Week 5

(Lack of activity due to exams session)

Week 6

1 July 2008

  • I have rebased my commits on the latest changes in master branch (release of SYSLINUX 3.70), so I can make sure my code doesn't break because of any changes made by hpa in the mean time.
  • I have moved the ELF linking code from its separate directory to a new one, under com32/, and I created a simple com32 module that doesn't do anything useful, but it is used to compile my code in the klibc environment. While compiling my code, I noticed a series of errors:
    • The klibc library lacks an implementation of posix_memalign(), and I had to simulate one with malloc(). This implementation obviously wastes memory proportional to the alignment size (as it discards the memory allocated ahead of the aligned offset), but it is an acceptable solution until an efficient implementation is made in klibc.
    • The elfcommon.h header was lacking a lot of definitions for the GNU i386 ELF modules. I have copied the missing definitions from my /usr/include/elf.h header on my computer.
  • These changes can be found in commit bd659afc.

Next Weeks

(TODO: Explain what happened in the mean time)

22 July 2008

  • I modified the malloc.c source file in order to add native support for the posix_memalign() function, needed by the ELF loader.
  • The approach was as follows:
    • Search through every free block in the memory arena, until one of suitable size is found, which might hold the request size (it is not certain if it actually can, because of alignment constraints).
    • Find an aligned position in the free block, and see if the allocation would fit. Make sure to leave enough space behind, so the arena header for the block and arena header for the free space before would fit.
    • Split the block in two parts, if required: one free block before the alignment, and another free block after the alignment.
    • Call a simple malloc() on the second free block, and return the allocated address.

23 July 2008

  • I started to document on the ld tool, in order to see how to create ELF modules loadable from SYSLINUX. I read especially the scripting documentation.
  • I made a shared object with the klibc object files, and analyzed its symbol information and ELF sections using readelf.

24 July 2008

  • I managed to load the big klibc shared object with a COM32 module implementing the ELF loading & linking functionality I done in user space. Everything worked like a charm, but some symbols remained undefined, which led a chain reaction in which I would take out object files from the dynamic library, and introduce additional missing symbols. I managed to stabilize somehow the library, letting completely out some components (like JPG, PNG and ZIP libraries), and dividing (theoretically) the symbols and code into two parts:
    • A part that is loaded directly in the COM32 module and provides core functionality, such as module base address and constructors code.
    • A part that is loaded as an ELF library, that is dynamically linked against the COM32 module.
    • Other parts of the C library, as additional modules.
  • The difficult task was to obtain access to the COM32 symbol table. My approach was to use objcopy to dump the symbols into an empty ELF object, that would be loaded by the already existing module infrastructure. A minor issue was to make the object as compact as possible, as the default ld behavior was to align the sections to a page boundary (which would introduce unused space in the file).

See also