I will try to keep this to the point. Auto-completion on the terminal is something we all love, and it makes using a UNIX system and running commands far more pleasant. Most shells can auto-complete path names, binary names, and built-in commands. Bash goes further and supports auto-completing user names, hosts and a few other trivial things. No shell that I know of has ever attempted to auto-complete the arguments that the binaries take. Leaving out support for this makes sense: there is no common way for a binary to record the arguments it accepts inside the binary itself, and any attempt is bound to be a porting nightmare.
Keeping this in mind, I realized that almost every single UNIX binary gets its arguments from the shell in a standard, POSIX-compliant way. The getopt libc function parses the input from the shell into usable internal flags. If one were to peek at what each binary passes to getopt(), one would find every option it expects to take, and gain more insight into the executable! This is what I have done and what the remainder of the post is about.
This is what my previous post related to. Now, I realize this is a slightly silly goal. My primary reason for doing this is to learn the techniques I’ve used to get there, which I simply could not learn without experimentation and a concrete goal in mind. I attacked the problem as described below.
Each step is followed by an explanation:
How this works
- I use libelf(3) to open the binary, and read its sections.
- Most UNIX binaries are in the ELF format. The official draft is included in the tarball at the end of this post. ELF (or Executable and Linkable Format) is a file format that most UNIX systems today understand.
- The sections I care about are the PLT and the dynamic symbols section.
- An ELF file contains information in sections. The two sections I mentioned are the section that contains the dynamic symbols and the procedure linkage table (PLT).
- Dynamic symbols - Consider a program that uses a shared library, libc being the prime example. One copy of libc is shared among many processes, and when you compile a program, the actual code from libc does not get compiled into the binary. Instead, your compiler leaves a little note for the operating system loader (not the boot loader! :)) saying “Here I call some functions that should be in a shared library that you might have loaded, and if not you can load it as you need it. I will be calling printf and getopt, so I will refer to them as if I have them. Please fill in those references once you find your own copy of libc”. That list of functions is called the dynamic symbol table. Each process that uses shared libraries (which is almost all of them) has a GOT (Global Offset Table), a table that maps those symbols to the locations in the library where the code actually is. So when you call printf() in your code, the compiled instructions actually look up the printf() entry in the GOT and jump to whichever address it holds. While those references are left ‘open’, as I mentioned earlier, the entries are simply not filled in; when the loader resolves the references, it fills in the proper addresses inside the shared library. So, to picture it, the flow of execution is: printf() -> GOT -> actual printf code. For reasons outside the scope of this post, there is yet another level of indirection, so in reality the flow is: printf() -> PLT -> GOT -> actual printf(). The PLT is a series of jump instructions that go through the GOT. This jump table is what we focus on.
- I then extract the position of the getopt symbol and look it up in the PLT.
- From the information I retrieved using libelf, I check at which address the PLT gets loaded (section ‘.plt’), then I check the index of the getopt symbol in the symbol table, and I obtain the address of its PLT entry by simply computing: .plt + (getopt position + 1) * 0x10 (0x10 is the size of a PLT entry as far as I know, and the +1 is there because I want to skip the 0th entry of the PLT).
- I start the binary, overwrite the proper address, set a breakpoint, and extract the arguments.
- I now have the address of the jump that is taken every time getopt gets called. I fork() and, before I execv() the target, I enable tracing of the child through the ptrace(2) interface. This is the same interface that debuggers use to control processes. Because the child is being traced, the parent gets notified once the child has finished loading. At that point, with the address of the getopt PLT entry in hand, I overwrite that jmp instruction with an int3 instruction (0xCC when assembled). Something to note: in my code I overwrite it with 0xcccccccc, which is just four int3 instructions. I didn’t want to bother with byte-ordering or alignment issues, so I just overwrote the entire word; since the instruction is only one byte, it works just fine. This instruction traps into the parent once reached. This is also how debuggers set breakpoints, with a slight difference: they save the original instruction they overwrote so they can restore it and let execution continue, but since I don’t care whether the child keeps running, I can just go walking all over it. Now I continue the child process.
- All my child interaction and prodding was done with the ptrace() system call.
- Once the parent gets trapped again, I know that I am at the point where getopt() was JUST called. If you remember, the standard C calling convention on i386 is to push all the arguments onto the stack and then call the function. At the trap, %esp points at the return address that the call just pushed, with the arguments directly above it, so at a fixed offset from %esp sits the last argument: the string of every option the binary is expecting, which is what I care about.
- I now know where the string of arguments is in the child’s address space, at which point I can safely extract it, then kill the child process before it does anything. How mean.
- I can now rinse+repeat for other binaries in which I’m interested.
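The address arithmetic in the steps above can be sketched as follows. This is not the code from the tarball, only an illustration under the assumptions stated in the text: i386, 0x10-byte PLT entries with entry 0 skipped, and a trap that fires at the PLT entry right after the call instruction has pushed its return address. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PLT_ENTRY_SIZE 0x10u        /* assumed size of one PLT entry on i386 */
#define WORD_SIZE      4u           /* one stack slot on i386 */
#define INT3_WORD      0xccccccccu  /* four int3 (0xCC) opcodes in one word */

/* Address of the PLT entry for the symbol at `index`; entry 0 is skipped. */
uint32_t plt_entry_addr(uint32_t plt_base, uint32_t index)
{
    return plt_base + (index + 1u) * PLT_ENTRY_SIZE;
}

/* Address of the stack slot holding getopt()'s n-th argument (0-based) at
 * the moment the breakpoint fires. The return address sits at %esp, so
 * argument 0 (argc) is one word above it and argument 2 (the option
 * string pointer) is three words above it. */
uint32_t getopt_arg_slot(uint32_t esp, uint32_t n)
{
    assert(n < 3);  /* getopt() takes exactly three arguments */
    return esp + (n + 1u) * WORD_SIZE;
}

/* First byte of `word` as it would be laid out in memory. For INT3_WORD
 * every byte is 0xCC, so byte order does not matter. */
unsigned char first_byte_in_memory(uint32_t word)
{
    return *(const unsigned char *)&word;
}
```

With a .plt loaded at 0x08048400 and getopt at index 3, plt_entry_addr() gives 0x08048440. The slot returned by getopt_arg_slot(esp, 2) holds a pointer into the child's address space, so the string itself still has to be read out of the child word by word through ptrace().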
Why this works
- Most programs call getopt() as one of the absolute first things they do. This means that there is a very low chance that any side effects will come up.
- This is fast: the OS only loads the pages of code that are actually executed, and there is a very high chance that the call to getopt lives in the first page of code.
Something to note here: not all binaries use getopt. This is a problem, but not one that I care to fix. This took a little over three weeks to complete due to the lack of material on the subject on the Internet and the slightly esoteric nature of the solution. Check out the ltrace utility if you want something like what I wrote, on steroids (it outlines every single library call with all of its arguments).
I originally attempted to read the ltrace source to figure out how to solve the problem, but it confused me more than it helped me. In the end, sitting down with the ELF spec for a while is what it took.
If you have any questions, comments, additions or critiques, please either comment on the post or send me an email.
The code I wrote and used can be found here: http://isis.poly.edu/~yan/readgo.tbz (you need libelf installed to compile and run it). This only works on FreeBSD/i386. Linux has a slightly different ptrace() interface, so porting should be easy, but it is still some work.
Since everyone loves some sample output:
yan@tissue$ ./readgo /bin/ls /bin/rm
/bin/ls: 1ABCFGHILPRSTUWZabcdfghiklmnopqrstuwx
/bin/rm: dfiIPRrvW
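For the auto-completion goal, a string like the ones above still has to be turned into individual flags. A minimal sketch of that step is below. It is not part of readgo, and the helper name optstring_to_flags is made up. It only handles the plain letters and the colons that mark options taking an argument.

```c
#include <assert.h>
#include <stddef.h>

/* Expand a getopt option string such as "dfiIPRrvW" into space-separated
 * flags ("-d -f -i ..."). Colons are skipped. Returns the number of flags
 * written, or -1 if `out` is too small (the output is still terminated). */
int optstring_to_flags(const char *optstring, char *out, size_t out_len)
{
    size_t used = 0;
    int count = 0;

    assert(optstring != NULL && out != NULL);
    if (out_len == 0)
        return -1;

    for (const char *p = optstring; *p != '\0'; p++) {
        if (*p == ':')
            continue;
        /* optional separating space, then '-' and the option letter */
        size_t need = (count > 0) ? 3 : 2;
        if (used + need + 1 > out_len) {   /* +1 for the terminator */
            out[used] = '\0';
            return -1;
        }
        if (count > 0)
            out[used++] = ' ';
        out[used++] = '-';
        out[used++] = *p;
        count++;
    }
    out[used] = '\0';
    return count;
}
```

Fed the /bin/rm string from the sample output, this produces nine flags, starting with -d and ending with -W.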
edit: Fixed a lot of grammatical mistakes thanks to Kurt.
Thanks for reading,
Yan












