“Antiword” for Office 2007

July 20th, 2009 2 comments

Since I’ve already created a script to parse Office 2007 documents and extract metadata from them (see here), it took little effort to rewrite it to include a parser for the content of a Word document, in a similar manner to what Antiword does for older versions of the Microsoft Word format.

The script can be downloaded from here.  The current version only displays the content of documents created in Word, but it shouldn’t take much work to support other formats as well (such as Excel and PowerPoint documents).

It works basically the same way as the metadata extractor, but instead of displaying the metadata, it displays the content of the file itself. This is the first version of the script and only a slight modification of the original, so some Word-specific elements (such as the table of contents and cover pages) do not print as nicely as they should. That will be fixed in a future version.
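To give an idea of what such a content parser involves, here is a minimal sketch in Python. It is not the script itself, only an illustration under a few assumptions: that the main document part sits at word/document.xml (the usual location, although strictly it should be looked up through the package relationships), and that plain text, tabs and line breaks are all we care about. The function name docx_text is made up for this example.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text(source):
    """Return the text of a .docx file, one line per paragraph.

    `source` is a path or a file-like object holding the ZIP container.
    Assumption: the main part is stored as word/document.xml.
    """
    with zipfile.ZipFile(source) as package:
        root = ET.fromstring(package.read("word/document.xml"))

    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # Text lives in <w:t> elements; tabs and breaks have their own tags.
        pieces = []
        for node in para.iter():
            if node.tag == W_NS + "t":
                pieces.append(node.text or "")
            elif node.tag == W_NS + "tab":
                pieces.append("\t")
            elif node.tag == W_NS + "br":
                pieces.append("\n")
        paragraphs.append("".join(pieces))
    return "\n".join(paragraphs)
```

Calling docx_text("test.docx") would return the body text as a single string. Paragraphs nested inside other paragraphs (text boxes, for instance) would be repeated by this naive walk, which is the kind of Word-specific detail a real parser has to handle.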

Categories: Forensics Tags:

Firefox 3 History – revisited

July 5th, 2009 1 comment

Analysis of browser history almost always comes up, no matter what is being investigated. And despite Firefox being one of the most popular browsers currently in use, there aren’t many tools out there that can read and display its history (at least in a human-readable format). There are tools, such as f3e from FirefoxForensics.com (firefoxforensics.com); however, that tool, like the others I’ve found, is only distributed as an EXE that runs on Windows (and no source code is provided).

Traditionally Firefox stored its history in the Mork file format, which could be read using any standard editor. Version 3, which has been out for quite some time now, stores user history differently: the history is kept in the MozStorage format, as a SQLite database (see description here), in a file called places.sqlite. The tables can easily be listed using sqlite3, for instance:

sqlite3 places.sqlite ".tables"
moz_anno_attributes  moz_favicons         moz_keywords
moz_annos            moz_historyvisits    moz_places
moz_bookmarks        moz_inputhistory
moz_bookmarks_roots  moz_items_annos

Two tables are of particular interest: moz_places and moz_historyvisits. The schema for these tables can also be easily extracted using sqlite3:

sqlite3 places.sqlite ".schema moz_places"
CREATE TABLE moz_places (
id INTEGER PRIMARY KEY,
url LONGVARCHAR,
title LONGVARCHAR,
rev_host LONGVARCHAR,
visit_count INTEGER DEFAULT 0,
hidden INTEGER DEFAULT 0 NOT NULL,
typed INTEGER DEFAULT 0 NOT NULL,
favicon_id INTEGER,
frecency INTEGER DEFAULT -1 NOT NULL);
CREATE INDEX moz_places_faviconindex ON moz_places (favicon_id);
CREATE INDEX moz_places_frecencyindex ON moz_places (frecency);
CREATE INDEX moz_places_hostindex ON moz_places (rev_host);
CREATE UNIQUE INDEX moz_places_url_uniqueindex ON moz_places (url);
CREATE INDEX moz_places_visitcount ON moz_places (visit_count);

And for moz_historyvisits:

sqlite3 places.sqlite ".schema moz_historyvisits"
CREATE TABLE moz_historyvisits (
id INTEGER PRIMARY KEY,
from_visit INTEGER,
place_id INTEGER,
visit_date INTEGER,
 visit_type INTEGER,
session INTEGER);
CREATE INDEX moz_historyvisits_dateindex ON moz_historyvisits (visit_date);
CREATE INDEX moz_historyvisits_fromindex ON moz_historyvisits (from_visit);
CREATE INDEX moz_historyvisits_placedateindex ON moz_historyvisits (place_id, visit_date);
CREATE TRIGGER moz_historyvisits_afterdelete_v1_trigger
AFTER DELETE ON moz_historyvisits FOR EACH ROW
WHEN OLD.visit_type NOT IN (0,4,7)
BEGIN UPDATE moz_places SET visit_count = visit_count - 1
WHERE moz_places.id = OLD.place_id
AND visit_count > 0; END;
CREATE TRIGGER moz_historyvisits_afterinsert_v1_trigger
 AFTER INSERT ON moz_historyvisits FOR EACH ROW
WHEN NEW.visit_type NOT IN (0,4,7)
BEGIN UPDATE moz_places SET visit_count = visit_count + 1
WHERE moz_places.id = NEW.place_id; END;

According to the FirefoxForensics.com web site, the meaning of each column in moz_places is roughly the following:

  • id INTEGER PRIMARY KEY, the primary key of the table, of no real interest in itself.
  • url LONGVARCHAR, the URL that has been visited, including the protocol used; something that one likes to examine.
  • title LONGVARCHAR, the title of the page as it appears in the browser.
  • rev_host LONGVARCHAR, the host name that was visited, reversed. Used to ease searching and querying for hosts visited in the history file.
  • visit_count INTEGER DEFAULT 0, as the name implies, a counter of visits to the URL.
  • hidden INTEGER DEFAULT 0 NOT NULL, either 0 or 1. If the URL is hidden then the user did not navigate directly to it; this usually indicates an embedded page using something like an iframe.
  • typed INTEGER DEFAULT 0 NOT NULL, indicates whether the user typed the URL directly into the location bar.
  • favicon_id INTEGER, a reference to another table containing the favicon.
  • frecency INTEGER DEFAULT -1 NOT NULL, a combination of frequency and recency, used to calculate which sites appear at the top of the suggestion list when URLs are typed in the address bar.
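The rev_host column is easier to understand with a small example. The sketch below is in Python and rests on one assumption worth checking against your own database: that Firefox stores the reversed host name followed by a trailing period (so www.snort.org becomes gro.trons.www.). The function names are made up for illustration.

```python
def rev_host(hostname):
    """Reverse a host name the way moz_places.rev_host appears to store it.

    Assumption: the stored value is the reversed name plus a trailing period.
    """
    return hostname[::-1] + "."


def host_from_rev(value):
    """Turn a rev_host value back into a readable host name."""
    if value.endswith("."):
        value = value[:-1]
    return value[::-1]


def domain_pattern(domain):
    """Build a LIKE pattern matching the domain and all of its subdomains."""
    return domain[::-1] + ".%"
```

With the host stored backwards, a query for a whole domain becomes a simple prefix match, for example WHERE rev_host LIKE 'gro.trons.%' to find snort.org and www.snort.org alike, which is presumably why the column exists.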

Since we know the structure and the meaning of each column in the database, it is simple to create a script that parses the database and displays the content as you wish. The Perl script I wrote is called ff3histview and can be found here. It reads the places.sqlite database and displays its content in a human-readable format. The usage of the script is the following:

ff3histview [--help|-?|-help]
This screen
ff3histview [-t TIME] [-csv|-txt|-html] [-s|--show-hidden] [-o|--only-typed] [-quiet]    places.sqlite
     -t Defines a time scew if the places.sqlite was placed on a computer with a wrong time settings.  The format of the
        variable TIME is: X | Xs | Xm | Xh
       where X is a integer and s represents seconds, m minutes and h hours (default behaviour is seconds)
     -quiet Does not ask questions about case number and reference (default with CSV output)
     -csv|-txt|-html The output of the file.  TXT is the default behaviour and is chosen if none of the others is chosen
     -s or --show-hidden displays the "hidden" URLs as well as others.  These URL's represent URLs that the user
       did not specifically navigate to.
     -o or --only-typed Only show URLs that the user typed directly into the location/URL bar.

     places.sqlite is the SQLITE database that contains the web history in Firefox 3.  It should be located at:
           [win xp] c:\Documents and Settings\USER\Application Data\Mozilla\Firefox\Profiles\PROFILE\places.sqlite
           [linux] /home/USER/.mozilla/firefox/PROFILE/places.sqlite
           [mac os x] /Users/USER/Library/Application Support/Firefox/Profiles/PROFILE/places.sqlite

The default behaviour of the script is to extract all URLs from moz_places that are not hidden (this can be changed using parameters to the script, as can restricting the output to URLs the user typed in). A hidden URL, according to firefoxforensics.com, is a URL “that the use did not specifically navigate to. These are comonly embedded pages, i-frames, RSS bookmarks and javascript calls.”[1] So the SQL statement that the script executes is:

SELECT moz_historyvisits.id,url,title,visit_count,visit_date,from_visit,rev_host
FROM moz_places, moz_historyvisits
WHERE
 moz_places.id = moz_historyvisits.place_id
 AND hidden = 0

It is in fact a really simple SQL statement, just enough to extract the URLs and some other information of value. The “from_visit” column that is extracted identifies the visit (and thereby the URL) that the user navigated from. Relevant information about those visits is extracted as well, giving the investigator more context.

The script can output plain text, CSV (Comma-Separated Values) for spreadsheet manipulation, or an HTML page. Examples of usage (default behaviour, without asking questions about case details):
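For those who prefer to see the whole lookup in one place, here is a rough Python equivalent of the query, including the resolution of from_visit back to a URL. It is a sketch and not a port of ff3histview: the function name, the join layout and the returned tuple are my own choices for this example. It relies on visit_date being stored as microseconds since the Unix epoch.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# from_visit points at the id of an earlier row in moz_historyvisits,
# so the source URL is found by joining the visits table to itself.
QUERY = """
SELECT v.visit_date, p.visit_count, p.url, src.url
FROM moz_historyvisits AS v
JOIN moz_places AS p ON p.id = v.place_id
LEFT JOIN moz_historyvisits AS pv ON pv.id = v.from_visit
LEFT JOIN moz_places AS src ON src.id = pv.place_id
WHERE p.hidden = 0 OR ?
ORDER BY v.visit_date
"""


def read_history(conn, offset_seconds=0, show_hidden=False):
    """Return (time, visit_count, url, from_url) tuples, oldest first.

    visit_date is in microseconds since the Unix epoch; offset_seconds
    is added to correct for a wrong clock on the examined machine.
    """
    rows = []
    for visit_date, count, url, from_url in conn.execute(QUERY, (int(show_hidden),)):
        when = EPOCH + timedelta(microseconds=visit_date, seconds=offset_seconds)
        rows.append((when, count, url, from_url))
    return rows
```

Usage would be along the lines of read_history(sqlite3.connect("places.sqlite")), run against a copy of the database rather than the evidence file itself.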

ff3histview -q  places.sqlite
Firefox 3 History Viewer
Not showing 'hidden' URLS, that is URLs that the user did not specifically navigate to, use -s to show them
Date of run (GMT): 13:7:9, Thu Jul 2, 2009
Time offset of history file: 0 s
-------------------------------------------------------
Date                Count    Host name    URL    notes
Thu Jun 25 09:16:17 2009    1    www.regripper.net    http://www.regripper.net/
Wed Jun 24 20:19:53 2009    1    isc.sans.org    http://isc.sans.org/
Thu Jun 25 12:43:14 2009    1    snort.org    http://snort.org/
Thu Jun 25 12:43:09 2009    2    www.snort.org    http://www.snort.org/
Thu Jun 25 12:53:51 2009    2    www.snort.org    http://www.snort.org/    From: http://www.snort.org/news
Thu Jun 25 12:48:09 2009    1    www.snort.org    http://www.snort.org/news    From: http://www.snort.org/
Thu Jun 25 18:38:35 2009    1    www.groklaw.net    http://www.groklaw.net/    From:
-------------------------------------------------------

And if there are any discrepancies between the time settings of the investigator’s machine and the suspect’s:

ff3histview -q -t 14380  places.sqlite
Firefox 3 History Viewer
Not showing 'hidden' URLS, that is URLs that the user did not specifically navigate to, use -s to show them
Date of run (GMT): 13:8:34, Thu Jul 2, 2009
Time offset of history file: 14380 s
-------------------------------------------------------
Date                Count    Host name    URL    notes
Thu Jun 25 13:15:57 2009    1    www.regripper.net    http://www.regripper.net/
Thu Jun 25 00:19:33 2009    1    isc.sans.org    http://isc.sans.org/
Thu Jun 25 16:42:54 2009    1    snort.org    http://snort.org/
Thu Jun 25 16:42:49 2009    2    www.snort.org    http://www.snort.org/
Thu Jun 25 16:53:31 2009    2    www.snort.org    http://www.snort.org/    From: http://www.snort.org/news
Thu Jun 25 16:47:49 2009    1    www.snort.org    http://www.snort.org/news    From: http://www.snort.org/
Thu Jun 25 22:38:15 2009    1    www.groklaw.net    http://www.groklaw.net/    From:
-------------------------------------------------------
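The effect of the offset is easy to check by hand: 14380 seconds is 3 hours, 59 minutes and 40 seconds, so the first entry moves from 09:16:17 to 13:15:57. A small Python sketch of how a TIME value in the X | Xs | Xm | Xh format could be parsed and applied is shown below. The function name is hypothetical, and accepting a leading minus sign is my assumption, not something the usage text promises.

```python
import re
from datetime import datetime, timedelta

# Multipliers for the unit suffixes; no suffix means seconds.
UNITS = {"": 1, "s": 1, "m": 60, "h": 3600}


def parse_offset(text):
    """Convert 'X', 'Xs', 'Xm' or 'Xh' into a number of seconds."""
    match = re.fullmatch(r"(-?\d+)([smh]?)", text.strip())
    if match is None:
        raise ValueError("unrecognised time offset: %r" % text)
    return int(match.group(1)) * UNITS[match.group(2)]


# The first entry of the two listings above, before and after the offset.
original = datetime(2009, 6, 25, 9, 16, 17)
adjusted = original + timedelta(seconds=parse_offset("14380"))
print(adjusted)  # 2009-06-25 13:15:57
```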

And if we want to include “hidden” URLs:

ff3histview -q -t 14380 -s places.sqlite
Firefox 3 History Viewer
Showing hidden URLs as well

Date of run (GMT): 13:14:58, Thu Jul 2, 2009
Time offset of history file: 14380 s

-------------------------------------------------------
Date                Count    Host name    URL    notes
Thu Jun 25 13:15:57 2009    1    www.regripper.net    http://www.regripper.net/
Thu Jun 25 00:19:33 2009    1    isc.sans.org    http://isc.sans.org/
Thu Jun 25 00:19:36 2009    0    www.sans.org    http://www.sans.org/banners/isc.php    From: http://isc.sans.org/
Thu Jun 25 06:14:16 2009    0    www.sans.org    http://www.sans.org/banners/isc.php    From: http://isc.sans.org/
Thu Jun 25 15:01:07 2009    0    www.sans.org    http://www.sans.org/banners/isc.php    From: http://isc.sans.org/
Thu Jun 25 06:19:18 2009    0    www.sans.org    http://www.sans.org/banners/isc_ss.php    From:
Thu Jun 25 15:58:20 2009    0    www.sans.org    http://www.sans.org/banners/isc_ss.php    From:
Thu Jun 25 13:15:57 2009    0    www.regripper.net    http://www.regripper.net/links.htm    From: http://www.regripper.net/
Thu Jun 25 13:15:57 2009    0    www.regripper.net    http://www.regripper.net/main.htm    From: http://www.regripper.net/
Thu Jun 25 13:16:13 2009    0    www.regripper.net    http://www.regripper.net/RegRipper/    From: http://www.regripper.net/links.htm
Thu Jun 25 13:16:17 2009    0    www.regripper.net    http://www.regripper.net/RegRipper/RegRipper/    From: http://www.regripper.net/RegRipper/
Thu Jun 25 13:16:28 2009    0    regripper.invisionplus.net    http://regripper.invisionplus.net/    From: http://www.regripper.net/links.htm
Thu Jun 25 16:42:54 2009    1    snort.org    http://snort.org/    From: http://www.regripper.net/links.htm
Thu Jun 25 16:42:49 2009    2    www.snort.org    http://www.snort.org/    From: http://www.regripper.net/links.htm
Thu Jun 25 16:53:31 2009    2    www.snort.org    http://www.snort.org/    From: http://www.snort.org/news
Thu Jun 25 16:47:49 2009    1    www.snort.org    http://www.snort.org/news    From: http://www.snort.org/
Thu Jun 25 22:38:15 2009    1    www.groklaw.net    http://www.groklaw.net/    From:
-------------------------------------------------------

And for the HTML output:

ff3histview  -t 14380 --html places.sqlite
History HTML

Squid Timeline analysis

June 24th, 2009 No comments

Sometimes it can be useful to know when a piece of malware started communicating with the outside world, which is often done through HTTP or HTTPS.  So it can be quite useful to examine network log files to determine the first time the malware communicated with its C&C (command and control) server.

One way of doing so is to use the mactime tool from TSK (The Sleuth Kit) to read the Squid access log; you only need to convert the access log into a bodyfile.  So I wrote the script squid2timeline, which does exactly that. The usage of the script is:

squid2timeline -c CONFIG [-l] [-h HOST] [ACCESS]
 Where CONFIG refers to the configuration file of squid, usually /etc/squid/squid.conf
 The script then reads the variables needed to determine the correct format of the squid
 access file and the location of the current squid access file.
 Optional: ACCESS defines the access file to read, otherwise the current one as it is
 defined in the squid.conf file will be read.

squid2timeline [-l] [-h HOST] [-e] ACCESS
 -e Indicates that the access file is constructed using emulate_httpd_log on
 Otherwise (the default behaviour) emulate_httpd_log will be assumed to be off

 [-l] Defines a legacy timeline format as used by TSK version 1.X and 2.X,
 otherwise version 3.0+ is assumed.

 [-h HOST] defines a host name to be included in the timeline.
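To illustrate the kind of conversion involved, here is a minimal Python sketch that turns one line of a native-format Squid access log into a TSK 3.x bodyfile line (MD5|name|inode|mode|UID|GID|size|atime|mtime|ctime|crtime). It is not squid2timeline: how the name field is composed, the placeholder zeros and the choice to put the request time in all four timestamp fields are assumptions made for this example, as is the function name.

```python
def squid_to_body(line, host=None):
    """Convert one native Squid access.log line to a TSK 3.x bodyfile line.

    Native format: time elapsed client code/status bytes method URL ...
    Returns None for lines that do not have enough fields.
    """
    fields = line.split()
    if len(fields) < 7:
        return None
    timestamp, _elapsed, client, result, size, method, url = fields[:7]

    epoch = int(float(timestamp))  # drop the millisecond part
    name = "%s %s %s (%s)" % (client, method, url, result)
    if host:
        name = "[%s] %s" % (host, name)
    name = name.replace("|", "%7C")  # '|' is the bodyfile field separator

    times = "|".join([str(epoch)] * 4)  # atime, mtime, ctime, crtime
    # MD5, inode, mode, UID and GID have no meaning here, so they are zeroed.
    return "0|%s|0|0|0|0|%s|%s" % (name, size, times)
```

Feeding every line of the access log through such a function and writing the results to a file gives something mactime can read with -b.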

So one use of this script is to map the timeline of a single IP address that is infected, or suspected of being infected: extract its entries from the access log and run them through the script and mactime.

grep 10.1.1.1 access.log.1 > access.log_10.1
squid2timeline access.log_10.1 > body
mactime -b body -i hour summary -d > timeline.csv

The file “timeline.csv” then contains a timeline in CSV format, and the file “summary” contains an hourly summary of the traffic.  The content of the summary file looks like this:

Hourly Summary for Timeline of body

Mon Jun 22 2009 04:00:00, 835
Mon Jun 22 2009 05:00:00, 945
Mon Jun 22 2009 06:00:00, 807
Mon Jun 22 2009 07:00:00, 814
Mon Jun 22 2009 08:00:00, 810
Mon Jun 22 2009 09:00:00, 804
Mon Jun 22 2009 10:00:00, 879
Mon Jun 22 2009 11:00:00, 1680
Mon Jun 22 2009 12:00:00, 1789
Mon Jun 22 2009 13:00:00, 1023
....

So an unusual spike appears in the traffic around 11:00, something that could be an indication of an infection.  This can then help the analyst focus the investigation on that time frame.
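Spotting the spike by eye works for a short listing, but for a longer summary file it can be done with a few lines of Python. The sketch below flags every hour whose count exceeds 1.5 times the median. Both the threshold and the function name are arbitrary choices for illustration, not part of mactime or squid2timeline.

```python
import statistics


def find_spikes(summary_lines, factor=1.5):
    """Return (hour, count) pairs whose count exceeds factor * median."""
    counts = []
    for line in summary_lines:
        stamp, sep, count = line.rpartition(",")
        # Skip the title line, blank lines and anything without a count.
        if sep and count.strip().isdigit():
            counts.append((stamp.strip(), int(count)))
    if not counts:
        return []
    baseline = statistics.median(c for _, c in counts)
    return [(stamp, c) for stamp, c in counts if c > factor * baseline]
```

Run against the ten hours listed above, the median is 857, so only the 11:00 (1680) and 12:00 (1789) entries are reported.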

Categories: Forensics, Network Analysis Tags:

Office 2007 metadata

June 12th, 2009 1 comment

Metadata from documents can be a great source of information for investigators, and I’ve often come across documents created in Microsoft Word or other Office applications.  There are several scripts and tools that read the proprietary binary format used by Office 2003 and earlier, and I’ve got nothing to add to those tools.  But I couldn’t find any tools that list the metadata from documents created using Office 2007, which uses the OpenXML document format.  So I decided to examine it a bit further.

Microsoft has published good documentation describing the structure of OpenXML, for instance here. Essentially, a document in the OpenXML format is a compressed file using the well-known ZIP format.  Inside the ZIP file is a predefined structure of files, mostly XML files that describe the document and its content.  So it can easily be read using the standard libraries available in scripting languages such as Perl.

According to Microsoft, a folder called “_rels” is created inside the ZIP archive.  This folder contains a file named “.rels” that defines the root relationships within the package.  This is the first place to look when parsing the content of the document.  Within the .rels file you find tags that define the relationships of the document:

<Relationship Id="someID" Type="relationshipType" Target="targetPart"/>

Metadata is stored in parts whose relationship type ends in “properties”, most notably “core-properties” and “extended-properties”. These are usually stored in the following locations:

  • docProps/core.xml
  • docProps/app.xml

These files contain the actual metadata, such as the document creator, who last saved it, etc. They need to be extracted and parsed to display the metadata.

To do this I wrote the script read_open_xml.pl, which parses the .rels file to locate the metadata parts of the document, then extracts the metadata and prints it to the screen. An example of its usage:
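The lookup described above can be sketched in a few lines of Python: open the ZIP container, read _rels/.rels, follow every relationship whose type ends in “properties” and collect the simple elements of the target part. This is only an illustration of the idea and not the Perl script; the function name is made up, and nested values (such as the heading pairs in app.xml) are deliberately left empty.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the <Relationship> elements in _rels/.rels
RELS_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def read_properties(source):
    """Return a dict of metadata found through the root relationships.

    `source` is a path or file-like object for the OpenXML package.
    """
    metadata = {}
    with zipfile.ZipFile(source) as package:
        rels = ET.fromstring(package.read("_rels/.rels"))
        for rel in rels.iter(RELS_NS + "Relationship"):
            if not rel.get("Type", "").endswith("properties"):
                continue
            # Targets are relative to the package root.
            target = rel.get("Target", "").lstrip("/")
            part = ET.fromstring(package.read(target))
            for element in part:
                tag = element.tag.rsplit("}", 1)[-1]  # strip the namespace
                metadata[tag] = (element.text or "").strip()
    return metadata
```

Calling read_properties("test.docx") would return entries such as creator, lastModifiedBy and Application, depending on what the document contains.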

./read_open_xml.pl test.docx
==========================================================================
 cmd line: ./read_open_xml.pl test.docx
==========================================================================

Document name: test.docx
Date: Tue Jun  9 16:51:23 GMT 2009

--------------------------------------------------------------------------
File Metadata
--------------------------------------------------------------------------
 title = my company template
 subject = Document template
 creator = Kristinn Gudjonsson
 keywords = template, word
 description =
 lastModifiedBy = Kristinn Gudjonsson
 revision = 3
 lastPrinted = 2008-08-15T10:14:00Z
 created = 2008-08-15T10:14:00Z
 modified = 2008-08-15T10:14:00Z
 category = template
--------------------------------------------------------------------------
Application Metadata
--------------------------------------------------------------------------
 Template = my_template.dot
 TotalTime = 0
 Pages = 2
 Words = 159
 Characters = 908
 Application = Microsoft Word 12.1.2
 DocSecurity = 0
 Lines = 7
 Paragraphs = 1
 ScaleCrop = false
 Manager = Some dude
 Company = My Company
 LinksUpToDate = false
 CharactersWithSpaces = 1115
 SharedDoc = false
 HyperlinksChanged = false
 AppVersion = 12.0258

copyright, Kristinn Gudjonsson, 2009

The script also reads the character encoding of the XML documents and encodes the output accordingly.  If you experience any problems using the script, please let me know so I can fix them, but so far I haven’t come across any OpenXML document that hasn’t been correctly parsed by this script.

Update 1

I’ve modified the script slightly so it can be used in Windows.  I’ve tested the script on a Win XP SP3 machine using ActivePerl 5.10 and it should work.  You can get the Windows version here.
Categories: Forensics Tags:

Read a unicode file

June 9th, 2009 No comments

In a recent case I came across a machine that was infected with malware.  The machine had the free AVG antivirus installed.  AVG keeps its logs at “C:\Documents and Settings\All Users\Application Data\avg8\Log”.  Under that folder are several log files, all identified by “file” as “MPEG ADTS, layer I, v1, 160 kbits, 32 kHz, Stereo”.  This is obviously not true, so I took a quick look at one of the log files:

cat avgcore.log | xxd | head -10
0000000: fffe 5b00 4100 5600 4700 3800 2e00 4300  ..[.A.V.G.8...C.
0000010: 6f00 7200 6500 5d00 2000 4900 4e00 4600  o.r.e.]. .I.N.F.
0000020: 4f00 2000 3200 3000 3000 3800 2d00 3100  O. .2.0.0.8.-.1.
0000030: 3200 2d00 3000 3400 2000 3200 3000 3a00  2.-.0.4. .2.0.:.
0000040: 3200 3700 3a00 3100 3500 2c00 3000 3300  2.7.:.1.5.,.0.3.
...

As can be seen in the above output, the file is written in Unicode (UTF-16, little-endian, judging by the leading ff fe byte order mark).  Since the text is in English, every other byte is zero and the remaining bytes can be read as plain ASCII.  So I wrote a quick Perl script to read the file for me, which can be seen here.

The usage of the script is:

    read_unicode [-l] [-h] [-o OFFSET] FILE
Where:
        -l Preceed each printed line with a line number
        -h Print this help message
        -o OFFSET Defines the offset where the script starts reading the unicode text.
        This option can be used to skip a file header and read the content of the file.
        FILE this is the file in Unicode that is to be read by the script

So to read the log file in question, I could simply use

read_unicode avgcore.log
??[AVG8.Core] INFO 2008-12-04 20:27:15,031 XXX-F0C226 PID:528 THID:2772
ID:XX-XX-XX-XX-XX:YY.YY.YY MSG:'ERRORCODE:0x0', 'SIZE:0x16', 'SIZE:0x16'
[AVG8.Core] INFO 2008-12-04 20:27:15,265 XXX-F0C226 PID:528 THID:2772
ID:XX-XX-XX-XX-XX:YY.YY.YY MSG:'ERRORCODE:0x0', 'SIZE:0x16', 'SIZE:0x16'
...

Or, to skip the file header (here the two-byte byte order mark):

read_unicode -o 2 avgcore.log
[AVG8.Core] INFO 2008-12-04 20:27:15,031 XXX-F0C226 PID:528 THID:2772
ID:XX-XX-XX-XX-XX:YY.YY.YY MSG:'ERRORCODE:0x0', 'SIZE:0x16', 'SIZE:0x16'
[AVG8.Core] INFO 2008-12-04 20:27:15,265 XXX-F0C226 PID:528 THID:2772
ID:XX-XX-XX-XX-XX:YY.YY.YY MSG:'ERRORCODE:0x0', 'SIZE:0x16', 'SIZE:0x16'
...

Just a simple Perl script that does the job for me, at least for this case.
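For comparison, the same job can be done in Python by letting the codec do the work. The sketch below is not a port of read_unicode; it is a hypothetical function that mimics the -o and -l options, strips a byte order mark if one is found at the starting offset, and assumes little-endian UTF-16 when there is none.

```python
import codecs


def read_utf16(path, offset=0, number_lines=False):
    """Return the lines of a UTF-16 encoded file as ordinary strings.

    offset skips that many bytes first; a byte order mark found at that
    position is removed. Without a mark, little-endian is assumed.
    """
    with open(path, "rb") as handle:
        handle.seek(offset)
        raw = handle.read()

    if raw.startswith(codecs.BOM_UTF16_BE):
        text = raw[2:].decode("utf-16-be", errors="replace")
    else:
        if raw.startswith(codecs.BOM_UTF16_LE):
            raw = raw[2:]
        text = raw.decode("utf-16-le", errors="replace")

    lines = text.splitlines()
    if number_lines:
        lines = ["%d: %s" % (n, line) for n, line in enumerate(lines, 1)]
    return lines
```

Because the mark is removed rather than printed, the leading “??” seen in the first listing above does not appear, whether or not an offset is given.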

Categories: Forensics Tags:

Windows Prefetch Directory

June 8th, 2009 2 comments

The Prefetch folder in Windows contains information about recently run software on a Windows machine.  It can be very valuable to examine the content of the Prefetch directory (found at %WINDIR%\Prefetch, usually either C:\WINDOWS\Prefetch or C:\WINNT\Prefetch) to find clues about which software has recently been run on the system.

To use the script I wrote, you first need to mount the Windows image file (see my previous post on how to mount an NTFS volume in Linux).  Then you can run the script, which can be found here, like this:

read_prefetch /mnt/analyze/WINNT/Prefetch

Or you can create an HTML report like this:

read_prefetch -h /tmp/report.html /mnt/analyze/WINNT/Prefetch

An example report can be seen here:

Example report

Categories: Forensics, Windows Analysis Tags: