Read a Unicode file
In a recent case I came across a machine that was infected with malware. The machine had the free AVG antivirus installed. AVG keeps its logs under “C:\Documents and Settings\All Users\Application Data\avg8\Log”. That folder contains several log files, all identified by the “file” command as “MPEG ADTS, layer I, v1, 160 kbits, 32 kHz, Stereo”. This is obviously wrong, so I took a quick look at one of the log files:
cat avgcore.log | xxd | head -10
0000000: fffe 5b00 4100 5600 4700 3800 2e00 4300 ..[.A.V.G.8...C.
0000010: 6f00 7200 6500 5d00 2000 4900 4e00 4600 o.r.e.]. .I.N.F.
0000020: 4f00 2000 3200 3000 3000 3800 2d00 3100 O. .2.0.0.8.-.1.
0000030: 3200 2d00 3000 3400 2000 3200 3000 3a00 2.-.0.4. .2.0.:.
0000040: 3200 3700 3a00 3100 3500 2c00 3000 3300 2.7.:.1.5.,.0.3.
...
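As the output above shows, the file is encoded as UTF-16 little-endian: it starts with the byte-order mark ff fe, and every character occupies two bytes. Because the text is in English, the second byte of each pair is 00 and the first byte is simply the ASCII code of the character, so the content can be read using the ASCII table. So I wrote a quick Perl script to read the file for me, which can be seen here.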
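For illustration, the byte layout in the dump can be decoded in a few lines of Python. This is only a minimal sketch and not the Perl script mentioned above; the function name decode_utf16 and the fallback to little-endian when no byte-order mark is present are assumptions made for this example.

```python
import codecs

def decode_utf16(data: bytes) -> str:
    """Decode UTF-16 bytes, using the byte-order mark when one is present."""
    if data.startswith(codecs.BOM_UTF16_LE):    # ff fe, as in the dump above
        return data[2:].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF16_BE):    # fe ff
        return data[2:].decode("utf-16-be")
    # No byte-order mark: falling back to little-endian is an assumption
    return data.decode("utf-16-le")

# The first 16 bytes of the xxd output shown above
sample = bytes.fromhex("fffe 5b00 4100 5600 4700 3800 2e00 4300")
print(decode_utf16(sample))  # prints "[AVG8.C"
```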
The usage of the script is:
read_unicode [-l] [-h] [-o OFFSET] FILE
Where:
-l Precede each printed line with a line number
-h Print this help message
-o OFFSET Defines the offset at which the script starts reading the Unicode text.
This option can be used to skip a file header and read only the content of the file.
FILE The Unicode file that is to be read by the script
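The interface described above could be sketched in Python roughly as follows. This is a hedged approximation, not a port of the Perl script: the helper names format_lines, read_unicode and main, the "N: " line-number format, and the choice to decode the data as UTF-16 little-endian are all assumptions made for this example.

```python
import argparse

def format_lines(data, offset=0, number_lines=False):
    """Decode UTF-16LE bytes starting at byte `offset` and return the lines."""
    text = data[offset:].decode("utf-16-le", errors="replace")
    lines = text.splitlines()
    if number_lines:
        # The "N: " prefix is an assumed format, not taken from the original script
        lines = [f"{n}: {line}" for n, line in enumerate(lines, start=1)]
    return lines

def read_unicode(path, offset=0, number_lines=False):
    """Read a file in binary mode and return its decoded lines."""
    with open(path, "rb") as fh:
        return format_lines(fh.read(), offset, number_lines)

def main(argv=None):
    # argparse provides -h automatically, matching the usage text above
    parser = argparse.ArgumentParser(prog="read_unicode")
    parser.add_argument("-l", dest="number_lines", action="store_true",
                        help="precede each printed line with a line number")
    parser.add_argument("-o", dest="offset", metavar="OFFSET", type=int, default=0,
                        help="byte offset at which to start reading")
    parser.add_argument("file", metavar="FILE", help="the Unicode file to read")
    args = parser.parse_args(argv)
    for line in read_unicode(args.file, args.offset, args.number_lines):
        print(line)
```

Calling main(["-o", "2", "avgcore.log"]) would then print the log without its two-byte header. Unlike the script described here, this sketch decodes a leading byte-order mark as the character U+FEFF when no offset is given.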
So to read the log file in question, I could simply use:
read_unicode avgcore.log
??[AVG8.Core] INFO 2008-12-04 20:27:15,031 XXX-F0C226 PID:528 THID:2772 ID:XX-XX-XX-XX-XX:YY.YY.YY MSG:'ERRORCODE:0x0', 'SIZE:0x16', 'SIZE:0x16'
[AVG8.Core] INFO 2008-12-04 20:27:15,265 XXX-F0C226 PID:528 THID:2772 ID:XX-XX-XX-XX-XX:YY.YY.YY MSG:'ERRORCODE:0x0', 'SIZE:0x16', 'SIZE:0x16'
...
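The leading ?? is the two-byte byte-order mark, which has no ASCII representation. To skip this file header, start reading at offset 2: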
read_unicode -o 2 avgcore.log
[AVG8.Core] INFO 2008-12-04 20:27:15,031 XXX-F0C226 PID:528 THID:2772 ID:XX-XX-XX-XX-XX:YY.YY.YY MSG:'ERRORCODE:0x0', 'SIZE:0x16', 'SIZE:0x16'
[AVG8.Core] INFO 2008-12-04 20:27:15,265 XXX-F0C226 PID:528 THID:2772 ID:XX-XX-XX-XX-XX:YY.YY.YY MSG:'ERRORCODE:0x0', 'SIZE:0x16', 'SIZE:0x16'
...
Just a simple Perl script that does the job for me, at least for this case.