Many parts of Facebook, such as chat, messaging and posting statuses, are written in JavaScript/AJAX. This requires a lot of calls to the server to keep the information up to date. To speed things up, Facebook stores some of the AJAX data in temporary files on the person's computer. These files can contain valuable forensic data. In particular, Facebook stores some chat messages in individual files. I say "some" because caching may not be invoked if the person never moves away from the Facebook homepage. However, any navigation to other Facebook pages should trigger the caching.
A guide to extracting Internet Explorer Facebook chat messages using EnCase can be found here, and JADsoftware has a tool that finds all browser chat messages (the free version is limited to 10 messages). Following the EnCase guide, I wrote my own script that finds all cached Facebook chat messages sent and received via Internet Explorer. I'm not sure how Firefox caching works and cannot find any references to Facebook chat there yet, but if I do I shall update the script.
Basically, the script looks inside every folder in the temporary internet files folder for files that are named something like p_[numbers][characters/numbers].txt or .htm, such as p_61003300=10[1].txt or p_61003300=0CAX1XPU0.txt. Each file that matches is then checked to see if it is in the correct Facebook chat format. If so, the useful parts are extracted and added to a CSV file. The file is also copied over to a results folder.
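The file name check described above can be tried on its own. This is a minimal sketch that uses a simplified but equivalent form of the script's file name pattern against the two example names; `looks_like_chat_file` is a hypothetical helper name and not part of the original script.

```python
import re

# "p_", one or more digits, any run of non-space characters,
# then a .txt or .htm extension (equivalent to the script's pattern).
facebook_reg = re.compile(r'\Ap_\d+\S*\.(txt|htm)\Z')

def looks_like_chat_file(name):
    """Return True if the file name fits the cached Facebook chat pattern."""
    return facebook_reg.match(name) is not None

for name in ["p_61003300=10[1].txt", "p_61003300=0CAX1XPU0.txt", "index.htm"]:
    print(name, looks_like_chat_file(name))
```

A matching name only makes the file a candidate; the contents still need to be checked against the chat message format.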
To use, just change the temp_internet_files_loc variable near the top of the script to point to the temporary internet files folder you want to investigate. You can find out where yours is by Googling 'location of temporary internet files folder in <operating system>'.
import os
import re
import shutil
import csv
from datetime import datetime

# change this to where the temporary internet files are kept
temp_internet_files_loc = r"c:\Users\Sarah\AppData\Local\Microsoft\Windows\Temporary Internet Files"

# regular expressions to match facebook chat file names and file contents
facebook_reg = re.compile(r'\Ap\_(\d)+(\S)*\.(txt|htm)\Z')
message_ref = re.compile(
    r'\Afor \(\;\;\)\;{"t":"msg","c":"\S+","s":\d+,'
    r'"ms":\[\{"msg":\{"text":"(.*?)","time":(\d+),"clientTime":(\d+),"msgID":'
    r'"(\d+)"\},"from":(\d+),"to":(\d+),"from_name":"(.*?)","from_first_name":'
    r'"(\S+)","type":"msg","fl":\S+,"to_name":"(.*?)","to_first_name":"(\S+)"'
    r'\}\]\}\Z')

total_found = 0


def check_format(path):
    """
    Opens the file given and checks it matches the Facebook chat
    regular expression. If so, returns the Facebook chat fields.
    """
    # cached files may contain odd bytes, so do not fail on decoding errors
    with open(path, 'r', errors='replace') as f:
        line = f.read()
    result = message_ref.match(line)
    if result is None:
        return None
    message, time, client_time, msg_id, from_, to, from_name, \
        from_first_name, to_name, to_first_name = \
        result.group(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
    return message, time, client_time, from_, to, from_name, to_name


if __name__ == "__main__":
    # get the full path of the folder results will be stored in
    dest = input("\nPath of folder to write results to: ")

    if os.path.exists(dest):
        # open a CSV file and write Facebook chat headers
        csv_path = os.path.join(dest, 'facebook_chat_messages.csv')
        with open(csv_path, 'w', newline='') as csv_file:
            results = csv.writer(csv_file, delimiter=',', quotechar='"',
                                 quoting=csv.QUOTE_MINIMAL)
            results.writerow(['UTC Time', 'UTC Formatted Time', 'Users Time',
                              'Users Formatted Time', 'From FB ID', 'From Name',
                              'To FB ID', 'To Name', 'Message Sent'])

            # walk through the temporary internet files folder and look at each file
            for root, dirs, files in os.walk(temp_internet_files_loc):
                for file in files:
                    # skip anything that does not match the facebook file name format
                    if not facebook_reg.match(file):
                        continue
                    src = os.path.join(root, file)
                    msg_tuple = check_format(src)
                    # skip files that are not in the facebook chat format
                    if msg_tuple is None:
                        continue

                    # timestamps are in milliseconds; fromtimestamp() formats them
                    # in the local timezone of the machine running the script
                    server_secs = float(msg_tuple[1]) / 1000.00
                    client_secs = float(msg_tuple[2]) / 1000.00
                    time = datetime.fromtimestamp(server_secs).strftime("%d/%m/%Y %H:%M:%S")
                    client_time = datetime.fromtimestamp(client_secs).strftime("%d/%m/%Y %H:%M:%S")

                    # write a row to the CSV file
                    results.writerow([server_secs, time, client_secs, client_time,
                                      msg_tuple[3], msg_tuple[5],
                                      msg_tuple[4], msg_tuple[6], msg_tuple[0]])

                    # copy the original Facebook file to results folder
                    shutil.copy(src, os.path.join(dest, file))
                    total_found = total_found + 1

        print("\nTotal facebook chat messages found: %s\n" % total_found)
        print("Messages found and a summary CSV file added to: %s" % dest)
    else:
        print("\nInvalid folder! Exiting.")
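A note on the date handling: the script treats the time and clientTime values as milliseconds since the Unix epoch, which is why it divides them by 1000. Calling `datetime.fromtimestamp` without a timezone formats the result in the local timezone of the machine running the script, so the formatted columns are only UTC if that machine is set to UTC. Here is a minimal sketch of an explicit UTC conversion, assuming the values really are epoch milliseconds (`ms_to_utc_string` is a hypothetical helper, not part of the script):

```python
from datetime import datetime, timezone

def ms_to_utc_string(ms):
    """Format an epoch-milliseconds value as a UTC date/time string."""
    seconds = float(ms) / 1000.0
    # passing tz makes the result independent of the analyst's own timezone
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%d/%m/%Y %H:%M:%S")

print(ms_to_utc_string("0"))              # 01/01/1970 00:00:00
print(ms_to_utc_string("1300000000000"))  # 13/03/2011 07:06:40
```

Whichever conversion is used, it is worth recording the timezone alongside the results so the CSV can be interpreted later.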
Nice script and thanks for sharing. But this is JSON we are talking about. What happens when the format changes? (I have personally seen about 10 different ways of storing the format.) I'm not very strong in Python, but from what I can see, the script only handles one of the many different formats. Am I right?
As far as I know this is how FB stores the info. The script relies on FB keeping the format the same, but if it changes then the regular expression can be updated.
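Since the payload is JSON behind a `for (;;);` prefix, one alternative to updating the regular expression is to strip the prefix and parse the rest with Python's json module, which does not care about key order or extra fields. This is only a sketch and not what the script above does: the key names are taken from the regular expression, `parse_chat_payload` is a hypothetical helper, and the sample payload is invented for illustration.

```python
import json

PREFIX = "for (;;);"

def parse_chat_payload(raw):
    """Return a list of message dicts, or [] if raw is not a chat payload."""
    raw = raw.strip()
    if not raw.startswith(PREFIX):
        return []
    try:
        data = json.loads(raw[len(PREFIX):])
    except ValueError:  # raised for malformed JSON
        return []
    if not isinstance(data, dict) or data.get("t") != "msg":
        return []
    entries = data.get("ms", [])
    if not isinstance(entries, list):
        return []
    messages = []
    for entry in entries:
        msg = entry.get("msg") if isinstance(entry, dict) else None
        if not isinstance(msg, dict):
            continue
        # .get() leaves missing fields as None instead of failing
        messages.append({
            "text": msg.get("text"),
            "time": msg.get("time"),
            "clientTime": msg.get("clientTime"),
            "from": entry.get("from"),
            "from_name": entry.get("from_name"),
            "to": entry.get("to"),
            "to_name": entry.get("to_name"),
        })
    return messages

# Invented sample in the shape the regular expression expects.
sample = ('for (;;);{"t":"msg","c":"p_1","s":3,"ms":[{"msg":{"text":"hi",'
          '"time":1300000000000,"clientTime":1300000000500,"msgID":"123"},'
          '"from":111,"to":222,"from_name":"Alice A","from_first_name":"Alice",'
          '"type":"msg","fl":false,"to_name":"Bob B","to_first_name":"Bob"}]}')
print(parse_chat_payload(sample))
```

This would still need adjusting if the key names themselves change, but it tolerates reordered or additional fields that would break an exact regular expression match.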
How difficult would this be to modify for Firefox/Chrome? Or is there anything similar for Chrome/Firefox/Safari?