Six years ago (yikes!) I wrote about image steganography as a concept. At the moment there are a couple of pieces of malware, such as Vawtrak (aka Neverquest) and ZeuS, that use steganography to hide their command and control (C&C) servers or configuration files in images. This means that the malware does not need to contain a static list of C&Cs, which would quickly go stale, but can just download an innocent-looking image from the internet, decode the hidden message and then connect out. The advantages are that the image can be refreshed with new C&C data without having to recompile the malware, and that the images can be hidden in plain sight, e.g. on legitimate message boards.
In my second-to-last post I alluded to a talk I gave at the CyberForensics conference. You can access the presentation here: http://lowmanio.co.uk/share/OpenSourceForensicCaseManagementSlides.pdf
QR codes seem to be popping up everywhere now, from adverts & marketing campaigns to tracking and tickets. It’s easy to see why: they are easy to generate, have a high level of error-correction and can encode quite a lot of data (the maximum being 4,296 alpha-numeric characters). Since most modern smart phones can decode them, they present a much simpler way for people to get their information across – for example it’s much quicker and easier to scan a code on a house’s for-sale sign which links directly to the exact URL for that property than to print the URL on the sign (for one it probably won’t fit, and secondly it’ll be the estate agent’s homepage and not the specific URL).
Have you ever wondered how some malware variants are able to delete themselves? A malicious executable is launched on a machine, and once launched in memory, the executable vanishes. This makes malware analysis very hard if no memory dump was taken, as there is seemingly nothing there. We can however use other artefacts to confirm the running of a non-existent file, such as by looking at the Prefetch files, User Assist files and certain registry entries.
In the context of investigations and forensics, “open source intelligence” is information collected from publicly available sources, such as newspapers and the internet. In a commercial forensics environment you may be asked to work out who is behind a certain anonymous identity; for example they might be posting secret company information on a blog or defaming the company’s reputation on a website forum. There are lots of free ways to help figure out people’s identities or gather more information about them.
I recently discovered a wonderful Unicode character called right-to-left override (U+202E), which makes the text that follows it display in reverse. For example: print "Hello[U+202E]World" produces the output: HellodlroW. I'm not sure what legitimate use the character has, but several blogs have warned that malware writers can use it to get people to click on files. Most people are wary that .exe files might be harmful, but are generally less suspicious of image extensions such as JPG. You can 'trick' a user into thinking a file is a JPG by using this special Unicode character. If you named your malware executable ClickHer[U+202E]gpj.exe for example, it would be displayed as ClickHerexe.jpg.
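The filename trick can be reproduced in a few lines of Python. This is a minimal sketch: the helper names `disguise`, `approximate_display` and `looks_suspicious` are made up for illustration, and `approximate_display` only imitates what a file browser shows by reversing everything after the override character, which is a simplification of the full Unicode bidirectional algorithm.

```python
RLO = "\u202e"  # U+202E RIGHT-TO-LEFT OVERRIDE


def disguise(prefix: str, fake_ext: str, real_ext: str) -> str:
    """Build a name whose real extension is real_ext but which displays as fake_ext."""
    # The fake extension is stored backwards so that it reads forwards on screen.
    return f"{prefix}{RLO}{fake_ext[::-1]}.{real_ext}"


def approximate_display(filename: str) -> str:
    """Rough imitation of how the name is rendered (assumption: simple reversal)."""
    head, sep, tail = filename.partition(RLO)
    return head + tail[::-1] if sep else filename


def looks_suspicious(filename: str) -> bool:
    """Flag any filename that contains the override character."""
    return RLO in filename


name = disguise("ClickHer", "jpg", "exe")
print(repr(name))                 # the raw name still ends in .exe
print(approximate_display(name))  # what the user is likely to see
print(looks_suspicious(name))
```

The operating system still treats the file as an .exe, which is why checking for U+202E in filenames is a cheap thing to add to a triage script.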
Proxy logs need a bit of work done to them before you can start analysing the content. This is of course assuming you don't have a fancy product to do all this work for you ;). First, you need to work out the regular expression that defines a line in the proxy log to parse it into a nicer format such as CSV. A lot of the CSV columns can probably be removed; the most useful columns are URL, date & time, user agent string (to work out what browser the user was using for example) and request status code (to work out if the user was able to access the content or if it was blocked, unavailable etc).
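As a rough illustration of that first step, the sketch below parses log lines with a regular expression and writes only the useful columns to CSV. The log layout is an assumption (something close to the Apache combined format with the full URL in the request line), so the pattern `LINE_RE` and the sample line would need adapting to whatever your proxy actually writes.

```python
import csv
import io
import re

# Assumed layout: client - user [date] "METHOD URL PROTOCOL" status size "referer" "user agent"
LINE_RE = re.compile(
    r'(?P<client>\S+) \S+ (?P<user>\S+) \[(?P<datetime>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

# The columns worth keeping for analysis.
KEEP = ["datetime", "url", "status", "user_agent"]


def log_to_csv(lines, out):
    """Write the KEEP columns of every parseable line to out; return the number skipped."""
    writer = csv.DictWriter(out, fieldnames=KEEP, extrasaction="ignore")
    writer.writeheader()
    skipped = 0
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            skipped += 1  # keep count so malformed lines are not silently lost
            continue
        writer.writerow(match.groupdict())
    return skipped


sample = [
    '10.0.0.5 - alice [12/Mar/2012:09:15:02 +0000] '
    '"GET http://example.com/index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows NT 6.1)"',
    'this line does not match the pattern',
]
buffer = io.StringIO()
skipped = log_to_csv(sample, buffer)
print(buffer.getvalue())
print("skipped lines:", skipped)
```

Counting the lines that fail to match is worth doing: a high number usually means the regular expression does not yet describe the log format properly.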
When you look at your recycle bin folder, Windows shows you all the files you’ve deleted in a user friendly format – i.e. the name of the file you originally deleted and when it was deleted. The operating system does quite a bit of work for you, as the actual files within your recycle bin are not quite as user friendly. The recycle bin in Windows 7 is located at:
One of the most important parts of digital forensics is working out when things happened. When did a file get last accessed or modified? When did a user access this website? What happened yesterday at 4.30PM? This would be very easy if the entire world was based in UTC, or at least all operating systems and log files stored time in UTC in the same format. Instead, we have various mixtures of UTC and local time, stored in Windows time format (100 nanosecond intervals since Jan 1st 1601), Unix epoch format (seconds since Jan 1st 1970), a plain string format or however each programming language decides to encode time. This is especially important when doing forensics for global companies, where the investigation can be carried out on several computers spanning different timezones and the investigator is in a different timezone too. Establishing a common timezone is imperative, so as not to get lost in local times when correlating evidence. Even on the same machine this is difficult: the Windows registry is in UTC, but setupapi.log and other important log files are in local time.
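To make the formats concrete, here is a minimal sketch that converts each of them to a timezone-aware UTC datetime using only the standard library. The function names are made up for illustration, and `local_to_utc` assumes you already know the machine's UTC offset at the time of the event (it does not work out daylight saving for you).

```python
from datetime import datetime, timedelta, timezone

WINDOWS_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def filetime_to_utc(filetime: int) -> datetime:
    """Windows time: 100-nanosecond intervals since 1601-01-01 UTC."""
    # datetime only resolves microseconds, so the last digit is dropped.
    return WINDOWS_EPOCH + timedelta(microseconds=filetime // 10)


def unix_to_utc(seconds: float) -> datetime:
    """Unix time: seconds since 1970-01-01 UTC."""
    return UNIX_EPOCH + timedelta(seconds=seconds)


def local_to_utc(naive_local: datetime, utc_offset_hours: float) -> datetime:
    """Convert a local timestamp with a known UTC offset to UTC."""
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    return naive_local.replace(tzinfo=local_zone).astimezone(timezone.utc)


# 116444736000000000 is the Windows time value of the Unix epoch.
print(filetime_to_utc(116444736000000000))
print(unix_to_utc(0))
# 4.30PM on a machine running five hours behind UTC.
print(local_to_utc(datetime(2012, 3, 12, 16, 30), -5))
```

Once every timestamp is a UTC datetime, events from the registry, log files and browser history can be sorted into a single timeline.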
Many parts of Facebook such as chat, messaging and posting statuses are written in Javascript/AJAX. This requires a lot of calls to the server to constantly have the most up-to-date information. To speed things up, Facebook stores some of the AJAX data in temporary files on the person's computer. These files can contain valuable forensic data. In particular, Facebook stores some chat messages in individual files. I say some because caching may not be invoked if the person does not move away from the Facebook homepage at all. However any movement to other Facebook pages should cause the caching.
I've finally finished the first draft of my thesis. I now have a week and a few days to edit and finish it, which is plenty of time since I'm fairly happy with it as it stands.
Some index.dat files record not only websites visited, but also the files on the computer (and any other devices) which have been opened. This gives an accurate account of what files have been viewed and possibly edited. Using the registry, any files accessed that are not on the C: drive can be linked to a USB stick / DVD / CD etc.
I've nearly finished Webscavator, my visualisation application for the forensic analysis of user web history! The next series of blog posts will describe some of the visualisations I've used and how to code them. They are all written in server-side Python and client-side Javascript using jQuery. First on the list are heatmaps. These visualisations show the data using colour. For example low values go blue and higher values go red to visualise a temperature scale. A couple of examples can be found at heatmapapi.com, Google Visualization API and GraphUp. I found the Google Visualization API too limiting to work with for this particular visualisation, and GraphUp is not free or open source, so I made my own, described below.
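As a generic illustration of the colour scale (this is not the Webscavator code), the sketch below maps a value onto a blue-to-red hex colour on the server side. The function name `heat_colour` and the straight-line blend between pure blue and pure red are assumptions; a real heatmap might pass through green or yellow on the way.

```python
def heat_colour(value: float, lowest: float, highest: float) -> str:
    """Return a CSS hex colour: blue for the lowest value, red for the highest."""
    if highest == lowest:
        position = 0.0  # all values equal, so everything is drawn 'cold'
    else:
        position = (value - lowest) / (highest - lowest)
    position = max(0.0, min(1.0, position))  # clamp out-of-range values
    red = round(255 * position)
    blue = round(255 * (1 - position))
    return f"#{red:02x}00{blue:02x}"


visits_per_hour = [0, 3, 10]
for visits in visits_per_hour:
    print(visits, heat_colour(visits, min(visits_per_hour), max(visits_per_hour)))
```

The returned string can be dropped straight into a table cell's background-color style on the client side.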
In today’s forensic science theory lectures we got taught that not only is DNA not unique, but there is an actual chance of two people having the same DNA profile. The lecturer first explained the birthday paradox, and then tried to explain it with DNA and got me terribly confused with what numbers go where in what equations. So I’ve read up on it and will now try and explain the birthday paradox and why there are potentially thousands of doppelgangers in the world.
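The birthday calculation itself is short enough to sketch in Python: work out the chance that everyone is different, then subtract it from one. The function name `prob_shared` is made up for illustration, and the sketch assumes every one of the `possibilities` (365 birthdays here) is equally likely, which is a simplification for birthdays and even more so for DNA profiles.

```python
def prob_shared(people: int, possibilities: int = 365) -> float:
    """Probability that at least two of `people` share the same value."""
    prob_all_different = 1.0
    for already_taken in range(people):
        # Each new person must avoid every value that is already taken.
        prob_all_different *= (possibilities - already_taken) / possibilities
    return 1.0 - prob_all_different


for group_size in (10, 23, 50):
    print(group_size, round(prob_shared(group_size), 4))
```

With 23 people the probability is already just over a half, which is the surprising part of the paradox. The same function works for any number of equally likely outcomes by changing `possibilities`.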
Steganography is the art of hiding something inside something else in plain sight. Usually images or text are hidden within other images or sound files. For example, in the image below of trees there is an image of a cat hidden inside it. Wikipedia explains that if you keep just the last 2 bits of each component of each RGB value and then make the result 85 times brighter, you get a picture of the cat. The whole point is that, to the human eye, the image of the trees looks identical to an image of the trees with nothing hidden inside.
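The bit manipulation can be sketched on single pixels without any image library. This is an illustration under assumptions: pixels are (R, G, B) tuples of 0-255 integers, the brightening step is taken to mean multiplying the two remaining bits by 85 (since 3 × 85 = 255), and the names `hide` and `reveal` are made up.

```python
def hide(cover_pixel, secret_pixel):
    """Store the top 2 bits of each secret component in the bottom 2 bits of the cover."""
    return tuple(
        (cover & ~0b11) | (secret >> 6)
        for cover, secret in zip(cover_pixel, secret_pixel)
    )


def reveal(stego_pixel):
    """Keep only the last 2 bits of each component and scale 0-3 up to 0-255."""
    return tuple((component & 0b11) * 85 for component in stego_pixel)


tree_pixel = (200, 120, 33)
cat_pixel = (255, 0, 128)

stego_pixel = hide(tree_pixel, cat_pixel)
print(stego_pixel)          # each component differs from the tree pixel by at most 3
print(reveal(stego_pixel))  # a coarse, 2-bits-per-channel version of the cat pixel
```

Because each component changes by at most 3 out of 255, the altered picture is visually indistinguishable from the original, while the recovered image only has four levels per channel.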
I made this script a while back now to populate a database with all the music on my computer (so excuse any poor Python!). It assumes you are on Windows and have all your music in one folder, arranged by artist, with album sub-folders containing the songs. It also assumes you'll use PostgreSQL, but it's trivial to change this to MySQL or even SQLite, which comes with Python 2.5 or higher. You can tell it to ignore certain folders by adding them to the ignorables set. It will automatically grab any album art it finds and try to get the genre, track number, composer etc. from the MP3 metadata (I couldn't find a way of doing this for any other type of music file).
You may have noticed I have added a search bar at the top of the website. Here is how to make use of PostgreSQL's full text search facility with SQLAlchemy, a Python SQL Toolkit and Object Relational Mapper.
Once the script to generate captchas is set up (see previous post) this can be easily tied into a Python web page. This assumes you are using Werkzeug and Mako, but I'm sure Django/Pylons with Jinja etc won't be too different.
I have redesigned lowmanio.co.uk to be more bloggy, and hopefully I shall actually keep up with my blog this time. I got a little over-enthusiastic with blog posts as you can see... I've already blogged 6 times! The topics I will blog about are in the categories to your right. I am starting a masters in Forensics Informatics in September, so will post anything interesting I learn here. I'm also really interested in Human-Computer Interaction (HCI), general art and design, and computers in general! Hobbies-wise, I own two rabbits called Pixel and Nybble and I love to cook. Both of these will get mentions, hopefully not at the same time (although rabbit meat is delicious). I have also moved my bunny blog, which used to live at http://rabbits.lowmanio.co.uk, to here.
Creating the blog tags for this website was a bit tricky because I wasn't sure how to make the tags have different sizes according to their significance. I started off with 5 spans and ordered the tags in terms of frequency and divided them equally into the spans. However tags are not evenly distributed, so instead I calculated the normalized weight of the tag according to the others, and made the font size a percentage of that.
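One way to do the weighting is sketched below. The exact formula used on this site is not given above, so this assumes simple min-max normalisation of the tag counts, mapped onto a font-size percentage range; the function name `tag_font_sizes` and the 80% to 200% range are hypothetical.

```python
def tag_font_sizes(counts, smallest_pct=80, largest_pct=200):
    """Map each tag's usage count to a font size (as a CSS percentage)."""
    lowest, highest = min(counts.values()), max(counts.values())
    spread = highest - lowest
    sizes = {}
    for tag, count in counts.items():
        # Normalised weight between 0 (rarest tag) and 1 (most used tag).
        weight = (count - lowest) / spread if spread else 0.5
        sizes[tag] = round(smallest_pct + weight * (largest_pct - smallest_pct))
    return sizes


tag_counts = {"python": 10, "forensics": 5, "rabbits": 1}
for tag, size in tag_font_sizes(tag_counts).items():
    print(f'<span style="font-size: {size}%">{tag}</span>')
```

Unlike splitting the tags into five equal groups, this keeps a rarely used tag small even when most tags have similar counts.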
In making this website and in my 4th year honours project I implemented a captcha (which you can see if you try and make a comment). I thought this would be a bit of a nightmare to do, but with Steven's help and the awesomeness of Python, it was quite easy. The code originally comes from here, but I have made a few edits such as keeping the image in memory.