Data Exfiltration
Similar to copied files, one of the questions that responders and analysts get from time to time is, how do I tell what data was copied off of a system, or exfiltrated from the network?
This is not an easy question to answer, and often depends upon a number of variables. To make things a bit simpler, let's look at a couple of scenarios:
1. You show up on-site, are handed a hard drive or an image, and asked to tell if data (either any data, or specific data) was copied off of the system.
In this case, as the responder/analyst, you've got a couple of options. First, try to determine what the customer envisions with respect to how the data may have been taken. Was it copied to a USB removable storage device? Was it accessed by a remote intruder (via a backdoor or bot), or perhaps sent off of the system by the user, via web-based email or FTP? All of these may give you clues on where to start looking for some kind of logs. In most cases, you may not find any. However, I've found indications of partial conversations and even attachments as part of web-based email. Also, when an incident has involved remote unauthorized shell-level access, I've found indications with Registry hive files of activity by the intruder accesses or searching for files.
You might also try to determine what other devices or monitoring applications may have logs available that might be of use. Firewall and router ACL logs, for instance, may give you indications of untoward outbound connections. In the absence of those logs, you could hope that someone captured some network traffic.
A note on logs: Some logs may include data about an outbound connection, to include the total number of packets transferred. While this might be an indicator that something may have been sent over the network connection, you will still be unable to determine what actual content was sent from simply the packet size alone. True, traffic analysis might give you an indication, particularly when combined with testing of malware (if malware was the root of the issue) in a virtual environment, but the fact of the matter is that knowing that three guys just ran by your house doesn't tell you the color of their shirts, or their names.
If the issue in question was thought to be associated with malware found on a system, running that malware in a virtual environment may provide indications of specific files or content that the malware was looking for, but this may not always be the case. Sometimes, malware (bots, for instance) won't be programmed to look for files or content themselves, but can be commanded to do so via the C&C server, and your analysis may be predicated on a particular time window, or having access to the Internet so that the malware you're testing can connect to the C&C server.
2. You arrive on-site after a worm infection has been eradicated, and asked to determine what, if any data had been exfiltrated from the network.
In this case, the systems in question may still be live, but the malware may have been eradicated. Under such circumstances, you're not going to have processes to look at - running processes maintain a list of handle objects, and you may be able to get an idea of the file handles that a process has open. However, beyond that, this is scenario leaves you in much the same position as scenario #1.
3. You arrive on-site, and there are systems that are infected still running, that haven't been taken off-line or cleaned, and are still generating network traffic. At this point, what data do you need to determine data exfiltration?
The best source of information for determining data exfiltration at this point is going to be network traffic captures. At this point, you can see the data itself, and using tools like Wireshark, reconstruct entire network conversations and see what data was transmitted, and from which system. You can then use words and phrases that appear to be unique to do keyword searches on the systems themselves for the actual content that was accessed.
Another good source (albeit not the best) of information will be from the hosts themselves. Using either live response tools such as handle.exe, or dumping the contents of physical memory and using Volatility to parse out the list of open file handles from each process, you will be able to see which files the process in question had open. This will give you yet another data point in determining not only data exfiltration, but also perhaps any targeted nature of malware.
To sum up, in an ideal world, when faced with the question of data exfiltration via the network, the responder should be looking for a live system with an active process, and network packet captures. For other incidents where the exfiltration mechanism isn't know, non-digital sources such as CCTV, etc., might prove to be valuable.
A word about preparedness - F-Response. Yep, that's it. Data leaving your network, particularly when you don't want it to, is a very bad thing. Determining the cause, what data is being taken, where it's going, etc. - all the important questions that need to be answered - are all questions that require a quick and accurate response to answer. Even the fastest incident responders will take time to arrive on-site, whereas if you have F-Response Enterprise Edition already installed, the response time is reduced to however long it takes a responder to log in and access the central appliance. The advantage of tools like F-Response are a faster, greener response that doesn't replace having someone come on-site, but instead augments it. This all assumes, of course, that you're response plan isn't to run AV scans, delete stuff, take systems off-line, and then call someone...
This is not an easy question to answer, and often depends upon a number of variables. To make things a bit simpler, let's look at a couple of scenarios:
1. You show up on-site, are handed a hard drive or an image, and asked to tell if data (either any data, or specific data) was copied off of the system.
In this case, as the responder/analyst, you've got a couple of options. First, try to determine what the customer envisions with respect to how the data may have been taken. Was it copied to a USB removable storage device? Was it accessed by a remote intruder (via a backdoor or bot), or perhaps sent off of the system by the user, via web-based email or FTP? All of these may give you clues on where to start looking for some kind of logs. In most cases, you may not find any. However, I've found indications of partial conversations and even attachments as part of web-based email. Also, when an incident has involved remote unauthorized shell-level access, I've found indications with Registry hive files of activity by the intruder accesses or searching for files.
You might also try to determine what other devices or monitoring applications may have logs available that might be of use. Firewall and router ACL logs, for instance, may give you indications of untoward outbound connections. In the absence of those logs, you could hope that someone captured some network traffic.
A note on logs: Some logs may include data about an outbound connection, to include the total number of packets transferred. While this might be an indicator that something may have been sent over the network connection, you will still be unable to determine what actual content was sent from simply the packet size alone. True, traffic analysis might give you an indication, particularly when combined with testing of malware (if malware was the root of the issue) in a virtual environment, but the fact of the matter is that knowing that three guys just ran by your house doesn't tell you the color of their shirts, or their names.
If the issue in question was thought to be associated with malware found on a system, running that malware in a virtual environment may provide indications of specific files or content that the malware was looking for, but this may not always be the case. Sometimes, malware (bots, for instance) won't be programmed to look for files or content themselves, but can be commanded to do so via the C&C server, and your analysis may be predicated on a particular time window, or having access to the Internet so that the malware you're testing can connect to the C&C server.
2. You arrive on-site after a worm infection has been eradicated, and asked to determine what, if any data had been exfiltrated from the network.
In this case, the systems in question may still be live, but the malware may have been eradicated. Under such circumstances, you're not going to have processes to look at - running processes maintain a list of handle objects, and you may be able to get an idea of the file handles that a process has open. However, beyond that, this is scenario leaves you in much the same position as scenario #1.
3. You arrive on-site, and there are systems that are infected still running, that haven't been taken off-line or cleaned, and are still generating network traffic. At this point, what data do you need to determine data exfiltration?
The best source of information for determining data exfiltration at this point is going to be network traffic captures. At this point, you can see the data itself, and using tools like Wireshark, reconstruct entire network conversations and see what data was transmitted, and from which system. You can then use words and phrases that appear to be unique to do keyword searches on the systems themselves for the actual content that was accessed.
Another good source (albeit not the best) of information will be from the hosts themselves. Using either live response tools such as handle.exe, or dumping the contents of physical memory and using Volatility to parse out the list of open file handles from each process, you will be able to see which files the process in question had open. This will give you yet another data point in determining not only data exfiltration, but also perhaps any targeted nature of malware.
To sum up, in an ideal world, when faced with the question of data exfiltration via the network, the responder should be looking for a live system with an active process, and network packet captures. For other incidents where the exfiltration mechanism isn't know, non-digital sources such as CCTV, etc., might prove to be valuable.
A word about preparedness - F-Response. Yep, that's it. Data leaving your network, particularly when you don't want it to, is a very bad thing. Determining the cause, what data is being taken, where it's going, etc. - all the important questions that need to be answered - are all questions that require a quick and accurate response to answer. Even the fastest incident responders will take time to arrive on-site, whereas if you have F-Response Enterprise Edition already installed, the response time is reduced to however long it takes a responder to log in and access the central appliance. The advantage of tools like F-Response are a faster, greener response that doesn't replace having someone come on-site, but instead augments it. This all assumes, of course, that you're response plan isn't to run AV scans, delete stuff, take systems off-line, and then call someone...