Investigating Breaches
I recently received an email from someone who said he wanted to learn more about "network intrusion investigations". I asked him to clarify, and he said that he was interested in learning what to look for when someone breaks into a system over the network.
This got me to thinking...how would one go about teaching this subject? What better way to see if you really understand something than to sit down and try to communicate a how-to to another person? So I started thinking about this, and it brought to mind another conversation I'd had with some other folks...actually, a series of conversations over about 18 months. Specifically, I had conversations about intrusion investigations with some guys who discover vulnerabilities and write exploits. My thinking was that in order for these guys to report a successful exploit using a vulnerability they'd found, they would have to have a test system or application, and a condition that defined success. I won't go into the details of the exchange...what matters here is that at one point, one of them said, "you aren't looking for artifacts of the initial intrusion...you're looking for artifacts of what the bad guy does after the exploit succeeds."
Well, I have to say, I disagreed with that at the time, but for the purposes of investigating data breaches, one of the primary things you need to determine is whether the system in question was, in fact, breached. One way to answer that question is to look for indications of "suspicious" activity on the system. Was malware, or some means of access or persistence, installed? AV scans are a good place to start, but I'd suggest that analyzing the acquired image for indications of AV tools already installed should be the first step. Why is that? How many analysts mount an acquired image and scan it with an updated commercial AV scanning tool? Okay, you can put your hands down. Now, how many check the system or the image first to see that the tool they use wasn't the one already installed on the system? See my point? If the AV scanner missed something once, what's to say it won't miss it again?
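As a rough illustration of that first step, here's a minimal sketch, assuming the SOFTWARE hive has been exported from the mounted image and that Willi Ballenthin's python-registry library is available; the vendor keyword list is purely illustrative, not exhaustive:

```python
from Registry import Registry

# Illustrative (NOT exhaustive) list of AV vendor strings to look for
AV_KEYWORDS = ["mcafee", "symantec", "norton", "sophos",
               "kaspersky", "avg", "avast", "trend micro"]

# SOFTWARE hive exported from the acquired image
reg = Registry.Registry("SOFTWARE")
uninstall = reg.open("Microsoft\\Windows\\CurrentVersion\\Uninstall")

for sk in uninstall.subkeys():
    try:
        name = sk.value("DisplayName").value()
    except Registry.RegistryValueNotFoundException:
        continue
    if any(kw in name.lower() for kw in AV_KEYWORDS):
        # This product was already on the box...scan with something else
        print("Installed AV product: %s" % name)
```

If one of those names turns up, pick a different scanner for the mounted image...and note the installed product (and its logs) as another data source.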
Anyway, I've already talked about malware detection, so I won't belabor the point here.
Peter Silberman of Mandiant once mentioned that malware is often "the least frequency of occurrence" on a system, and in many instances, the same may apply to an intrusion. As such, the analyst will not be looking for sudden increases in activity on a system, but instead for very specific data points within a discrete time window. I've created several timelines in which the predominance of file system activity was the result of system or application updates, as well as AV scans run by the administrators. Often, the necessary data points or time window may be established via other means of analysis.
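To make the "least frequency of occurrence" idea concrete, here's a small sketch that counts events per source in a timeline and lists the rarest first; it assumes a pipe-delimited TLN-style file (time|source|host|user|description), so adjust the field index for whatever timeline format you actually use:

```python
import sys
from collections import Counter

# Count events per source in a pipe-delimited TLN-style timeline
# (time|source|host|user|description). The bulk of the activity is
# usually update/AV-scan noise; the rare sources and narrow time
# windows are often where to start looking.
counts = Counter()
with open(sys.argv[1]) as f:
    for line in f:
        fields = line.rstrip("\n").split("|")
        if len(fields) >= 5:
            counts[fields[1]] += 1

# List sources from least to most frequent
for source, n in sorted(counts.items(), key=lambda kv: kv[1]):
    print("%8d  %s" % (n, source))
```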
Overall, I don't believe that you can teach one specific means of investigating breaches and consider that sufficient. Rather, what must be done is to develop an overall understanding of the available data sources and what a breach "looks like", and then conduct specific analysis from there.
For example, by what means could a system be breached? Well, let's narrow the field a bit and say that we're looking at a Windows XP laptop...what possibilities are there? There are those services that may be running and listening for connections (check the firewall configuration, as it may not be the default), or there may be indications of a breach as a result of user activity, such as downloads via the browser (intentional or otherwise), email, P2P applications, etc. What we may end up doing is examining the system for secondary indications of a breach (e.g., the creation of persistence mechanisms, such as new user accounts), and working from there to establish a timeline, or at least an initial reference point.
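As a sketch of checking just one of those conditions, the XP firewall configuration (and any globally open ports) can be read directly from an exported SYSTEM hive; again, this assumes the python-registry library, and the key paths shown are the standard XP locations:

```python
from Registry import Registry

# SYSTEM hive exported from the acquired image
reg = Registry.Registry("SYSTEM")

# Resolve the control set that was current when the system last ran
current = reg.open("Select").value("Current").value()
ccs = "ControlSet%03d" % current

# XP firewall settings for the standard profile
fw = reg.open(ccs + "\\Services\\SharedAccess\\Parameters"
                    "\\FirewallPolicy\\StandardProfile")
try:
    print("EnableFirewall =", fw.value("EnableFirewall").value())
except Registry.RegistryValueNotFoundException:
    print("EnableFirewall not explicitly set")

# Ports explicitly opened through the firewall, if any
try:
    for v in fw.subkey("GloballyOpenPorts").subkey("List").values():
        print("Open port entry:", v.value())
except Registry.RegistryKeyNotFoundException:
    pass
```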
Another point to remember about investigations in general is that the more data points you have to support your findings, the better. This not only helps you build a more solid picture of what happened on the system (and when), eliminating speculation, but also allows you to build that picture when some data points do not exist.
Let's look at an example...let's say you suspect that the system you're examining may have been accessed via Terminal Services, using compromised credentials. You examine the Windows Services listed in the Registry and determine that Terminal Services was set to start when the system started, and other data indicates that remote connections were allowed. So your next step might be to look for Security Event Log entries that would show signs of logins...but you find through Registry and Event Log analysis that auditing of login events wasn't enabled. What then? Well, you might think to look at the times that specific files were last accessed...but if the system you're examining is a Vista system, the updating of last access times is disabled by default. So what do you do?
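The Registry checks mentioned in that example are straightforward to pull from an exported SYSTEM hive; here's a minimal sketch, again assuming python-registry (note that on XP, the NtfsDisableLastAccessUpdate value may simply not be present):

```python
from Registry import Registry

reg = Registry.Registry("SYSTEM")  # hive exported from the image
ccs = "ControlSet%03d" % reg.open("Select").value("Current").value()

# Start == 2 means the service starts automatically at boot
start = reg.open(ccs + "\\Services\\TermService").value("Start").value()
print("TermService Start =", start)

# fDenyTSConnections == 0 means remote connections are allowed
deny = reg.open(ccs + "\\Control\\Terminal Server") \
          .value("fDenyTSConnections").value()
print("fDenyTSConnections =", deny)

# On Vista, 1 here (the default) means last-access times are NOT updated
try:
    ntfs = reg.open(ccs + "\\Control\\FileSystem") \
              .value("NtfsDisableLastAccessUpdate").value()
    print("NtfsDisableLastAccessUpdate =", ntfs)
except Registry.RegistryValueNotFoundException:
    print("NtfsDisableLastAccessUpdate not set")
```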
Again...the more data points you have to support a finding, the better off you'll be in your analysis. Where one data point is good, six may be better. A knowledgeable analyst will know that while some modicum of work will be needed to establish those other five data points, much of it can be automated and made repeatable, increasing efficiency and reducing analysis time.
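As a sketch of what that automation might look like, the individual checks above can be collapsed into a simple data-driven table; the rows here are hypothetical and would grow as you identify more data points worth establishing on every exam:

```python
from Registry import Registry

# Hypothetical check table: hive file, key path, value name, label.
# Each row is one data point supporting (or undercutting) a finding.
# (ControlSet001 is hard-coded for brevity; resolve it via the
# Select key, as in the earlier examples, in practice.)
CHECKS = [
    ("SYSTEM", "ControlSet001\\Services\\TermService",
     "Start", "TermService start type"),
    ("SYSTEM", "ControlSet001\\Control\\Terminal Server",
     "fDenyTSConnections", "Remote connections denied?"),
    ("SYSTEM", "ControlSet001\\Control\\FileSystem",
     "NtfsDisableLastAccessUpdate", "Last-access updates disabled?"),
]

for hive, path, value, label in CHECKS:
    try:
        data = Registry.Registry(hive).open(path).value(value).value()
        print("%-35s %s" % (label + ":", data))
    except (Registry.RegistryKeyNotFoundException,
            Registry.RegistryValueNotFoundException):
        print("%-35s not found" % (label + ":"))
```

Run the same table against every exam and you get the same data points, collected the same way, every time.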
Perhaps a way to illustrate the overall theme of this post is to look at examples, and that's what we'll be doing in the future. In the meantime, questions and comments are always welcome.