Persistence Testing / Detection Testing / Purple Teaming
Here is an expansion on the philosophy of detection testing (or purple teaming) in information security. I've been talking about it for a while now, and I've learned a great deal from peers who practice it. It's a more methodical approach to demonstrably improving a security program than traditional gap analysis or penetration testing. Others call this general philosophy purple teaming, but I like to call it detection testing, or persistence testing from a red team perspective.
The following are some outlines and methodologies for implementing this program yourself. I've used these methodologies in my own practice and have seen both quantitative improvements in blue team response and containment times and noticeable qualitative improvements, such as uncovering detection gaps and sharpening threat hunting abilities. Management used these techniques to baseline activities and draw larger conclusions about their current response capabilities. Further, it pushed the blue teams to constantly evolve into a well-oiled, threat-containing force.
The scenarios are based on real threat actor campaigns observed in the wild. We collect their tools (ensuring they are clean) and use them in kill chain models that fit the actor's motives. We attempt to capture their high-level TTPs (tactics, techniques, and procedures) while using our own infrastructure to launch the attacks (custom builds, packers, C2, and VPS hosts). This lets us emulate their activity in a controlled manner. We can then train against that threat's tactics and ensure our capabilities are at an acceptable level.
But first we need to talk about some philosophy on the classes of threats we will train against. My general premise is this: we can train against three classes of threats, the known known (everyday threats), the known unknown (sophisticated commodity), and the unknown unknown (advanced attacker). The 'known knowns' are malware with signatures: well-known samples that we should be able to handle automatically, from deployment through prevention, containment, or eradication, with tools such as anti-virus. The 'known unknowns' are commodity malware that has been packed or designed to defeat modern automated intrusion prevention systems such as anti-virus. These should consistently bypass automated controls and force the defenders to respond in some manner, but they should also trigger alerting at some point in the kill chain. Finally, the 'unknown unknowns' are custom malware or APT samples designed uniquely for the engagement (or semi-uniquely, depending on skill and resources); they should force the blue team to hunt or implement new detection systems, pushing their skills, practice, and capabilities. We can use real malware in our tests, typically from malware builders we've triaged to make sure they aren't backdoored and are safe to use. Above all, it's extremely important to perform the due diligence to ensure you do not make the environment less secure.
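To make the taxonomy concrete, here is a minimal sketch of how I might write the three classes down as data, pairing each with an example payload type and the outcome the exercise is designed to test. The class labels come from above; the field names and descriptions are illustrative, not a formal standard.

```python
# Illustrative mapping of the three threat classes to example payloads and the
# outcome each class is meant to exercise. Names and fields are hypothetical.
from dataclasses import dataclass

@dataclass
class ThreatClass:
    name: str
    example_payload: str
    expected_outcome: str

THREAT_CLASSES = [
    ThreatClass("known known", "signatured commodity malware",
                "automated prevention or eradication by existing controls (e.g. anti-virus)"),
    ThreatClass("known unknown", "packed or modified commodity malware",
                "bypasses automated prevention but alerts somewhere in the kill chain"),
    ThreatClass("unknown unknown", "custom or engagement-unique tooling",
                "no existing signature; forces hunting and new detection development"),
]

for tc in THREAT_CLASSES:
    print(f"{tc.name}: {tc.example_payload} -> {tc.expected_outcome}")
```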
That said, there is a pretty general premise for how to apply these payloads to your detection baseline. The kill chains (or threat actor life cycles) should be based on blue team runbooks, or planned response actions, such that the blue team can exercise their prepared training in a unique and realistic event. This is a great way to test these procedures before a real event occurs. Essentially, we are creating small tests using real attacker indicators in your network. The key to this is implementation, iteration, and rapid feedback for rapid learning. You need a red team and a blue team that are willing to work together and help each other learn. The environment should be only slightly adversarial, with a white team to implement the rules of the scenario in a fair and noncompetitive way. Think of it as a 'blind' table-top exercise, where the goal is to strategize and learn.
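One way to tie a scenario to the runbooks is to write the kill chain down as data before execution, with each phase mapped to the planned response it is meant to exercise and a padding window for the blue team to respond. This is a minimal sketch under that assumption; the phase names, runbook IDs, observables, and durations are all hypothetical.

```python
# A hypothetical kill chain scenario definition: each phase references the blue
# team runbook it tests, the observable the red team will intentionally create,
# and the padding window before the red team moves to the next phase.
from dataclasses import dataclass
from datetime import timedelta

@dataclass
class Phase:
    name: str           # attack life cycle step
    runbook_ref: str    # blue team runbook / planned response this phase tests
    observable: str     # indicator the red team will deliberately generate
    padding: timedelta  # time allowed for detection and containment

SCENARIO = [
    Phase("initial access", "RB-PHISH-01", "macro document spawning a child process", timedelta(hours=4)),
    Phase("command and control", "RB-C2-03", "periodic beacon to a red-team-owned VPS", timedelta(hours=8)),
    Phase("lateral movement", "RB-LM-02", "remote service creation from patient zero", timedelta(hours=8)),
    Phase("exfiltration", "RB-EXFIL-01", "bulk upload of seeded fake data", timedelta(hours=4)),
]
```

As the blue team's response times improve, shrinking the padding values is the natural knob to turn.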
Red teams should carry out pre-planned kill chain scenarios that involve multiple steps in the attack life cycle. The red team should know in advance the observable events they are creating and meticulously timestamp when these events occur, providing them to the white team in real time so it can deconflict with the blue team as they home in on the threat. Further, the phases of the kill chain should be padded to allow the blue team reasonable time to respond and interrupt the attacker's kill chain, containing the threat. As the blue team improves its response times, the red team can use less padding. If the blue team fails to detect the red team, the white team can use the observable events and timestamps to build a timeline from historic data and show the missed opportunity. It helps if the red team keeps their observable events and timestamps in a scorecard, where they also estimate the points at which the attack could be detected, prevented, or contained.
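The red team scorecard can be as simple as an append-only log, one row per observable event, written as the event is generated. The sketch below assumes a CSV format and illustrative field names; nothing here is a required schema.

```python
# Hypothetical red team scorecard: one row per observable event, recorded as it
# happens, plus the red team's own estimate of where the event could reasonably
# be detected, prevented, or contained.
import csv
import os
from datetime import datetime, timezone

FIELDS = ["timestamp_utc", "phase", "observable_event",
          "est_detect_opportunity", "est_prevent_opportunity", "est_contain_opportunity"]

def log_event(path, phase, observable, detect, prevent, contain):
    """Append one observable event to the scorecard as it is generated."""
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if write_header:
            writer.writeheader()
        writer.writerow({
            "timestamp_utc": datetime.now(timezone.utc).isoformat(),
            "phase": phase,
            "observable_event": observable,
            "est_detect_opportunity": detect,
            "est_prevent_opportunity": prevent,
            "est_contain_opportunity": contain,
        })

# Example usage during a scenario run:
# log_event("redteam_scorecard.csv", "command and control",
#           "beacon to 203.0.113.10 every 60 seconds",
#           "proxy logs / NIDS", "egress filtering", "host isolation of patient zero")
```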
Similarly, the blue team needs to keep meticulous incident records, noting timestamps on major observable events, so that they can deconflict the exact attack activity with the white team (the simulation organizer) well after the event. In fact, the blue team should respond to the event in the same manner they treat every other incident, and hopefully this style of program drives far more meticulous note keeping (with timestamps) on all observable events, which is often done when creating the attacker timeline anyway. In turn, when the blue team reports these findings up through the white team, they do so via the same scorecard of detected, prevented, and contained observable events that the red team filled out. In the case of 'known known' malware, the payload will likely be blocked or eradicated by an existing security control; the blue team should still be able to corroborate the event in historic data and gain insight into the quantity of attacks their automated controls are preventing.
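The white team's deconfliction step then boils down to lining up the blue team's incident notes against the red team's scorecard and marking each observable event as detected, prevented, contained, or missed. This is a rough sketch under the assumption that both sides logged comparable timestamps; the matching window and field names are mine, not a prescribed format.

```python
# Illustrative deconfliction: match blue team incident notes to red team
# observable events by time window and record the outcome per event.
from datetime import timedelta

def score_events(red_events, blue_notes, window=timedelta(hours=1)):
    """red_events: dicts with 'timestamp' (datetime) and 'observable_event'.
    blue_notes: dicts with 'timestamp' (datetime), 'note', and 'action'
    ('detected', 'prevented', or 'contained')."""
    scorecard = []
    for ev in red_events:
        matches = sorted(
            (n for n in blue_notes
             if ev["timestamp"] <= n["timestamp"] <= ev["timestamp"] + window),
            key=lambda n: n["timestamp"],
        )
        outcome = matches[0]["action"] if matches else "missed"
        delay = (matches[0]["timestamp"] - ev["timestamp"]) if matches else None
        scorecard.append({**ev, "outcome": outcome, "response_delay": delay})
    return scorecard
```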
These techniques add a metric by which management can baseline an incident response program's capabilities and SLAs. While it does create more work, it consistently pushes the practice, such that the team can automate against known threats, respond to dynamic threats, and constantly hunt for new threats. The effectiveness of this program is largely based on how often the teams can iterate on the process, the kill chain scenarios, and the classes of attack. At an undisclosed location, they would run six scenarios a week: two 'known known', two 'known unknown', and two 'unknown unknown'. This allowed them to push the difficulty of the scenarios every week, as the blue team was consistently learning and catching the threats week after week. Granted, they missed the attackers sometimes, and occasionally terabytes of fake data would be exfiltrated; it was important to postmortem those incidents with the collected observable events, so they understood how to contain the threat the next time (or when it's real). In this manner the collaborating teams can keep pushing one another's skills in a measurable way.
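For the management-facing baseline, the scored events can be rolled up into simple per-class metrics, such as response rate and mean response delay. The sketch below assumes the scored-event shape from the deconfliction example above, with a 'threat_class' label added per event; the metric names are illustrative, not an SLA standard.

```python
# Rough roll-up of scored events into per-threat-class baseline metrics.
from statistics import mean

def baseline_metrics(scored_events):
    """scored_events: dicts with 'threat_class', 'outcome'
    ('detected'/'prevented'/'contained'/'missed'), and 'response_delay'
    (timedelta or None)."""
    metrics = {}
    for cls in {e["threat_class"] for e in scored_events}:
        rows = [e for e in scored_events if e["threat_class"] == cls]
        responded = [e for e in rows if e["outcome"] != "missed"]
        delays = [e["response_delay"].total_seconds() / 3600
                  for e in responded if e["response_delay"] is not None]
        metrics[cls] = {
            "events": len(rows),
            "response_rate": len(responded) / len(rows),
            "mean_response_hours": mean(delays) if delays else None,
        }
    return metrics
```

Tracking these numbers per iteration is what makes the week-over-week improvement (or regression) visible to management.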
It's also important to remember that throughout all of this, there are no winners or losers, just two parts of the same team increasing the overall security posture and capabilities of the organization. I highly recommend rotating blue team members through the red team position, as each person will bring novel ideas to the table, such as how to bypass the defenses and detection methods they spent all that time setting up.