Monitoring and Forensics in a Cloud Computing Environment

Throughout the week, I have been streaming the third annual CloudSlam cloud computing conference. Overall, I have watched many amazing (and many horrible) presentations on cloud hardware, software, and providers. My favorite presentation so far was on Azure, Microsoft's open cloud development environment. Azure, like many cloud services, lets users focus on development rather than infrastructure. Here, Microsoft provides high computational throughput, large bandwidth through globally distributed nodes, and unfathomable amounts of storage, all on tiered service plans. All of this is offered in a seamless environment, allowing you to develop in any .NET language, as well as Java, PHP, or Ruby.

But what about monitoring? Users will want to know security details about application errors, as well as database server errors. And surely Microsoft will want consumer data trends to refine their services, but to what extent is this possible in the cloud? According to the AppFirst presentation on "Cloud Monitoring", data polling in the cloud becomes much more complicated due to extensive resource and subsystem sharing. This resource sharing is managed by your local hypervisor, which will often skew timing results between resource requests and resource usage, and that can present a huge problem for cloud forensics. These shared resources will also affect your cloud network bandwidth and, in some cases, even storage speeds. Because these subsystems are not transparent, you can't see how much bandwidth another user on the cloud is consuming, which again will skew your own computational timing.

This lack of visibility into subsystem processes can introduce errors in other places as well. For example, the kernel will often report how many CPUs you have available, but this number is not always accurate, as the hypervisor may throttle it, reserving CPUs for the other operating systems sharing the host.
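One rough way to see the gap between reported and usable CPU from inside a Linux guest is the "steal" counter on the first line of /proc/stat, which counts time the virtual CPU was ready to run but the hypervisor was serving someone else. The sketch below is only an illustration, not something from the AppFirst talk: the function names are my own, it assumes a Linux guest, and not every hypervisor exposes steal time, so a reading of zero does not prove there was no contention.

```python
def steal_fraction(cpu_line: str) -> float:
    """Return the share of CPU time marked as 'steal' in a /proc/stat cpu line."""
    fields = cpu_line.split()
    if not fields or not fields[0].startswith("cpu"):
        raise ValueError("expected a line starting with 'cpu'")
    values = [int(v) for v in fields[1:]]
    # Field order on Linux:
    # user nice system idle iowait irq softirq steal [guest guest_nice]
    if len(values) < 8:
        return 0.0  # older kernels do not report steal at all
    # Guest time is already counted inside user/nice, so sum only the first 8.
    total = sum(values[:8])
    return values[7] / total if total else 0.0


def read_steal_fraction(path: str = "/proc/stat") -> float:
    """Read the aggregate cpu line (the first line) and return its steal share."""
    with open(path) as stat_file:
        return steal_fraction(stat_file.readline())
```

The counters are cumulative since boot, so for monitoring you would sample twice and compare the difference over the interval. A steal share that climbs while your own load stays flat is a hint that the CPUs the kernel reports are not all yours at that moment.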
This inaccurate reporting of the resources you actually receive will again lead to time skewing, obscuring vital data used by both the monitoring and forensic processes.

Disk storage is another shared resource, and it can become a significant obstacle when many cloud users are performing large writes at the same time. If you ever receive a 'disk busy' error in a cloud environment, this is VERY BAD, as it means the shared subsystem is extremely saturated and your timing measurements are likely far more skewed than you realize. A workaround is to do asynchronous local writes first, with proper log files intact, and then update your cloud service at a later time.

This means that, for now, a hybrid environment, where local storage is used alongside cloud resources, might be the best option for successful monitoring. Cloud computing offers much stronger throughput, bandwidth, and storage at a much cheaper price, but because the subsystems are not transparent, a hybrid environment can, depending on your needs, help you keep better track of your own systems. Always remember: in security, awareness is key. So while cloud computing is very strong, with current cloud monitoring solutions it is a blind strength. That means trends will be shifting towards a hybrid cloud environment for the next few years, where companies can properly secure their own assets while still enjoying the cost-saving benefits of the cloud.
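The local-write workaround mentioned above can be sketched roughly as follows. This is a minimal illustration under my own assumptions, not a specific product's API: `LocalSpool` is a hypothetical class, and `upload` stands in for whatever call your cloud service actually provides. Events are timestamped locally, written to a local log on a background thread, and only sent to the cloud when you choose to sync.

```python
import json
import os
import queue
import threading
import time


class LocalSpool:
    """Log events to a local file asynchronously, then upload them later."""

    def __init__(self, path):
        self.path = path
        self._pending = queue.Queue()
        self._lock = threading.Lock()  # guards the log file
        threading.Thread(target=self._drain, daemon=True).start()

    def record(self, event):
        # Timestamp locally, before any shared cloud subsystem is involved.
        self._pending.put({"ts": time.time(), "event": event})

    def _drain(self):
        # Background worker: append each queued record to the local log.
        while True:
            item = self._pending.get()
            with self._lock, open(self.path, "a", encoding="utf-8") as log:
                log.write(json.dumps(item) + "\n")
            self._pending.task_done()

    def sync(self, upload):
        """Pass all logged records to `upload`; keep the log if it raises."""
        self._pending.join()  # wait until queued records are on disk
        with self._lock:
            if not os.path.exists(self.path):
                return 0
            with open(self.path, encoding="utf-8") as log:
                records = [json.loads(line) for line in log if line.strip()]
            upload(records)       # hypothetical caller-supplied function
            os.remove(self.path)  # only reached if the upload succeeded
            return len(records)
```

Calling `spool.record("disk busy")` returns immediately, and `spool.sync(send_to_cloud)` can run on whatever schedule suits you. Because the log file is only removed after the upload returns, a failed upload leaves the local record intact for forensics.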