How Pagely implemented a serverless data lake in AWS to facilitate customer support analytics


Pagely is an AWS Advanced Technology Partner providing managed WordPress hosting services. Our customers continuously push us to improve visibility into usage, billing, and service performance. To better serve these customers, the service team requires an efficient way to access the logs created by the application servers. Historically, we relied on a shell script that gathered basic statistics on demand. For our largest customer, this unoptimized process, running on an Amazon EC2 instance, took more than 8 hours to produce one report and sometimes crashed due to resource limitations. Instead of putting more effort into fixing a legacy process, we decided it was time to implement a proper analytics platform.
All of our customer logs are stored in Amazon S3 as compressed JSON files. We use Amazon Athena to run SQL queries directly against these logs. This approach is great because there is no need for us to prepare the data. We simply define the table and query away. Although JSON is a supported format for Amazon Athena, it is not the most efficient format in terms of performance and cost. JSON files must be read in their entirety, even if
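To make "define the table and query away" concrete, here is a minimal sketch of what such a table definition and query might look like, built as plain strings in Python. The table name, column names, and bucket path are hypothetical placeholders, not Pagely's actual schema; the OpenX JSON SerDe is one of the JSON SerDes Athena supports and is assumed here for illustration.

```python
def build_create_table_ddl(table, columns, s3_location):
    """Return an Athena CREATE EXTERNAL TABLE statement for JSON logs in S3."""
    # One "`name` type" entry per column, comma-separated.
    column_lines = ",\n".join(
        f"  `{name}` {sql_type}" for name, sql_type in columns
    )
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        f"{column_lines}\n"
        ")\n"
        "ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'\n"
        f"LOCATION '{s3_location}'"
    )


# Hypothetical log fields; the real log schema is not given in the article.
LOG_COLUMNS = [
    ("request_time", "string"),
    ("host", "string"),
    ("status", "int"),
    ("bytes_sent", "bigint"),
]

# Hypothetical table name and bucket path.
ddl = build_create_table_ddl(
    "access_logs", LOG_COLUMNS, "s3://example-log-bucket/logs/"
)

# Once the table exists, the logs can be queried with ordinary SQL.
query = (
    "SELECT status, count(*) AS hits "
    "FROM access_logs "
    "GROUP BY status "
    "ORDER BY hits DESC"
)

print(ddl)
print(query)
```

Both statements could then be run from the Athena console or submitted through an API client, for example boto3's `start_query_execution`. No data is copied or transformed first: the table is only metadata pointing at the existing files in S3.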
Source: https://managewp.org/articles/17780/how-pagely-implemented-a-serverless-data-lake-in-aws-to-facilitate-customer-support-analytics



Source: https://williechiu40.wordpress.com/2018/08/23/how-pagely-implemented-a-serverless-data-lake-in-aws-to-facilitate-customer-support-analytics/