Monitoring and Security on NeuroCAAS
====================================

We implement monitoring of instance usage on NeuroCAAS via AWS Lambda functions. The current layout is as follows.

"Soft cap" protections:
-----------------------

- test-ec2-killer: Kills all EC2 instances that are not exempt after 180 minutes of activity.
- ec2-rogue-killer: Kills all EC2 instances that are neither listed in SSM nor explicitly given a timeout, after 15 minutes of activity.

Exempt instances are listed in an SSM parameter called exempt-instances.

"Hard cap" functions on total usage:
------------------------------------

- neurocaas-guardduty-develop: Stops all EC2 instances that have the developer security group after 2880 minutes (2 days) of activity.
- neurocaas-guardduty-deploy: Stops all EC2 instances that have the deploy security group after 120 minutes (2 hours) of activity.

These functions provide a useful layer of protection against unexpected usage in all cases except an SSM job that keeps running unnecessarily. Paired with per-user budgets, they form a consistent system for monitoring usage.
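The hard-cap functions share one pattern: list running instances, keep those in a given security group that have been up longer than a limit, and stop them. Below is a minimal Python sketch of that pattern, not the actual NeuroCAAS Lambda code. It assumes "activity" means time since the instance's LaunchTime, and the security group name DEPLOY_GROUP_NAME is hypothetical. The selection logic is a pure function so it can be tested without AWS; the handler uses the real boto3 calls describe_instances (via its paginator) and stop_instances. The exempt_ids argument shows where a soft-cap function could plug in the exempt-instances SSM parameter, whose value format is not documented here.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical values for illustration; the real group names are not given above.
DEPLOY_GROUP_NAME = "neurocaas-deploy-sg"
DEPLOY_LIMIT_MINUTES = 120


def instances_over_limit(reservations, max_minutes, now=None,
                         required_group=None, exempt_ids=()):
    """Return IDs of running instances launched more than max_minutes ago.

    reservations follows the shape of EC2 describe_instances output.
    If required_group is set, only instances in a security group with
    that name are considered. Instances in exempt_ids are skipped.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - timedelta(minutes=max_minutes)
    exempt = set(exempt_ids)
    selected = []
    for reservation in reservations:
        for inst in reservation.get("Instances", []):
            if inst["InstanceId"] in exempt:
                continue
            if inst.get("State", {}).get("Name") != "running":
                continue
            if required_group is not None:
                names = {g.get("GroupName") for g in inst.get("SecurityGroups", [])}
                if required_group not in names:
                    continue
            # Assumption: uptime is measured from LaunchTime (last start).
            if inst["LaunchTime"] < cutoff:
                selected.append(inst["InstanceId"])
    return selected


def lambda_handler(event, context):
    """Sketch of a hard-cap Lambda that stops over-limit deploy instances."""
    import boto3  # bundled with the AWS Lambda Python runtime

    ec2 = boto3.client("ec2")
    reservations = []
    paginator = ec2.get_paginator("describe_instances")
    for page in paginator.paginate(
            Filters=[{"Name": "instance-state-name", "Values": ["running"]}]):
        reservations.extend(page["Reservations"])

    to_stop = instances_over_limit(reservations, DEPLOY_LIMIT_MINUTES,
                                   required_group=DEPLOY_GROUP_NAME)
    if to_stop:
        ec2.stop_instances(InstanceIds=to_stop)
    return {"stopped": to_stop}
```

Keeping the decision logic separate from the boto3 calls means the same function can back both the develop (2880-minute) and deploy (120-minute) caps by changing only the limit and the group name.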