Monitoring and Security on NeuroCAAS

  • We monitor instance usage on NeuroCAAS with AWS Lambda functions.

Here is the current layout:

“Soft cap” protections:

  • test-ec2-killer
    • Kills every non-exempt EC2 instance after 180 minutes of activity.

  • ec2-rogue-killer
    • Kills, after 15 minutes of activity, every EC2 instance that is neither on SSM nor explicitly given a timeout.

Exempt instances are listed in an SSM parameter called exempt-instances.
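
The soft-cap check can be sketched roughly as follows. This is a minimal illustration, not the deployed test-ec2-killer code. It assumes that "activity" means time since launch, that the exempt-instances parameter holds a comma-separated list of instance IDs, and that "kill" means terminate. The function names (parse_exempt, instances_over_cap, handler) are hypothetical.

```python
from datetime import datetime, timedelta, timezone

SOFT_CAP_MINUTES = 180  # limit quoted for test-ec2-killer


def parse_exempt(parameter_value):
    """Split a comma-separated parameter value into a set of instance IDs.

    Assumption: the real parameter format may differ.
    """
    return {item.strip() for item in parameter_value.split(",") if item.strip()}


def instances_over_cap(instances, exempt_ids, now, cap_minutes=SOFT_CAP_MINUTES):
    """Return IDs of non-exempt instances launched at least cap_minutes ago."""
    cutoff = now - timedelta(minutes=cap_minutes)
    return [
        inst["InstanceId"]
        for inst in instances
        if inst["InstanceId"] not in exempt_ids and inst["LaunchTime"] <= cutoff
    ]


def handler(event, context):
    """Hypothetical Lambda entry point wiring the helpers to AWS via boto3."""
    import boto3  # imported lazily so the helpers above can be tested offline

    ec2 = boto3.client("ec2")
    ssm = boto3.client("ssm")
    value = ssm.get_parameter(Name="exempt-instances")["Parameter"]["Value"]
    running = [
        inst
        for page in ec2.get_paginator("describe_instances").paginate(
            Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
        )
        for reservation in page["Reservations"]
        for inst in reservation["Instances"]
    ]
    doomed = instances_over_cap(running, parse_exempt(value), datetime.now(timezone.utc))
    if doomed:
        # Assumption: "kill" is implemented as termination rather than a stop.
        ec2.terminate_instances(InstanceIds=doomed)
    return doomed
```

Keeping the selection logic in plain functions, separate from the AWS calls, makes it possible to check the cap and exemption behavior without touching real instances.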

“Hard cap” functions on total usage:

  • neurocaas-guardduty-develop
    • Stops all EC2 instances in the developer security group after 2880 minutes of activity (2 days).

  • neurocaas-guardduty-deploy
    • Stops all EC2 instances in the deploy security group after 120 minutes of activity.
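
The hard caps amount to a per-security-group time limit. The sketch below shows one way to express that selection. It is an illustration under assumptions, not the actual neurocaas-guardduty code: the group names developer-sg and deploy-sg are hypothetical placeholders, "activity" is taken to mean time since launch, and applying the strictest cap when an instance belongs to several capped groups is a design choice made here. Instance records follow the shape returned by the EC2 describe_instances call.

```python
from datetime import datetime, timedelta, timezone

# Caps in minutes, keyed by security group name. The limits come from the
# list above; the group names are hypothetical placeholders.
HARD_CAPS = {
    "developer-sg": 2880,  # 2 days
    "deploy-sg": 120,
}


def instances_to_stop(instances, now, caps=HARD_CAPS):
    """Return IDs of instances that have reached the cap of a group they belong to."""
    to_stop = []
    for inst in instances:
        uptime = now - inst["LaunchTime"]
        group_names = {g["GroupName"] for g in inst.get("SecurityGroups", [])}
        limits = [caps[name] for name in group_names if name in caps]
        # Assumption: with several matching groups, the smallest cap applies.
        if limits and uptime >= timedelta(minutes=min(limits)):
            to_stop.append(inst["InstanceId"])
    return to_stop
```

In a Lambda function, the returned IDs could then be passed to the boto3 EC2 client's stop_instances(InstanceIds=...) call. Instances outside both groups are never selected here.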

These functions guard against unexpected usage in every case except one: an SSM job that keeps running unnecessarily. Paired with user-based budgets, they form a consistent system for monitoring usage.