Initializing a blueprint

We first need to initialize a blueprint for your analysis that will record you development process. This portion of the process should take about 10-15 minutes. All NeuroCAAS blueprints are stored in a single directory in the source repo, in neurocaas/ncap_iac/ncap_blueprints. This is where you will create your new analysis as well.

We will use the CLI tool to create a new analysis here.

Navigate to the ncap_blueprints directory, and run the command:

neurocaas-contrib init --location /path/to/neurocaas/ncap_iac/ncap_blueprints

This will let your CLI tool know that this location is the place it should store analysis blueprints that you develop. Upon running this comand, you will be prompted for the name of the analysis you want to develop, and asked to create one if it does not exist already:

../_images/init_write.png

Note that the analysis name must be restricted to lowercase letters, numbers, and hyphens (-). If you find yourself working on multiple analyses at once, you can switch between them by running this command as well. You can always remind yourself what analysis you are configured to work with by calling the command neurocaas-contrib describe-analyses:

../_images/describe-analyses.png

Doing so will list the analyses you have available in your chosen blueprint storage location, as well as the one that is currently initialized (indicated by an asterisk).

Once you have initialized a blueprint, you can examine it in any text editor at the path:

/path/to/neurocaas/ncap_iac/ncap_blueprints/<analysis_name>/stack_config_template.json

This blueprint contains all of the details that specify the different resources and services that will support your analysis. When initializing an analysis, we can leave most of these fixed, but there are a few that we should go over: The parameters that you will probably change are:

  • PipelineName: This should be the name of your analysis, or something similar.

  • REGION: The global region (An AWS parameter) where you want to develop. By default “us-east-1” is a good choice.

  • STAGE: This parameter describes different stages of pipeline development. It should be set to “webdev” while initializing a blueprint.

  • Lambda.LambdaConfig.INSTANCE\_TYPE: INSTANCE\_TYPE specifies the hardware configuration that is run by default (can be changed on demand) and is selected from a list of instance types available on AWS. A good default choice is m5.xlarge, and a good choice with access to a GPU is p2.xlarge.

Important parameters to keep in mind for later: - Lambda.LambdaConfig.AMI: AMI specifies the Amazon Machine Image where your software and dependencies are installed, and contains most of the analysis-specific configuration details that you must specify. As you develop, you will save your progress into different AMIs so they can be linked to the blueprint through this parameter.

  • Lambda.LambdaConfig.COMMAND: COMMAND specifies a shell command that will be run on your remote instance [with parameters specified in the main script section] to generate data analysis. This command can technically take the form of any shell command, but will most likely take the form cd /home/ubuntu; neurocaas_contrib/run_main_cli.sh {} {} {} {} path_to_analysis_script; . neurocaas_contrib/ncap_utils/workflow.sh; cleanup. For most use cases, the only thing that you will have to change about the command given here is the path_to_analysis_script. More specifics of this command will be specified later.

  • Affiliates: This field describes users of the analysis. It is largely used for testing, as when it is deployed later, all users will be able to access this analysis by default. For now, remove all Affiliates from the UXData area except for “debuggers”; we will return to this later in this section.

Linking your blueprint to cloud resources

We have now reached the point where you will have to start interacting with cloud resources, which means that you will need AWS account credentials. This portion of the process should take about 10 minutes, and response time from the NeuroCAAS Team (within 24h), or setup time of an AWS account (about 1-2h).

Cloning NeuroCAAS to an alternative AWS Account

Skip to Developing On the Command Line if you are developing within the main NeuroCAAS account.

Alternatively, you can also set up development and use of a NeuroCAAS platform clone on a separately managed AWS account. NOTE: AWS charges in real time for resource use. Although we provide the exact same cloud resource management tools to clone NeuroCAAS platforms through IaC, make sure that you know what you are doing with regard to cost monitoring and EC2 instance management before going with this route. To proceed with your own AWS account, you will need programmatic access (an AWS key pair) with admin permissions on this account. Using this key pair, follow the steps in Configuring Credentials below, and Cloning the platform afterwards.

Configuring Credentials with an alternative AWS Account

Important: this section deals with security credentials. Please proceed carefully.

From the previous step, you should now have an AWS Key Pair (Key and Secret Key), either from a pull request, or from an independent account. These credentials will allow your to launch cloud compute instances, and as such are extremely valuable. IMPORTANT: You MUST keep this file in separate, secure directory not under version control. If this key is exposed, it poses financial risks and privacy risks to the entire account.

First verify your aws cli installation by running:

% aws configure

When prompted, enter the access key and secret access key associated with your admin account, as well as the AWS region you are closest to. Set the default output type to be json.

With these credentials in place, you are now ready to clone NeuroCAAS to your account.

Cloning the platform

To initialize NeuroCAAS on a separate AWS account, first follow the Installation instructions for the binxio secret provider stack. Doing so will let you manage sensitive information like SSH keys through IaC. <https://github.com/binxio/cfn-secret-provider>.

Warning

When following instructions here, make sure that you confirm you are in the region you want to do most of your work in. In particular, if using the console, the default region is eu-central-1: make sure to change this in the dropdown menu to the region closest to you.

Navigate within the neurocaas repository to: neurocaas/ncap_iac/ncap_blueprints/utils_stack.

Now run the following command:

% bash initialize_neurocaas.sh

This will create the cloud resources necessary to deploy your resources regularly and handle the permissions necessary to manage and deploy cloud jobs, and ssh keys to access resources on the cloud. The results of initialization can be seen in the file global_params_initialized.json, with the names of resources listed. If you encounter an error, consult the contents of the file neurocaas/ncap_iac/ncap_blueprints/utils_stack/init_log.txt, and post it to the neurocaas issues page.

Once you have done this step, you have a working instance of the NeuroCAAS platform set up in a different infrastructure stack.

Setting up developer tools for your cloned platform

Now that you have a clone of NeuroCAAS, you can configure it to use the same testing resources that we use locally.

First, create a test user. Run the following command from the AWS CLI:

aws iam create-user --user-name test-user{your_region_name}

Where you substitute in your region name for the bracketed area. You will refer to this test user when creating blueprints.

Next, we’ll set up the forked NeuroCAAS repository with your credentials so that you can deploy blueprints through Github. We will add four secrets to your fork of the NeuroCAAS repo as described here. Take your AWS key pair, and passthe key and secret key respectively under the fields DEPLOY_AWS_ACCESS_KEY_ID and DEPLOY_AWS_SECRET_ACCESS_KEY.

Furthermore, you’ll want to create a docker token as described here:, and pass additionally your access token as DOCKER_HUB_ACCESS_TOKEN and your docker username as DOCKER_HUB_USERNAME. These credentials are required because we use Docker to deploy analyses to NeuroCAAS via IaC.

Finally, in order to retrieve your ssh key, navigate within the source repository to:

/path/to/neurocaas/ncap_iac/ncap_blueprints/utils_stack. Type the following command:

% bash print_privatekey.sh > securefilelocation/securefilename.pem

Where securefilelocation/securefilename.pem is a file NOT under version control. IMPORTANT: We strongly recommend you keep this file in separate, secure directory not under version control. If this key is exposed, it will expose the development instances of everyone on your account. You will reference this key when developing a machine image later. Finally, change the permissions on this file with:

% chmod 400 securefilelocation/securefilename.pem

Cloning an existing analysis from the main repo to a cloned repo

You can clone an analysis from the main NeuroCAAS analysis by modifying the blueprint for the analysis as follows:

  1. Change the given PipelineName parameter: these are unique across all of AWS.

  2. Delete the security group given, and let it be autofilled by your NeuroCAAS account clone.

  3. Request that the AMI underlying the analysis be shared from the main account via a github issue.

  4. Open a pull request with the changed blueprint into your own repository, and comment on the pull request with the command: #deploy:{foldername}, where foldername is replaced by the name of the folder where your blueprint is stored in ncap_blueprints. Doing so will trigger an automatic deployment of this analysis on your account.