Modular Deployments in AWS Cross-Account CI/CD Pipelines

When deploying code changes to production, how do you avoid “re-building” the entire solution and instead only build, test, and deploy the specific component(s) that changed?

This matters because we do not want to re-build, re-test, and re-deploy the entire infrastructure every time a commit triggers our CI/CD pipeline, especially when we deploy changes frequently and in small batches.

Suppose you built a serverless ML training pipeline using Glue jobs, Lambda functions, and Step Functions to orchestrate their sequential execution. These components encapsulate:

  • Dataset Extraction
  • Data Validation
  • Data Pre-Processing & Feature Engineering
  • Training
  • Model Evaluation
  • Model Explanation
[Solution architecture diagram]

Let’s walk through the solution architecture diagram starting at the top left.

During a sprint, you complete a task that involves updating the Training Lambda function’s handler code (i.e. code that submits a SageMaker Hyperparameter Tuning job).

Within the SageMaker Studio UI, you commit your code changes to a short-lived CodeCommit feature branch, open a pull request into the dev branch, await code review, and merge into dev.

This merge-into-dev event triggers a custom EventBridge rule with a Lambda function as its invocation target. The Lambda function uses boto3 to pull the latest commit’s “diffs” (the files that have changed in the dev branch relative to the test branch).

These diffs will determine which specific component(s) will be built, tested, and deployed – in this case the Training Lambda function.

Since we are going to need this information downstream in our CI/CD pipeline, specifically during CodeBuild, we store the diffs in a “Deployment-Diffs” DynamoDB table. This table requires a composite primary key, where the partition key is the CommitId from CodeCommit and the sort key is the Component name (e.g. Training-Lambda-Function).

This composite primary key guarantees that each insert into the diffs table is unique while maintaining a diffs history per component. Without a sort key, we would have to sacrifice either the diffs history or the ability to determine which individual component(s) changed, since each write would overwrite the previous item (DynamoDB does not allow duplicate primary keys). We choose on-demand capacity for both reads and writes to remain fully serverless.
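For illustration, here is a minimal boto3 sketch of how such a table could be created; the table and attribute names match the ones used in this article, everything else is an assumption to adapt to your environment:

import boto3

dynamodb = boto3.client("dynamodb")

# Composite primary key: CommitId (partition) + Component (sort) yields one
# item per changed component per commit, preserving per-component history
dynamodb.create_table(
    TableName="Deployment-Diffs",
    AttributeDefinitions=[
        {"AttributeName": "CommitId", "AttributeType": "S"},
        {"AttributeName": "Component", "AttributeType": "S"},
    ],
    KeySchema=[
        {"AttributeName": "CommitId", "KeyType": "HASH"},    # partition key
        {"AttributeName": "Component", "KeyType": "RANGE"},  # sort key
    ],
    BillingMode="PAY_PER_REQUEST",  # on-demand capacity, fully serverless
)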

This write to DynamoDB takes place within the Lambda function that EventBridge invokes in response to the merge from the feature branch into the dev branch. Based on the diffs, we use boto3 to start a CodePipeline execution, as sketched below.
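To make the flow concrete, here is a minimal sketch of what such a Lambda handler could look like. The repository name, pipeline name, and the convention of mapping a file’s top-level folder to its component name are illustrative assumptions, not the exact implementation:

import boto3

codecommit = boto3.client("codecommit")
dynamodb = boto3.client("dynamodb")
codepipeline = boto3.client("codepipeline")

REPO_NAME = "your-repo"          # hypothetical repository name
PIPELINE_NAME = "your-pipeline"  # hypothetical pipeline name

def handler(event, context):
    # Latest commit on dev (the merge that triggered the EventBridge rule)
    commit_id = codecommit.get_branch(
        repositoryName=REPO_NAME, branchName="dev"
    )["branch"]["commitId"]

    # Files changed in dev relative to test (pagination omitted for brevity)
    diffs = codecommit.get_differences(
        repositoryName=REPO_NAME,
        beforeCommitSpecifier="test",
        afterCommitSpecifier="dev",
    )["differences"]

    # Assumption: a file's top-level folder (or the file itself, for files at
    # the repo root such as template.yml) doubles as the component name
    components = set()
    for diff in diffs:
        blob = diff.get("afterBlob") or diff.get("beforeBlob")
        components.add(blob["path"].split("/")[0])

    # One item per changed component, keyed by (CommitId, Component)
    for component in components:
        dynamodb.put_item(
            TableName="Deployment-Diffs",
            Item={
                "CommitId": {"S": commit_id},
                "Component": {"S": component},
            },
        )

    # Kick off the CI/CD pipeline
    codepipeline.start_pipeline_execution(name=PIPELINE_NAME)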

CodePipeline is composed of 4 stages:

  1. Source (CodeCommit)
  2. Build & Test (CodeBuild)
  3. Cross-Account Deploy (CodeBuild)
  4. Merge test branch into master branch

The Source stage receives a copy of the CodeCommit repository’s test branch. The entire source code in the branch is passed as input throughout CodePipeline.

The Build & Test stage launches a CodeBuild Amazon Linux container, where the code to be executed is read from the copy of the repo’s test branch. We specify the full path to the buildspec.yml file as part of the CodeBuild project’s configuration.

This buildspec.yml file leverages the AWS CLI to perform a sequence of commands. Make sure to enable the “privileged” flag in the CodeBuild configuration to allow building Docker images. We only show the Training Lambda function for simplicity, but our buildspec.yml file includes all training pipeline components.

First, read the diffs from DynamoDB and conditionally flip boolean flags from 0 to 1 based on the specific component(s) that changed:

pre_build:
  commands:
# Authenticate the Docker CLI with ECR so we can push images later
    - aws ecr get-login-password --region $YOUR_REGION | docker login --username AWS --password-stdin $YOUR_AWS_ACCOUNT.dkr.ecr.$YOUR_REGION.amazonaws.com
    
    - CF_TEMPLATE_DIFF=0
    - TEST_BUILDSPEC_DIFF=0
    - PROD_BUILDSPEC_DIFF=0
    - TRAINING_LAMBDA_DIFF=0
    # Add any additional flags, one for each distinct solution component
    
- COMMIT_ID=$(aws codecommit get-branch --repository-name REPO_NAME --branch-name dev --query 'branch.commitId' --output text)
    
    # Pull the changed-component names recorded for this commit
    - DIFFS=$(aws dynamodb query --table-name Deployment-Diffs --projection-expression "Component" --key-condition-expression "CommitId = :value" --expression-attribute-values "{\":value\":{\"S\":\"$COMMIT_ID\"}}" --query 'Items[*].Component.S' --output text)
    
- echo $DIFFS
    - |
      # Set the corresponding flag for each component that changed
      for COMPONENT in $DIFFS; do
        if [ "$COMPONENT" = "Training-Lambda-Function" ]; then
          echo "Found a diff for Training-Lambda-Function"
          TRAINING_LAMBDA_DIFF=1
        elif [ "$COMPONENT" = "template.yml" ]; then
          echo "Found a diff for template.yml"
          CF_TEMPLATE_DIFF=1
        elif [ "$COMPONENT" = "test-buildspec.yml" ]; then
          echo "Found a diff for test-buildspec.yml"
          TEST_BUILDSPEC_DIFF=1
        elif [ "$COMPONENT" = "prod-buildspec.yml" ]; then
          echo "Found a diff for prod-buildspec.yml"
          PROD_BUILDSPEC_DIFF=1
        fi
      done

Next, read the Training Lambda function’s current production version from a separate DynamoDB deployments-metadata table. This table is also configured with a composite primary key, where the partition key is the Component name and the sort key is the BuildNumber (note that the Component name is the partition key in this table, while in the deployment-diffs table it is the sort key; the two tables serve different access patterns).
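In boto3 terms, reading the current version is a descending query on the sort key, limited to one item; a minimal sketch, equivalent to the CLI call in the buildspec below:

import boto3

dynamodb = boto3.client("dynamodb")

# Highest BuildNumber first (ScanIndexForward=False mirrors the CLI's
# --no-scan-index-forward), so the first item is the current version
resp = dynamodb.query(
    TableName="YOUR_DYNAMODB_TABLE",  # placeholder, as in the buildspec
    ProjectionExpression="BuildNumber",
    KeyConditionExpression="Component = :name",
    ExpressionAttributeValues={":name": {"S": "Training-Lambda-Function"}},
    ScanIndexForward=False,
    Limit=1,
)
current_version = int(resp["Items"][0]["BuildNumber"]["N"])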

If the component changed, we auto-increment the Lambda function’s version by 1, build the code (from the repository passed in as the pipeline’s source input) into a new Docker image, push it to ECR, and store the new ECR image URI in a local variable (we will cover individual component testing in a separate article):

build:
  commands:
    # Replace the YOUR_* placeholders with values for your environment
    - TIMESTAMP=$(date)
    - AUTO_INCREMENT=1
    - ECR_REPO=YOUR_ECR_REPO
    - AWS_ACCOUNT=YOUR_AWS_ACCOUNT
    - DEPLOYMENTS_BUCKET=YOUR_S3_BUCKET
    - DEPLOYMENTS_TABLE=YOUR_DYNAMODB_TABLE
    - STACK_NAME=YOUR_CF_STACK
    
    - echo $TRAINING_LAMBDA_DIFF
    # Latest build number for this component: query the sort key in
    # descending order and take the first item
    - TRAINING_LAMBDA_VERSION=$(aws dynamodb query --table-name $DEPLOYMENTS_TABLE --no-scan-index-forward --projection-expression "BuildNumber" --key-condition-expression "Component = :name" --expression-attribute-values '{":name":{"S":"Training-Lambda-Function"}}' --max-items 1 --query 'Items[*].BuildNumber.N' --output text)
    - echo $TRAINING_LAMBDA_VERSION
- |
      if [ "$TRAINING_LAMBDA_DIFF" = "1" ]; then
        echo "Deploying new version of Training-Lambda-Function..."
        # Increment the build number, then build and push a freshly tagged image
        TRAINING_LAMBDA_NEW_VERSION=$(($TRAINING_LAMBDA_VERSION + $AUTO_INCREMENT))
        TRAINING_LAMBDA_ECR_NAME=training-lambda-function
        docker build -t $TRAINING_LAMBDA_ECR_NAME FULL_REPO_PATH/Training-Lambda-Function/.
        docker tag $TRAINING_LAMBDA_ECR_NAME $AWS_ACCOUNT.dkr.ecr.$YOUR_REGION.amazonaws.com/$ECR_REPO:$TRAINING_LAMBDA_ECR_NAME-$TRAINING_LAMBDA_NEW_VERSION
        docker push $AWS_ACCOUNT.dkr.ecr.$YOUR_REGION.amazonaws.com/$ECR_REPO:$TRAINING_LAMBDA_ECR_NAME-$TRAINING_LAMBDA_NEW_VERSION
        TRAINING_IMAGE_URI=$AWS_ACCOUNT.dkr.ecr.$YOUR_REGION.amazonaws.com/$ECR_REPO:$TRAINING_LAMBDA_ECR_NAME-$TRAINING_LAMBDA_NEW_VERSION
      else
        echo "No new version of Training-Lambda-Function..."
        # Reuse the current production image URI recorded in the deployments table
        TRAINING_IMAGE_URI=$(aws dynamodb query --table-name $DEPLOYMENTS_TABLE --select ALL_ATTRIBUTES --key-condition-expression "Component = :name AND BuildNumber = :build" --expression-attribute-values "{\":name\":{\"S\":\"Training-Lambda-Function\"},\":build\":{\"N\":\"$TRAINING_LAMBDA_VERSION\"}}" --max-items 1 --query 'Items[*].Location.S' --output text)
        echo $TRAINING_IMAGE_URI
      fi

Suppose this Lambda function had not changed. Why do we still read the current production version number (outside of the conditional statement), and why do we have an else statement to pull the current production Lambda’s ECR image URI path?

Because in the post_build phase, the CloudFormation deployment requires the code location of every component in --parameter-overrides, whether that component changed or not (see the code below). We therefore need the ability to pull each component’s current version and code location, while still allowing new, modularized deployments of individual components.

Finally, if CloudFormation deployment succeeds, we write the deployment metadata to DynamoDB:

post_build:
  commands:
    # Package the template, uploading local artifacts to the deployments bucket
    - aws cloudformation package --template-file FULL_REPO_PATH/template.yml --output-template-file template-package.yml --s3-bucket $DEPLOYMENTS_BUCKET
    
    # Deploy the stack; every component's code location is passed, changed or not
    - aws cloudformation deploy --template-file template-package.yml --stack-name $STACK_NAME --parameter-overrides TrainingLambdaImageUri=$TRAINING_IMAGE_URI --capabilities CAPABILITY_NAMED_IAM --no-fail-on-empty-changeset
    
    - |
      if [ "$TRAINING_LAMBDA_DIFF" = "1" ]; then
        echo "Writing Training-Lambda-Function metadata to DynamoDB..."
        aws dynamodb put-item --table-name $DEPLOYMENTS_TABLE --item "{\"Component\":{\"S\":\"Training-Lambda-Function\"},\"BuildNumber\":{\"N\":\"$TRAINING_LAMBDA_NEW_VERSION\"},\"Solution\":{\"S\":\"$STACK_NAME\"},\"Location\":{\"S\":\"$TRAINING_IMAGE_URI\"},\"Timestamp\":{\"S\":\"$TIMESTAMP\"}}"
      fi
     
    - echo $CF_TEMPLATE_DIFF
- |
      if [ "$CF_TEMPLATE_DIFF" = "1" ]; then
        # Read the current version, increment it, and record the new deployment
        CF_TEMPLATE_VERSION=$(aws dynamodb query --table-name $DEPLOYMENTS_TABLE --no-scan-index-forward --projection-expression "BuildNumber" --key-condition-expression "Component = :name" --expression-attribute-values '{":name":{"S":"template.yml"}}' --max-items 1 --query 'Items[*].BuildNumber.N' --output text)
        echo $CF_TEMPLATE_VERSION
        CF_TEMPLATE_NEW_VERSION=$(($CF_TEMPLATE_VERSION + $AUTO_INCREMENT))
        echo "Writing template.yml metadata to DynamoDB..."
        aws dynamodb put-item --table-name $DEPLOYMENTS_TABLE --item "{\"Component\":{\"S\":\"template.yml\"},\"BuildNumber\":{\"N\":\"$CF_TEMPLATE_NEW_VERSION\"},\"Solution\":{\"S\":\"$STACK_NAME\"},\"Location\":{\"S\":\"FULL_REPO_PATH/template.yml\"},\"Timestamp\":{\"S\":\"$TIMESTAMP\"}}"
      fi
     
    - echo $PROD_BUILDSPEC_DIFF
- |
      if [ "$PROD_BUILDSPEC_DIFF" = "1" ]; then
        PROD_BUILDSPEC_VERSION=$(aws dynamodb query --table-name $DEPLOYMENTS_TABLE --no-scan-index-forward --projection-expression "BuildNumber" --key-condition-expression "Component = :name" --expression-attribute-values '{":name":{"S":"prod-buildspec.yml"}}' --max-items 1 --query 'Items[*].BuildNumber.N' --output text)
        echo $PROD_BUILDSPEC_VERSION
        PROD_BUILDSPEC_NEW_VERSION=$(($PROD_BUILDSPEC_VERSION + $AUTO_INCREMENT))
        echo "Writing prod-buildspec.yml metadata to DynamoDB..."
        aws dynamodb put-item --table-name $DEPLOYMENTS_TABLE --item "{\"Component\":{\"S\":\"prod-buildspec.yml\"},\"BuildNumber\":{\"N\":\"$PROD_BUILDSPEC_NEW_VERSION\"},\"Solution\":{\"S\":\"$STACK_NAME\"},\"Location\":{\"S\":\"FULL_REPO_PATH/prod-buildspec.yml\"},\"Timestamp\":{\"S\":\"$TIMESTAMP\"}}"
      fi
     
    - echo $TEST_BUILDSPEC_DIFF
- |
      if [ "$TEST_BUILDSPEC_DIFF" = "1" ]; then
        TEST_BUILDSPEC_VERSION=$(aws dynamodb query --table-name $DEPLOYMENTS_TABLE --no-scan-index-forward --projection-expression "BuildNumber" --key-condition-expression "Component = :name" --expression-attribute-values '{":name":{"S":"test-buildspec.yml"}}' --max-items 1 --query 'Items[*].BuildNumber.N' --output text)
        echo $TEST_BUILDSPEC_VERSION
        TEST_BUILDSPEC_NEW_VERSION=$(($TEST_BUILDSPEC_VERSION + $AUTO_INCREMENT))
        echo "Writing test-buildspec.yml metadata to DynamoDB..."
        aws dynamodb put-item --table-name $DEPLOYMENTS_TABLE --item "{\"Component\":{\"S\":\"test-buildspec.yml\"},\"BuildNumber\":{\"N\":\"$TEST_BUILDSPEC_NEW_VERSION\"},\"Solution\":{\"S\":\"$STACK_NAME\"},\"Location\":{\"S\":\"FULL_REPO_PATH/test-buildspec.yml\"},\"Timestamp\":{\"S\":\"$TIMESTAMP\"}}"
      fi

If everything succeeds, we proceed to the cross-account deployment into our production AWS account. See “AWS Cross-Account Deployments” for details; a brief sketch of the underlying pattern follows. We will expand on this topic in future articles.
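As a preview, cross-account deployments typically hinge on assuming an IAM role that the production account trusts; a minimal boto3 sketch, where the role ARN and session name are placeholders:

import boto3

sts = boto3.client("sts")

# Assume a deployment role in the production account (placeholder ARN)
creds = sts.assume_role(
    RoleArn="arn:aws:iam::PROD_ACCOUNT_ID:role/CrossAccountDeployRole",
    RoleSessionName="cross-account-deploy",
)["Credentials"]

# Clients created with these temporary credentials operate in the prod account
cloudformation = boto3.client(
    "cloudformation",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)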

How do you deploy changes from test to prod within AWS, modularized such that only the components that changed are updated? Let us know in the comments.

If you need help implementing cloud-native MLOps, building Well-Architected production ML software solutions or training/inference pipelines, monetizing your ML models in production, or if you have specific solution architecture questions or would simply like us to review your solution architecture and provide feedback based on your goals, contact us or send me a message and we will be happy to help.

Subscribe to my blog at: https://gradientgroup.ai/blog/

Follow me on LinkedIn: https://linkedin.com/in/carloslaraai
