Skip to content

Provide a python library implementing Git features to deal with an AWS CodeCommit repository content

Notifications You must be signed in to change notification settings

papiveron/aws-hack-git-codecommit

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 

Repository files navigation

A Git-like library for AWS CodeCommit

codecommit aims to provide a python library AWSCodeCommit class to you to interact with your AWS CodeCommit Git Repository programmatically without the need of Git and Git Credentials

The context

This library was born in a context of building an Infrastructure Automation & CI/CD solution in a full AWS environment, using exclusively AWS Developer and Management Tools/Services

The solution was then architected around AWS CodePipeline, using AWS CodeCommit for sourcing and AWS ServiceCatalog for products provisioning from AWS CloudFormation, with the most flexibility and cost effectiveness.

The detail of this solution architecture is out of the scope of this README, you can find a brief overview of the concept in this blog post

But for short, the solution consists in managing business units team developments, leveraging their software product shifting from Development to Production environment through projects. Thus for each team/project there are :

  • One pipeline called the ops pipeline, taking all changes from the master branch of the project repository (in AWS CodeCommit), releasing those changes in a staging environment (Continuous Delivery) for validation tests, promoting and deploying the release in production environment (Continuous Deployment)

  • One pipeline called the dev pipeline, processing changes from any feature branch of the project repository and provision products in a development AWS ServiceCatalog portfolio. This allows to test all features branches autonomously and independently before merge them for release (Continuous Integration). When feature branches are integrated and merged to the master branch the Continuous Delivery automatically takes place through the ops pipeline.

Once again the whole solution architecture documentation is out of the scope of this README and will be available later separately.

The problematic

One of the limits I had to face with was that AWS CodePipeline doesn't support processing changes from multiple branches within a given CodeCommit repository, and one feature of the solution was to provide a full automated and agnostic development environment to the Development teams. Unfortunately, by the time of this documentation AWS CodePipeline only supports being connected to a single CodeCommit repository branch.

To achieve the need I designed the solution to manage AWS CodeCommit feature branches creation and deletion dynamically and have feature branches changes invoking an intermediate Lambda function that uploads the code to S3, and then invokes the dev pipeline. Here's an outline of this approach in two phases : branches lifecycle and branches updates

  • Branches lifecycle
    • Developers create feature branches for their developments upstream and dowstream
    • Each branch creation trigger a "lifecyle" Lambda function wich creates a repository trigger for all futur changes on this specific feature branch
  • When the feature branch is deleted (for example after a merge), the above repository trigger is deleted
  • Branches updates
    • Changes are pushed to repository feature branch
    • According to the trigger configuration of that feature branch, the "updates" Lambda function is invoked.
    • The "updates" Lambda function uses the data from the event to checkout a copy of the code and upload it to an S3 bucket on a fixed location, using an implementation of git-archive
    • The "updates" Lambda function invokes the StartPipelineExecution API call to start the dev pipeline, which is pre-configured to source from the S3 (on the fixed location mentioned above) instead of AWS CodeCommit.

We won't get here into technical details of "lifecycle" and "updates" Lambda functions code and their inputs data, as these functions are part of the whole Infrastructure Automation & CI/CD so lution architecture, which is not the main subject of this README. They are mentionned here just to give a complete of the context in which the codecommit library mainly concerned by this document ation was writen

Nevertheless an another blocking point comes here for checking out and archiving the code from the feature branch :

  • As mentioned in the context section, we are managing several developper teams projects dynamically, in different AWS accounts
  • For each project, dev and ops pipelines as well as others AWS required resources such as IAM roles and policies are set up automatically
  • Git needs credentials to authenticate to AWS CodeCommit repository
  • It's not possible to use IAM role or either credential helper to interact with AWS CodeCommit programmatically. The credential helper is only available to be used with the AWS CLI.
  • To obtain AWS CodeCommit credentials for programmatic authentication using Git, you should connect to AWS console, select an IAM user, generate git HTTPS credentials and store them in a place where the Lambda function can retrieve them
  • As all resources setup is automatic, the above credentials generation is not envisageable, for the simple reason that we cannot connect to AWS console and manually git generate credentials in the middle of each project pipelines setup.
  • This is where the CodeCommit library came out to be used by above "lifecycle" and "update" Lambda function for Git related purposes.

The solution to Git problem

To address the programmatic difficulty/limit described just above, I decided to code a library that implements the git-archive feature to be used by my AWS CodeCommit repository triggers funcitons. The code is in the codecommit module of this repository.

The library exposes an archive method among others, but can be extended to offers more Git related useful stuff. Using the AWSCodeCommit class provided here, there is no more need to use a Git library such GitPython, and no headache with AWS git HTTPS credentials

After you have called the archive method, your repository content is copied in an in-memory zip file, and you can write it to your local disk by calling the flush_content method, or alternatively access it using the content property and write it to any place you need

Aside the codecommit library module I provided my "branches lifecycle" and "branches updates" Lambda functions code, which can be seen as example of the library usage, or a solution to AWS CodePipeline multi branch support for AWS CodeCommit exposed in this documentation. You can absolutely take only the codecommit library module and use it in your code for your specific need.

....
client = boto3.client('codecommit')
codecommit = AWSCodeCommit(aws_client, my_repo, logger)
codecommit.archive('staging')
print(codecommit.flush_content())
....

About

Provide a python library implementing Git features to deal with an AWS CodeCommit repository content

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages