Using S3 Notifications to sync caches

What is this for?

Let’s say you have servers in multiple regions. You may want to have pypicloud set up in each region. So in each region you set up an S3 bucket, a cache backend (let’s say DynamoDB), and a server running pypicloud. The first problem is that there is no communication; if you upload a package to one region, the other region doesn’t see it.

So you set up Cross-Region Replication for your S3 buckets, and now every time you upload a package in one region, it gets copied to the S3 bucket in the other. Unfortunately, pypicloud isn’t picking up on those changes so you still can’t install packages that were uploaded from a different region. That’s where this tool comes in.

Tell me more

Pypicloud provides some utilities for setting up an AWS Lambda function that will receive create and delete operations from a S3 bucket and sync those changes to your pypicloud cache. After it’s set up, you should be able to run a pypicloud stack in as many regions as you want and keep them all perfectly in sync.

Get Started

Create a config.ini file that contains the configuration for S3 storage and whichever cache you want to sync to. Make sure you have AWS credentials in a format that can be read by boto. Then run:

ppc-create-s3-sync config.ini

That’s all! You should now have an AWS Lambda function running whenever an object is created or deleted from your bucket.

Moving Parts

Chances are good that you will have to make some edits to this setup, so it’s important to know what it’s doing. There are three main components.

IAM Role

The Lambda function must have a Role that defines the permissions it has. ppc-create-s3-sync attempts to create one (named “pypicloud_lambda…”) that has permissions to write logs and read from S3. If your cache is DynamoDB, it also includes read/write permissions on the pypicloud tables.

Lambda Function

It builds a bundle and uploads it to a new Lambda function, then it gives the S3 bucket Invoke permissions on the function.

Bucket Notifications

The last step is to go to the S3 bucket and add a Notification Configuration that calls our lambda function on all ObjectCreate and ObjectDelete events.

More Details

I have only thoroughly tested this with a DynamoDB cache. You may have to make changes to make it work with other caches.

Many of the steps are customizable. Look at the args you can pass in by running ppc-create-s3-sync -h. For example, if you want to create the Role yourself you can pass the ARN in with -a <arn> and the command will use your existing Role.

If you’re building the Lambda function by hand, you can use ppc-build-lambda-bundle to build the zip bundle that is uploaded to Lambda. You will need to add an environment variable PYPICLOUD_SETTINGS that is a json string of all the relevant config options for the db, including pypi.db and all the db.<option>: <value> entries.

Feedback

This is all very new and largely untested. Please email me or file issues with feedback and/or bug reports. Did you get this working? Was it easy? Was it hard? Was it confusing? Did you have to change the policies? Did you have to change anything else?