Deep Learning with 21 and AWS

Posted by Jeremy Kun

Using 21 and AWS to host a deep learning endpoint

In this tutorial we'll set up a bitcoin-payable API for a deep learning algorithm using Amazon Web Services (AWS) for the computational back end. The algorithm we'll serve is an example of an artistic style transfer algorithm that applies the artistic style of one image to another image.

style transfer example

Although the 21 tools allow anyone to set up a bitcoin-payable API from any computer, running our algorithm on AWS means it won't slow down your work machine by serving user requests. Even if you own a powerful supercomputer, you probably don't want to let public demand monopolize your machine's resources.

The final Django app of this tutorial is available in a github repository.

AWS Pricing

Amazon Web Services provides a large number of services for on-demand cloud computing and storage. In this tutorial we'll primarily be using S3 for storage and EC2 for our compute-heavy back end. Both of these services cost money, but for most hobby purposes S3 is free. The pricing details show that you get 5GB of storage and thousands of requests for free for the first year of usage. If you do end up using a lot of space, 1TB works out to around $12/month.

EC2 offers a large number of machine types at varying prices. They have a free tier, where a "t2.micro" machine with 1GB of memory and a single CPU can be used for 750 hours per month for free for the first year. In this tutorial we'll be using a "g2.2xlarge" machine that has 15 GB of memory, 8 CPUs and access to an NVIDIA GRID GPU with 1,536 CUDA cores and 4 GB of video memory. This machine costs $0.65 USD per hour, with the minimum billing interval being one hour.

The configuration above is well suited for a GPU computation that takes between 15 minutes and 1 hour. Similarly powerful machines that don't have GPU access are just under $0.50 USD per hour. If you're a spendthrift, for $4.00 USD per hour you can get an X1 instance with 128 vCPUs and almost 2 TB of memory.

Outline

We'll use Django for this tutorial. If you aren't familiar with Django, see the 21 Django and Heroku tutorial.

The basic flow will be a Django app which handles requests from the user and launches an EC2 instance for each buy request. We'll store any input data the user provides in an S3 bucket, and configure the EC2 instance to read from and write to that bucket. The server will determine whether the EC2 instance has finished by checking that its outputs are present in the S3 bucket.

Because a single invocation of the algorithm can take time, in this tutorial we'll send the client a token when the EC2 instance is successfully launched, and the client can use that token to redeem the output at a later time. A more complicated API might ask the user for an email address and send them an email when the computation is finished. Our basic client-server interaction is described by the following diagram:

flow diagram

So a client makes the initial request. The server generates a token, pushes the inputs to S3, and spins up an EC2 instance which talks to S3. The server then gives the token to the client. The client can then poll the server to see if the computation is done, and when it is, the server returns the outputs. This isn't an ideal customer experience, but it's a simple template one can iterate and improve on.

In the first half of this tutorial we'll explain how to programmatically manage EC2 instances and S3 buckets from Python using the boto3 library. The second half of this tutorial will incorporate this into a Django app with the behavior of the above diagram.

Part 1: Managing AWS from Python

The crux of this endpoint is launching and monitoring an EC2 instance. We'll use the python boto3 library for this. The crucial section of the boto3 documentation is the EC2 create_instances function. The basic usage looks like this:

import boto3

def spin_up():
    ec2 = boto3.resource('ec2')
    instances = ec2.create_instances(
        ImageId=AMI_ID,
        InstanceType=INSTANCE_TYPE,
        ...
    )

    instance = instances[0]
    print('Spinning up instance with id {} at {}'.format(instance.id, instance.launch_time))

    instance.wait_until_running()
    instance.reload()
    print('Instance {} has finished spinning up. Public DNS is {}'.format(
        instance.id, instance.public_dns_name)
    )

    return instance.id

Most of the difficulty in using boto3 is in providing the correct arguments to the create_instances function to ensure the instance has the correct access permissions and termination behavior.

The first argument, ImageId, specifies what AWS calls an "Amazon Machine Image" (AMI) for your EC2 instance. An AMI is a snapshot of a machine, and it includes things like

  • The operating system running on that machine
  • The users created on that machine
  • The state of the file system

This is convenient because if the algorithm you want to sell has a complex set of dependencies, you can configure those dependencies once on AWS, create a snapshot AMI, and use that AMI for new instances and perfect reproducibility. We'll walk through how to create a custom AMI later in this tutorial.

Signing up for AWS

The first thing we need to do is create an AWS account and get credentials. Sign up for AWS, and then create an access key at the IAM home. This should consist of a 20-character access key and a 40-character secret key. We'll be recording these as environment variables, but remember that they should be kept secret. You should also pick a default AWS region, and note that all of your AWS configurations are specific to a region. We'll use us-east-1.

Boto3 fetches these credentials from the environment. In Part 2 we'll use a proper method for storing these secrets, but for now we can simply set them at the top of our Python program.

import os

# note, in part 2 we will move these to a separate .env file so they
# aren't accidentally published in a public repository.
os.environ["AWS_ACCESS_KEY_ID"] = "<YOUR_SECRET>"
os.environ["AWS_SECRET_ACCESS_KEY"] = "<YOUR_SECRET>"
os.environ["AWS_DEFAULT_REGION"] = "<YOUR_REGION>"

S3 bucket

We'll start by giving our instance access to S3. Create a new bucket at the S3 console and record its name.
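
If you'd prefer to create the bucket from Python instead of the console, here is a minimal boto3 sketch; the bucket name is a placeholder, and it assumes your credentials are already set in the environment as described above.

import boto3

# Bucket names must be globally unique across all of AWS.
s3 = boto3.client('s3')
s3.create_bucket(Bucket='YOUR_BUCKET_NAME')

# For regions other than us-east-1, a location constraint is required:
# s3.create_bucket(
#     Bucket='YOUR_BUCKET_NAME',
#     CreateBucketConfiguration={'LocationConstraint': 'us-west-2'},
# )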

IAM instance profile

AWS uses what it calls "Identity and Access Management (IAM) Instance Profiles" to define permissions for an EC2 instance to interact with other AWS services. Browse to the IAM console and click on "Roles." Here you can create an instance profile, to which we'll attach the 'AmazonS3FullAccess' policy. You can attach a more restrictive policy if you want.

First give the role a name.

role name screenshot

Then select Amazon EC2.

Select role type screenshot

Then select 'AmazonS3FullAccess' and click "Next Step."

Attach policy screenshot

Now click "Create Role."

Review screenshot

Finally, select the newly created role from the list and copy down the "Instance Profile ARN(s)" field value. It should look roughly like arn:aws:iam::<integer id>:instance-profile/<role name>.

arn screenshot
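
You can also look the ARN up from Python; here's a quick sketch using boto3's IAM client, where the profile name is a placeholder for the role name you just created (roles created through the console get an instance profile with the same name).

import boto3

iam = boto3.client('iam')
profile = iam.get_instance_profile(InstanceProfileName='YOUR_ROLE_NAME')
print(profile['InstanceProfile']['Arn'])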

Security groups and SSH keys

You will likely want to SSH into a running instance to debug problems or perform configuration during the initial setup. So this step will create an SSH key-pair and configure our EC2 instances to allow SSH access from a specific IP address.

On the EC2 dashboard under "Network and Security", click on "Security Groups," and create a new security group. The most basic way to fill out the fields is to give SSH access to your IP only, but you might also reasonably open the HTTP port to all IPs, and have a nice landing page with a description of how to use your API.

security group screenshot

The name of the security group will be passed to create_instances.

In the same "Network and Security" section, click "Key Pairs" and then "Create Key Pair." Give it a name and upon clicking "Create" your browser will automatically download a .pem file. Save this .pem file in an appropriate place like ~/.ssh. If you lose this file you'll have to generate another one from the AWS console.

Important: You must change the permissions on your .pem file to 400, or else it will be rejected by AWS when you try to SSH into an instance, and you'll have to generate a new key.

$ chmod 400 /path/to/key.pem

Record the name of the key (the part before .pem), as we will pass it to create_instances.
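
If you'd rather script the key pair creation, here is a minimal boto3 sketch that creates the key pair and saves the private key with the right permissions; the key name and path are placeholders.

import os
import boto3

ec2 = boto3.client('ec2')
key = ec2.create_key_pair(KeyName='YOUR_SSH_KEY_NAME')

# Save the private key locally; AWS will not show it again.
pem_path = os.path.expanduser('~/.ssh/YOUR_SSH_KEY_NAME.pem')
with open(pem_path, 'w') as f:
    f.write(key['KeyMaterial'])

os.chmod(pem_path, 0o400)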

Cloud-config scripts

We'll give commands to our EC2 instance via a "cloud-config" script. This is a script that a newly created EC2 instance will run after booting up, and will allow us to install packages and run commands. This script is passed to create_instances via the UserData keyword argument. Here is an example of a very simple cloud-config script that installs the AWS command line tools and writes a simple file to S3.

userdata = """#cloud-config

repo_update: true
repo_upgrade: all

packages:
 - s3cmd

runcmd:
 - echo 'Hello S3!' > /tmp/hello.txt
 - aws --region YOUR_REGION s3 cp /tmp/hello.txt s3://YOUR_BUCKET_NAME/hello.txt
"""

Be sure to replace YOUR_REGION and YOUR_BUCKET_NAME with your actual region and bucket name strings.

Putting it all together: hello world

Here's an example one-off python script that creates a t2.micro instance with all of the security settings we described, and runs the "hello world" userdata script from the previous section.

import os
import boto3

# note, in part 2 we will move these to a separate .env file so they
# aren't accidentally published in a public repository.
os.environ["AWS_ACCESS_KEY_ID"] = "YOUR SECRET"
os.environ["AWS_SECRET_ACCESS_KEY"] = "YOUR_SECRET"
os.environ["AWS_DEFAULT_REGION"] = "YOUR_REGION"

userdata = """#cloud-config

repo_update: true
repo_upgrade: all

packages:
 - s3cmd

runcmd:
 - echo 'Hello S3!' > /tmp/hello.txt
 - aws --region YOUR_REGION s3 cp /tmp/hello.txt s3://YOUR_BUCKET/hello.txt
"""


ec2 = boto3.resource('ec2')
instances = ec2.create_instances(
    ImageId='ami-f5f41398',         # default Amazon linux
    InstanceType='t2.micro',
    KeyName='YOUR_SSH_KEY_NAME',
    MinCount=1,
    MaxCount=1,
    IamInstanceProfile={
        'Arn': 'YOUR_ARN_ID'
    },
    SecurityGroupIds=['YOUR_SECURITY_GROUP_NAME'],
    UserData=userdata
)

for instance in instances:
    print("Waiting until running...")
    instance.wait_until_running()
    instance.reload()
    print((instance.id, instance.state, instance.public_dns_name,
           instance.public_ip_address))

Let's inspect the keyword arguments one by one.

  • ImageId: the ID of the AMI that our EC2 instance will use. In this example, 'ami-f5f41398' refers to the standard Amazon Linux AMI. A more complex deployment (see part 2) will involve a custom AMI with your algorithm's requirements pre-loaded.
  • InstanceType: the identifier of the machine you want to spin up. In this case t2.micro is the simplest free option.
  • KeyName: the name of your SSH key pair.
  • MinCount: the minimum number of instances you want to spin up.
  • MaxCount: the maximum number of instances you want to spin up.
  • IamInstanceProfile: a dictionary containing metadata about the IAM instance profile. In this case we're only passing the ARN identifier.
  • SecurityGroupIds: a list of security group names to be applied to the created instances.
  • UserData: a string containing a cloud-config script, to be run once when the instance is first launched.

Run the script above (after pip installing boto3) and then observe on your EC2 dashboard that the instance is running. Once it's finished launching (and running some initialization checks), check to make sure that your S3 bucket is populated with a hello world text file.

python test-aws.py

This prints the following output:

Waiting until running...
('i-0e51fa6a25ead432d', {'Code': 16, 'Name': 'running'}, 'ec2-107-23-255-83.compute-1.amazonaws.com', '107.23.255.83')
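
You can also verify the upload from Python rather than the S3 console; here's a small sketch that lists the bucket's contents and should show hello.txt once the cloud-config script has finished.

import boto3

s3 = boto3.client('s3')
response = s3.list_objects_v2(Bucket='YOUR_BUCKET_NAME')
for obj in response.get('Contents', []):
    print(obj['Key'], obj['Size'])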

Before we terminate this instance, let's SSH into it. Recall where you saved your .pem file, note the public DNS in the output above, and run

ssh -i /path/to/key.pem ec2-user@ec2-107-23-255-83.compute-1.amazonaws.com

ec2-user is the default user for the Amazon Linux AMI.
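
When you're done poking around, remember to terminate the instance so it stops accruing charges (the t2.micro is free-tier, but the GPU instances later are not). You can do this from the EC2 console, or with a short boto3 sketch like the following, substituting your own instance id.

import boto3

ec2 = boto3.resource('ec2')
instance = ec2.Instance('i-0e51fa6a25ead432d')  # your instance id here
instance.terminate()
instance.wait_until_terminated()
print('Instance {} terminated.'.format(instance.id))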

A more complicated invocation

From here, the remaining work involves changing the cloud-config script. For example, here is a cloud-config script which (if your security group allows inbound HTTP traffic) launches a PHP web server.

userdata = '''#cloud-config
repo_update: true
repo_upgrade: all

packages:
 - httpd24
 - php56
 - mysql55-server
 - php56-mysqlnd

runcmd:
 - service httpd start
 - chkconfig httpd on
 - groupadd www
 - [ sh, -c, "usermod -a -G www ec2-user" ]
 - [ sh, -c, "chown -R root:www /var/www" ]
 - chmod 2775 /var/www
 - [ find, /var/www, -type, d, -exec, chmod, 2775, {}, + ]
 - [ find, /var/www, -type, f, -exec, chmod, 0664, {}, + ]
 - [ sh, -c, 'echo "<?php phpinfo(); ?>" > /var/www/html/phpinfo.php' ]
'''

You will notice that the commands in a cloud-config script are run by root, not by the AMI's default user. This has some important consequences. In particular, if you launch your EC2 instance with a custom AMI --- perhaps because you need a certain GPU library, as we will shortly --- you need to make sure that root has the appropriate environment variables set. Perhaps the quickest way to do this is to add them as export commands to the cloud-config script.

Uploading and downloading from S3

Putting and getting files on S3 is much simpler than launching EC2 instances. The following python snippet defines functions for uploading and downloading files from your S3 bucket using boto3.

import boto3

def upload_to_s3(local_filename, s3_filename):
    # Push a local file to the bucket under the given key.
    s3 = boto3.client('s3')
    s3.upload_file(local_filename, 'YOUR_BUCKET_NAME', s3_filename)


def download_from_s3(local_filename, s3_filename):
    # Fetch a key from the bucket and save it to a local path.
    s3 = boto3.client('s3')
    s3.download_file('YOUR_BUCKET_NAME', s3_filename, local_filename)
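
For example, to push a content image up and later pull the algorithm's output back down (the filenames here are purely illustrative):

upload_to_s3('content.jpg', 'abc123_content.jpg')
download_from_s3('abc123_output.png', 'abc123_output.png')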

A custom AMI

AWS has a publicly searchable list of AMIs for you to choose from. For example, there are many pre-existing deep learning AMIs. To find them, from the EC2 dashboard under "Images" click on "AMIs." Click the filter that says "Owned by me," and change it to "Public images." Then put in your search term.

Making a custom AMI allows you to save the preconfigured state of an EC2 instance so that you don't have to re-install libraries every time you launch an instance. The process for doing this is:

  1. Launch a base instance with the AMI of your choice.
  2. Install tools from the command line as you would normally.
  3. Exit SSH.
  4. From the EC2 Instances console, right click on your instance, click on "Image" and then "Create Image." (A boto3 sketch of this step follows the list.)
  5. Write down the AMI id for future use, or pull it from your "Owned by me" AMIs on the EC2 dashboard.
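
Steps 4 and 5 can also be scripted. A minimal boto3 sketch, assuming you still have an instance object from create_instances; the image name is a placeholder.

# Snapshot the configured instance into a reusable AMI.
# The AMI id is returned immediately, but the image takes a few
# minutes to become available.
image = instance.create_image(Name='my-style-transfer-ami')
print('Created AMI with id {}'.format(image.id))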

In part 2 we'll use a custom AMI ami-1ab24377 with Torch and cuDNN to enable deep learning on a GPU. This AMI also has a set of deep-learning models pre-downloaded. Note that AMIs are tied to a region, so to use this custom AMI you need to set your region to us-east-1.

If you make a custom AMI, there is one pitfall you should be aware of. When creating a custom AMI there's an option to attach various kinds of volumes to your instance, which specifies what sort of storage your AMI has access to. There is also a checkbox that tells AWS to delete the volume when the instance terminates. This is important because AWS charges you for volume usage, and EC2 creates a new volume for each instance you launch. Neglecting to delete unused volumes can be a costly oversight. A more sophisticated AWS endpoint might use a queue and coordinate the relationship between EC2 instances and volumes, starting and stopping instances instead of terminating them. Managing such a queue is beyond the scope of this tutorial.
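
You can also make the delete-on-termination behavior explicit when launching instances by passing BlockDeviceMappings to create_instances. Here's a rough sketch; the root device name and volume size depend on the AMI you're launching.

instances = ec2.create_instances(
    ImageId=AMI_ID,
    InstanceType=INSTANCE_TYPE,
    MinCount=1,
    MaxCount=1,
    BlockDeviceMappings=[{
        'DeviceName': '/dev/xvda',  # root device name for your AMI
        'Ebs': {
            'VolumeSize': 30,              # in GB
            'DeleteOnTermination': True,   # clean up the volume on terminate
        },
    }],
)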

Quick Deploy

The last bit of configuration needed is to register an API key with imgur. We'll use imgur to upload the output images, and send the user a url as the final product. You can register your application to get imgur API keys here. Note that you don't need to include a callback URL.

imgur registration

At this point, if you've configured AWS, installed 21, made a Heroku account, and registered for imgur API keys, then you have enough information to use the Heroku quick-deploy button at the open source repository. The rest of this tutorial will detail the internals of the django app.

Note that you won't be able to publish a quick-deployed app using the Heroku command (as explained in the Django Heroku tutorial) unless you modify the manifest.yaml template to use your specific information.

Part 2: Django app for deep learning

In this part we'll build a simple Django app for handling requests and spinning up EC2 instances. If you're new to Django apps, see the 21 tutorial on writing and deploying Django apps with Heroku. The app will launch an EC2 instance that performs artistic style transfer using deep learning, and the code we'll use is based on Justin Johnson's Torch implementation.

We're going to provide a rough overview of the endpoint. You can see all the details by browsing the git repository for this tutorial.

Configuration

Let's start by putting all of our secrets and configuration variables in a .env file in the Django project's base directory. I have left all of my secrets blank, and populated some defaults for local debugging purposes.

# in .env

DATABASE_URL=sqlite:///db.sqlite3

AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
AWS_DEFAULT_REGION=us-east-1   # needed to use the custom AMI, ami-1ab24377

IMGUR_CLIENT_ID=
IMGUR_CLIENT_SECRET=

S3_BUCKET_NAME=
EC2_SSH_KEYPAIR_ID=
EC2_IAM_INSTANCE_PROFILE_ARN=
EC2_SECURITY_GROUP_NAME=

EC2_MAX_NUM_INSTANCES=1

HASHIDS_SALT=

TWO1_WALLET_MNEMONIC=
TWO1_USERNAME=

DEBUG=True  # set this to False when you actually deploy. 

We'll also be using hashids to generate tokens from our database ids. The Django/Heroku tutorial has details on how to load these environment variables into your Django app, as well as how to use hashids.
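
For reference, hashids usage is a one-liner in each direction. A minimal sketch, assuming HASHIDS_SALT has been loaded into your Django settings (the min_length value is just an illustration):

from django.conf import settings
from hashids import Hashids

hasher = Hashids(salt=settings.HASHIDS_SALT, min_length=5)

token = hasher.encode(42)    # e.g. something like 'Q8jXE'
ids = hasher.decode(token)   # (42,)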

Payment required, models, views

Once this is set up, you can wrap any views you write with the @payment.required decorator as follows.

from django.core.exceptions import ValidationError
from rest_framework.decorators import api_view
from two1.bitserv.django import payment

@api_view(['POST'])
@payment.required(200000)
def buy(request):
    try:
        data = validate_buy_params(request.data)
    except ValidationError as error:
        return JsonResponse({"error": error.message}, status=400)

    return _execute_buy(data)

The required POST data parameters are listed below (a sketch of validate_buy_params follows the list):

  • content: A url to a jpg file to be used as the content image
  • style: A url to a jpg file to be used as the style image
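
The validate_buy_params helper isn't shown above; the real version is in the tutorial's repository, but a minimal sketch might look like this.

from django.core.exceptions import ValidationError


def validate_buy_params(data):
    # Require both image urls to be present in the POST body.
    for field in ('content', 'style'):
        if not data.get(field):
            raise ValidationError(
                "'{}' must be provided as a POST parameter".format(field)
            )

    return {'content': data['content'], 'style': data['style']}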

The _execute_buy function creates a new instance of a simple Request Django model, detailed below.

from django.db import models
from django.utils import timezone


class Request(models.Model):
    '''
        A model representing a single request from a user.
    '''

    created = models.DateTimeField(default=timezone.now)

    '''
        A token given to the user, a hashid of the database id,
        which is also used to name files on S3, etc.
    '''
    token = models.CharField(max_length=100, null=True, default=None)

    '''
        The server filepath for output image to store temporarily between
        fetching from s3 and uploading to imgur.
    '''
    output_filepath = models.CharField(max_length=150, null=True)

    '''
        The name of the output file on S3.
    '''
    output_s3_filename = models.CharField(max_length=150, null=True)

    '''
        True when the token has successfully been redeemed, False otherwise.
    '''
    redeemed = models.BooleanField(default=False)

Then _execute_buy does the following:

  1. Creates a new request
  2. Generates a unique token from the database id
  3. Uses the token to generate unique filenames for the content, style, and output images
  4. Downloads the content and style images from the web and pushes them to S3
  5. Launches an EC2 instance with a dynamically generated cloud-config script
  6. Returns the token to the user

def _execute_buy(data):
    request = Request.objects.create()

    request.token = hasher.encode(request.id)
    filepath_dict = filepaths(request.token)

    request.output_filepath = filepath_dict[settings.OUTPUT_SUFFIX]
    request.output_s3_filename = os.path.split(request.output_filepath)[1]
    request.save()

    try:
        fetch_files(data, filepath_dict)
    except FileNotFoundError as e:
        return JsonResponse({"error": str(e)}, status=404)

    try:
        aws.launch(filepath_dict, data)
    except Exception as e:
        return JsonResponse({"error": "Error with AWS: {}".format(str(e))}, status=500)

    return JsonResponse({"token": request.token}, status=200)

The aws.launch function is our previous script from Part 1 for launching instances using boto3. We put it in a separate module called aws to keep it separate from the view logic. The complete details can be found in this tutorial's github repository.

The cloud-config script

The cloud-config script first sets special environment variables pointing to the installed Torch and CUDA libraries. Then it:

  1. Fetches the input files from S3
  2. Runs the algorithm
  3. Pushes the output back to S3
  4. Shuts itself down

Here's the userdata script. Note that we've left the filenames and parameters as python format-string arguments. So in a more complicated app, one could expose more parameters to the API. As a warning, there are limits to these parameters. For example, increasing the -image_size parameter drastically increases the amount of memory used on the EC2 instance. If it exceeds the maximum allowed memory, the instance will crash and the customer will never get their image.

USERDATA_TEMPLATE = """#cloud-config

runcmd:
 - export LD_LIBRARY_PATH=/home/ubuntu/torch-distro/install/lib:/usr/local/cuda/lib64:/home/ubuntu/cudnn/:$LD_LIBRARY_PATH
 - export PATH=/home/ubuntu/torch-distro/install/bin:/home/ubuntu/anaconda/bin:/usr/local/cuda/bin:$PATH
 - export DYLD_LIBRARY_PATH=/home/ubuntu/torch-distro/install/lib:$DYLD_LIBRARY_PATH
 - export PYTHONPATH=/home/ubuntu/caffe/python:$PYTHONPATH
 - export TH_RELEX_ROOT=/home/ubuntu/th-relation-extraction
 - export HOME=/home/ubuntu
 - cd /style-transfer-torch
 - aws --region us-east-1 s3 cp s3://{bucket}/{content} ./{content}
 - aws --region us-east-1 s3 cp s3://{bucket}/{style} ./{style}
 - th neural_style.lua -style_image {style} -content_image {content} -output_image {output} -gpu 0 -backend cudnn -cudnn_autotune -print_iter 50 -image_size 500 -num_iterations 500 -init image
 - aws --region us-east-1 s3 cp ./{output} s3://{bucket}/{output}
 - shutdown -h now
"""

In the above, note that we used the region us-east-1; change this if you're using a different region.

Further note that in making our custom AMI ami-1ab24377, we pre-cloned a git repository called style-transfer-torch into the root directory; it contains the files needed to run the style transfer algorithm. If you're designing an algorithm that you maintain in a git repository, it may be reasonable to clone that repository as part of the cloud-config script so that bug fixes are instantly deployed to your endpoint.
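
For completeness, the launch code fills this template with the per-request S3 filenames before passing it to create_instances as UserData. Roughly like the sketch below; the variable names follow the filepaths helper and are assumptions, not the exact code from the repository.

userdata = USERDATA_TEMPLATE.format(
    bucket=settings.S3_BUCKET_NAME,
    content=content_s3_filename,
    style=style_s3_filename,
    output=request.output_s3_filename,
)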

Redeeming a token

Now we can allow the user to redeem a token. The redeem API endpoint checks to see if the EC2 instance has pushed the desired output file to S3. If there's no such file, our API responds to the caller with "not done yet." If the file is there, the endpoint uploads it to imgur, marks the token as redeemed, and returns an imgur url to the user.

def validate_redeem_params(request):
    try:
        token = request.GET['token']
    except KeyError:
        # Raise with a plain message string so that error.message works
        # in the redeem view below.
        raise ValidationError(
            "'token' must be specified as a GET parameter"
        )

    return token

def _redeem(token):
    try:
        request = Request.objects.get(token=token)
        if request.redeemed:
            raise ValueError()

        try_download_output(request)
    except botocore.exceptions.ClientError as e:
        logger.error('Download from S3 failed with error: {}'.format(str(e)))
        return JsonResponse({'status': 'working', 'message': 'Not yet finished.'}, status=202)
    except ObjectDoesNotExist:
        logger.error('User requested token {} that does not exist'.format(token))
        return JsonResponse({'error': 'Invalid or redeemed token.'}, status=400)
    except ValueError:
        logger.error('User requested token {} that was already redeemed'.format(token))
        return JsonResponse({'error': 'Invalid or redeemed token.'}, status=400)

    imgur = pyimgur.Imgur(settings.IMGUR_CLIENT_ID)
    uploaded_image = imgur.upload_image(request.output_filepath, title='Style transfer output {}'.format(token))
    url = uploaded_image.link

    request.redeemed = True
    request.save()

    return JsonResponse({"status": "finished", "url": url, "message": "Thanks!"}, status=200)


@api_view(['GET'])
def redeem(request):
    try:
        token = validate_redeem_params(request)
    except ValidationError as error:
        return JsonResponse({"error": error.message}, status=400)

    return _redeem(token)
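
The try_download_output helper isn't shown above either; it essentially wraps the S3 download from Part 1 and lets botocore's ClientError propagate when the output file isn't on S3 yet. A rough sketch under that assumption:

def try_download_output(request):
    # Raises botocore.exceptions.ClientError if the EC2 instance
    # hasn't pushed the output file to S3 yet.
    download_from_s3(request.output_filepath, request.output_s3_filename)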

Notes and example usage

As per the Django/Heroku tutorial, you can deploy this endpoint to Heroku. The github repository includes a quick-deploy button so you can quickly and easily test it out. Here is an example usage that styles Dorian Nakamoto as an ancient Roman mosaic.

Style image:

style transfer example

Content image:

style transfer example

Output image:

style transfer example

In these examples, replace APP_NAME with your deployed heroku app name.

Buy call (note the use of --maxprice):

21 buy "https://APP_NAME.herokuapp.com/buy" --data '{"style":"http://i.imgur.com/GEEYfD7.jpg", "content":"http://i.imgur.com/Go86JXN.jpg"}' --maxprice 175000

Buy result:

{
    "token": "Q8jXE"
}

Redeem call (too early):

21 buy "https://APP_NAME.herokuapp.com/redeem?token=Q8jXE"

Redeem result:

{
    "message": "Not yet finished.",
    "status": "working"
}

Redeem call (finished):

21 buy "https://APP_NAME.herokuapp.com/redeem?token=Q8jXE"

Redeem result:

{
    "message": "Thanks!",
    "status": "finished",
    "url": "http://i.imgur.com/cSiLvsQ.jpg"
}

How to send your Bitcoin to the Blockchain

Just as a reminder, you can send bitcoin mined or earned in your 21.co balance to the blockchain at any time by running 21 flush. A transaction will be created within 10 minutes, and you can view the transaction id with 21 log. Once the transaction has been confirmed, you can check the balance in your bitcoin wallet from the command line with wallet balance, and you can send bitcoin from your wallet to another address with wallet sendto $BITCOIN_ADDRESS --satoshis $SATOSHI_AMOUNT --use-unconfirmed. The --satoshis flag allows you to specify the amount in satoshis; without it the sendto amount is in BTC, but this behavior is deprecated and will be removed soon. The --use-unconfirmed flag ensures that you can send even if you have unconfirmed transactions in your wallet.


Ready to sell your endpoint? Go to slack.21.co

Ready to try out your bitcoin-payable server in the wild? Or simply want to browse and purchase from other bitcoin-enabled servers? Head over to the 21 Developer Community at slack.21.co to join the bitcoin machine-payable marketplace hosted on the 21 peer-to-peer network.