
HOWTO: Deploy a fault tolerant Django app on AWS – Part 2: Moving static and media files to S3

In the last article, I discussed our effort to remove single points of failure in our infrastructure and increase redundancy. We moved our database from a single instance running locally to RDS, where fault tolerance is built in through the Multi-AZ offering.

In this article, I’ll continue this journey by moving our Django static and media files from the local file system to S3. Static files live in the [app]/static folder and typically include Javascript, CSS, static images and 3rd-party Javascript libraries. Media files are user-generated files, uploaded through FileField and ImageField on a Django model, e.g. a user’s profile picture or a photo of an item. By default, when you create a Django application using the standard “django-admin.py startproject”, media files are stored in the [app]/media folder and static files in the [app]/static folder. The locations are controlled by the following settings in settings.py:


# Absolute filesystem path to the directory that will hold user-uploaded files.
# Example: "/home/media/media.lawrence.com/media/"
MEDIA_ROOT = ''

# URL that handles the media served from MEDIA_ROOT. Make sure to use a
# trailing slash.
# Examples: "http://media.lawrence.com/media/", "http://example.com/media/"
MEDIA_URL = ''

# Absolute path to the directory static files should be collected to.
# Don't put anything in this directory yourself; store your static files
# in apps' "static/" subdirectories and in STATICFILES_DIRS.
# Example: "/home/media/media.lawrence.com/static/"
STATIC_ROOT = ''

# URL prefix for static files.
# Example: "http://media.lawrence.com/static/"
STATIC_URL = '/static/'

So, why do we need to move these files off the EC2 local file system? It’s a prerequisite to spinning up multiple EC2 instances that host the Django application. Specifically, we can’t have media files sitting in two locations. For example, when a user updates his or her profile picture, the POST request goes to one server, so the new image is stored on that server’s local file system. That’s bad, because the other app server won’t have access to it (unless you set up a shared folder between the instances, which is what was typically done before Jeff Bezos gave us S3). By moving the static and media files to S3, both servers use the same S3 endpoints to store and retrieve these files. Another HUGE plus is that the web servers (apache or nginx) no longer have to handle these static file requests, so the disk and network load on the web servers drops drastically.

Enough talking. First things first. We need to download and install django-storages and boto.


pip install django-storages boto

Now, create an S3 bucket. This part is easy. Log into the AWS console, go to S3 and click Create Bucket. Give it a name. For this example, we’ll use “spotivate”. All our static and media files will be accessed through http://spotivate.s3.amazonaws.com/static/... and http://spotivate.s3.amazonaws.com/media/... respectively.

Also, we need the AWS Key and Secret that boto uses to access S3. You can find them on your AWS Security Credentials page.
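
If you prefer scripting to clicking, the bucket can also be created with boto itself. A minimal sketch, assuming you already have the key and secret handy (the credential values below are placeholders):

# create_bucket.py - does the same thing as the console's "Create Bucket" button.
import boto

# Placeholder credentials; use the values from your Security Credentials page.
conn = boto.connect_s3('xxxxxxxxxx', 'xxxxxxxxxxxxxxxxxxxxxxxxxx')
bucket = conn.create_bucket('spotivate')
print(bucket.name)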

Now we have all the info to change Django settings. The instructions here are loosely based on various articles I’ve read, but Phil Gyford’s article has been most helpful. Following his instructions, I first created spotivate/s3utils.py with the following content:


from storages.backends.s3boto import S3BotoStorage

StaticS3BotoStorage = lambda: S3BotoStorage(location='static')
MediaS3BotoStorage = lambda: S3BotoStorage(location='media')

Then, in settings.py, I added storages as one of the INSTALLED_APPS and a bunch of other variables that tells Django where to put and read media and static files:


INSTALLED_APPS = (
    ...
    ...
    'storages',
)

...
...

###################################
# s3 storage
###################################

DEFAULT_FILE_STORAGE = 'spotivate.s3utils.MediaS3BotoStorage' 
STATICFILES_STORAGE = 'spotivate.s3utils.StaticS3BotoStorage' 

AWS_ACCESS_KEY_ID="xxxxxxxxxx"
AWS_SECRET_ACCESS_KEY="xxxxxxxxxxxxxxxxxxxxxxxxxx"
AWS_STORAGE_BUCKET_NAME = 'spotivate'

# No trailing slash here; the *_DIRECTORY values below supply it,
# so we don't end up with a double slash in the generated URLs.
S3_URL = 'http://%s.s3.amazonaws.com' % AWS_STORAGE_BUCKET_NAME
STATIC_DIRECTORY = '/static/'
MEDIA_DIRECTORY = '/media/'
STATIC_URL = S3_URL + STATIC_DIRECTORY
MEDIA_URL = S3_URL + MEDIA_DIRECTORY
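
For context, here is a hypothetical model (the Profile class and the 'profile_pics' path are illustrative, not from our codebase) showing where uploads now land. Because FileField and ImageField write through DEFAULT_FILE_STORAGE, the file goes to the bucket’s media/ prefix instead of local disk:

# models.py - hypothetical example model.
from django.db import models

class Profile(models.Model):
    user_name = models.CharField(max_length=100)
    # Saved via MediaS3BotoStorage, so the image ends up at
    # http://spotivate.s3.amazonaws.com/media/profile_pics/<filename>
    picture = models.ImageField(upload_to='profile_pics')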

Voila. We are almost done. To upload all the static files to S3, run the following command:


python manage.py collectstatic

This copies all the files in your static folders to S3. What about media files? We need to upload those to S3 at least once. Why only once? Because once the settings above are deployed, new uploads (e.g. updated profile pics) will be posted to S3 directly. I found a great python package called boto-rsync that does the job beautifully.


pip install boto_rsync
boto-rsync media s3://spotivate/media -a [AWS_ACCESS_KEY_ID] -s [AWS_SECRET_ACCESS_KEY]

Verify in the AWS console that all static and media files have indeed been copied to S3. Deploy the server, and hit a page. You should see that all references to Javascript, CSS and media files now point to S3.
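
If you’d rather verify from a Python shell than click around the console, here is a quick boto sketch (the credentials are the same placeholders as in settings.py):

# list_bucket.py - spot-check that collectstatic and boto-rsync did their job.
import boto

conn = boto.connect_s3('xxxxxxxxxx', 'xxxxxxxxxxxxxxxxxxxxxxxxxx')
bucket = conn.get_bucket('spotivate')

for prefix in ('static/', 'media/'):
    keys = list(bucket.list(prefix=prefix))
    print('%s contains %d files' % (prefix, len(keys)))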

It actually didn’t turn out to be so easy for me the first time around. I found that many CSS files were still being served from the local file system. After looking at the templates, I realized that I had this:


<link href="/static/web/bootstrap230/css/bootstrap.css" rel="stylesheet" type="text/css" charset="utf-8">
<link href="/static/web/jcarousel/css/style.css" rel="stylesheet" type="text/css" charset="utf-8">
<link href="/static/web/css/spotivate_new.css" rel="stylesheet" type="text/css" charset="utf-8">

I was not using Django’s “staticfiles” functionality properly. I had effectively hard-coded the static path, when I should have been using the static template tag instead. The lines above should be changed to:


{% load staticfiles %}
...
...
<link href="{% static "web/bootstrap230/css/bootstrap.css" %}" rel="stylesheet" type="text/css" charset="utf-8">
<link href="{% static "web/jcarousel/css/style.css" %}" rel="stylesheet" type="text/css" charset="utf-8">
<link href="{% static "web/css/spotivate_new.css" %}" rel="stylesheet" type="text/css" charset="utf-8">

The server is now functioning properly, but we are not done yet. What if we need to modify a Javascript file? How do changes get copied to S3 during deployment? This doc provides good instructions on the topic.
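
In our case the gist is simply to re-run collectstatic on every deploy. A minimal sketch of a post-deploy step (the script name and layout are my own, not from the linked doc):

# deploy_static.py - hypothetical post-deploy hook; run after pulling new code.
import subprocess

def push_static_to_s3():
    # Runs collectstatic non-interactively; django-storages uploads any new or
    # changed static files to the S3 bucket configured in settings.py.
    subprocess.check_call(['python', 'manage.py', 'collectstatic', '--noinput'])

if __name__ == '__main__':
    push_static_to_s3()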

Now, with the static and media files moved over to S3, and the database moved over to RDS, I’ve effectively removed all state from the app server. I can spin up another EC2 instance, drop my code there, and spread the traffic across two servers. If one goes down, we are still in business! And did I mention that pages load a lot faster too?

HOWTO: Deploy a fault tolerant Django app on AWS – Part 1: Migrate local MySQL to AWS RDS

For a while, Spotivate was running on a single EC2 instance. Everything was in it — MySQL, Django, static files, etc. Yes, we knew this was a terrible setup. Single point of failure, bad performance, etc. Here come the excuses. We had better things to do, like customer development, sales, design, product development, etc. We had no time for ops! Plus, our traffic wasn’t really that high, especially in the beginning. Our CPU / IO load was low. And we knew we could fix things fairly easily. Then one day, our EC2 instance went down for half an hour. Oops! We called AWS support. They had a disk failure. Our last snapshot was a day old. So our site was down that whole time.

We figured we had to do it right. And AWS makes it super easy. Our goals:

  • Remove all single points of failure, thus making the system fully fault tolerant.
  • As a result, response time should improve, especially under load.

Here’s the plan:

In this article, I’ll talk about the steps we took to move our MySQL to RDS.

If you don’t know what RDS is, read more about it here. Basically, it’s AWS’s managed database service. RDS comes loaded with features. Here’s a summary of what’s relevant:

  • Easy to deploy via the Management Console or command line.
  • Automatic backup (you get to choose how many days and when).
  • Multi-AZ deployment: AWS automatically creates a primary DB instance and synchronously replicates the data to a standby instance in a different Availability Zone, removing the database as a single point of failure.
  • Replication that lets you create read-only replicas. This is especially valuable for Spotivate, since our personalized email job puts a heavy load on the DB. With a read replica, the performance of our website won’t be affected while we send out our weekly emails (see the sketch after this list).
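
Django makes it easy to take advantage of a replica once you have one. A minimal sketch, assuming a read-only alias named 'replica' has been added to DATABASES (we don’t actually set one up in this article; the alias and the query are purely illustrative):

# Route the heavy weekly-email query to the read replica so the primary
# ('default') keeps serving the website unaffected.
from django.contrib.auth.models import User

def users_for_weekly_email():
    # Assumes DATABASES contains a 'replica' alias pointing at the RDS read replica.
    return User.objects.using('replica').filter(is_active=True)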

Well, let’s get on with it.

Step 1: Go to your management console and select RDS

Launch Database Instance

 

Step 2: Find a database server that fits the bill. In our case, MySQL.

Select database type

 

Step 3: Here’s where you pick the MySQL version and the instance size.

RDS Step 3

  • Multi-AZ Deployment: Select “Yes”, which creates a standby instance in a different AZ. That’s the whole point of this article, right?
  • Allocated Storage: Choose a storage size that’s appropriate. Go small, since you can easily upgrade later with minimal downtime. Generally, estimate enough for 3 months down the road.
  • DB Instance Identifier: This is just the prefix of the instance’s public DNS name.
  • Master Username: Your database user name, typically “root”.
  • Master Password: Your database root user’s password.

 

Step 4: Here you specify the database name, port, etc.

You also get to create (or assign) a database security group for this database. This is a little different from an EC2 security group: for a database security group, you specify which EC2 security group is allowed in, and any EC2 instance that belongs to that EC2 security group has access to the database. By default, everything else is turned off, including ping. For more info, visit here.

RDS Step 4

 

Step 5: Backup Settings

Here, you specify the backup retention period and when to back up. Make sure your backup window and maintenance window don’t overlap.

RDS Step 5

 

Step 6: That’s it. Review and Launch.

RDS Step 6

 

Step 7: Test it out.

After the DB has been launched (it takes several minutes, enough time for coffee), you can find the public DNS name on the instance’s detail page. The hostname resolves both externally and within EC2; however, the security group by default prohibits any external access to the database server, so only EC2 instances that belong to the assigned security group can connect. From my web server, I can use the usual “mysql” command to connect to the new RDS instance.

RDS Step 7
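
If you prefer to test from Python (MySQLdb is the same driver Django’s MySQL backend uses), here is a quick sketch; the hostname, credentials and database name are placeholders:

# test_rds.py - run from an EC2 instance that belongs to the allowed security group.
import MySQLdb

conn = MySQLdb.connect(
    host='spotivate.xxxxxxxxxx.us-east-1.rds.amazonaws.com',  # placeholder endpoint
    user='root',
    passwd='xxxxxxxxxx',
    db='spotivate')
cursor = conn.cursor()
cursor.execute('SELECT VERSION()')
print(cursor.fetchone())  # prints the MySQL server version tuple
conn.close()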

 

Step 8: Import.

Our database is fairly small, so we can just dump it and pipe it to the new instance. Here’s a fun one-liner you can use (make sure you stop your web server first to avoid consistency issues):

mysqldump [your current db] | mysql --host=[rds host name] --user=root --password=[root password] [your current db]

That’s it! All you need to do now is change your Django settings to use the new database instance. Bring down your local MySQL and restart your Django server to see if everything is running properly. If so, use chkconfig to keep the local MySQL from starting again on reboot.
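
For reference, on the Django side the change is just the DATABASES entry in settings.py. The hostname and password below are placeholders, not our actual endpoint:

# settings.py - point Django at the RDS endpoint instead of the local MySQL.
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'spotivate',
        'USER': 'root',
        'PASSWORD': 'xxxxxxxxxx',
        'HOST': 'spotivate.xxxxxxxxxx.us-east-1.rds.amazonaws.com',  # placeholder endpoint
        'PORT': '3306',
    }
}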

Next time, I’ll talk about the migration of our static files to S3.