Django Blog Archive

Selenium Testing File Uploads in Django Admin

The Django framework version 1.4 added much better integration with Selenium for in-browser functional testing. This made Test-Driven Development an even more obvious decision for our new Liquid Galaxy Content Management System. This went very well until we needed to test file uploads in the Django admin interface.

A browser's file upload control has some unique security concerns that prevent JavaScript from setting its value. Trying to do so may raise INVALID_STATE_ERR: DOM Exception 11. Selenium's WebDriver may sometimes send keystrokes directly into the input element, but this did not work for me within Django's admin interface.

To work around this limitation, Ryan Kelly developed a Middleware to emulate successful file uploads for automated testing. This middleware inserts additional hidden fields into any forms sent to the client. Setting their value causes a file upload to happen locally on the server. (I used a slightly newer version of this Middleware from another project.)

However, Selenium intentionally will not interact with hidden elements. To work around this, we must send JavaScript to be executed directly in the browser using WebDriver's execute_script method, as in this example:

        self.browser.execute_script("document.getElementsByName('fakefile_storage')[0].value='placemark_end_point.kml'")
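Building that JavaScript string by hand gets fragile once file names contain quotes. Here is a minimal sketch of a helper that assembles the script, assuming the hidden field name fakefile_storage from the example above. The helper name set_field_script is hypothetical and not part of Selenium or the middleware:

```python
import json


def set_field_script(field_name, value):
    """Return JavaScript that sets the first element named field_name to value.

    json.dumps (with its default ASCII escaping) yields valid JavaScript
    string literals, so quotes and backslashes in either argument cannot
    break out of the script.
    """
    return "document.getElementsByName({})[0].value = {};".format(
        json.dumps(field_name), json.dumps(value)
    )


script = set_field_script("fakefile_storage", "placemark_end_point.kml")
print(script)
```

In a test, the result would then be passed to self.browser.execute_script(script).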

This is a lot of hoops to jump through, but now we have functional tests for file uploads and their post-upload processing. Hopefully the Selenium or Django projects can develop a better-supported method for file upload testing.

Making SSL Work with Django Behind an Apache Reverse Proxy

Bouncing Admin Logins

We have a Django application that runs on Gunicorn behind an Apache reverse proxy server. I was asked to look into a strange issue with it: After a successful login to the admin interface, the browser was re-directed to the http (non-SSL) version of the interface.

After some googling and investigation I determined the issue was likely due to our specific server arrangement. Although the login requests were made over https, the requests proxied by Apache to Gunicorn used http (securely on the same host). Checking the Apache SSL error logs quickly confirmed this suspicion. I described the issue in the #django channel on freenode IRC and received some assistance from Django core developer Carl Meyer. As of Django 1.4 there was a new setting Carl had developed to handle this particular scenario.

Enter SECURE_PROXY_SSL_HEADER

The documentation for the SECURE_PROXY_SSL_HEADER variable describes how to configure it for your project. I added the following to the settings.py config file:

SECURE_PROXY_SSL_HEADER = ('HTTP_X_FORWARDED_PROTO', 'https')

Because this setting tells Django to trust the X-Forwarded-Proto header coming from the proxy (Apache) there are security concerns which must be addressed. The details are described in the Django documentation and this is the Apache configuration I ended up with:

# strip the X-Forwarded-Proto header from incoming requests
RequestHeader unset X-Forwarded-Proto

# set the header for requests using HTTPS
RequestHeader set X-Forwarded-Proto https env=HTTPS

With SECURE_PROXY_SSL_HEADER in place and the Apache configuration updated, logins to the admin site began to work correctly.
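To make the effect of the setting concrete, here is a simplified sketch of the decision made for each request. It is an illustration only, not Django's actual implementation; the function name request_is_secure and the plain dict standing in for request.META are assumptions:

```python
SECURE_PROXY_SSL_HEADER = ("HTTP_X_FORWARDED_PROTO", "https")


def request_is_secure(meta, wsgi_scheme="http"):
    """Approximate how a request is classified as HTTPS behind a proxy.

    meta stands in for request.META; wsgi_scheme is the scheme of the
    connection between the proxy and the application server.
    """
    header, secure_value = SECURE_PROXY_SSL_HEADER
    forwarded = meta.get(header)
    if forwarded is not None:
        # The proxy's header, when present, overrides the local scheme.
        return forwarded == secure_value
    return wsgi_scheme == "https"


# Apache set the header because the browser connected over HTTPS.
print(request_is_secure({"HTTP_X_FORWARDED_PROTO": "https"}))
# No header: the plain-HTTP hop from Apache to Gunicorn decides.
print(request_is_secure({}))
```

This is also why Apache must strip any client-supplied X-Forwarded-Proto header: otherwise a client could send the header itself over plain HTTP and be treated as secure.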

This is standard practice for web applications that sit behind an HTTP reverse proxy. But when an application was initially set up for plain HTTP only and HTTPS is added later, this part of the setup is easy to overlook.

Downloading a CSV File From Django Admin

Django has a very nice admin panel. It is highly extensible, and it allows for some really cool enhancements. One of them is a custom action.

For the purpose of this article I’ve created a simple Django project with a simple application containing only one model. The file models.py looks like this:

from django.db import models
from django.contrib import admin

class Stat(models.Model):
    code = models.CharField(max_length=100)
    country = models.CharField(max_length=100)
    ip = models.CharField(max_length=100)
    url = models.CharField(max_length=100)
    count = models.IntegerField()

class StatAdmin(admin.ModelAdmin):
    list_display = ('code', 'country', 'ip', 'url', 'count')

admin.site.register(Stat, StatAdmin)

I’ve also added a couple of rows in the database table for this model. The admin site for this model looks like this:

Now I want to be able to select some rows and download a CSV file right from the Django admin panel. The file should contain only the information about selected rows.

This can be done really easily with the admin actions mechanism. Above the table of rows there is the actions menu. It has one default action, "Delete selected stats". To use an action you select the rows, choose the action from the combo box, and press the OK button.

I will add another action there, which will be named "Download CSV file for selected stats".

Add the action.

First of all I will add the custom action. To do this I will extend the StatAdmin class with the actions field and add the method that is called when someone runs the action (the new parts are the actions field and the download_csv method):

class StatAdmin(admin.ModelAdmin):
    actions = ['download_csv']
    list_display = ('code', 'country', 'ip', 'url', 'count')
    def download_csv(self, request, queryset):
        pass
    download_csv.short_description = "Download CSV file for selected stats."

In the admin panel for the Stat model you can see the new action. The above code doesn’t do anything useful yet, so let’s generate the CSV file.

The whole idea is to generate the CSV file entirely in memory, without touching the disk, and return it to the user without redirecting to another page (the file should download automatically after pressing the OK button).

Do it in small steps:

Generate the CSV

For this I will use the csv module from Python’s standard library. The function now looks like this:

def download_csv(self, request, queryset):
    import csv
    f = open('some.csv', 'wb')
    writer = csv.writer(f)
    writer.writerow(["code", "country", "ip", "url", "count"])
    for s in queryset:
        writer.writerow([s.code, s.country, s.ip, s.url, s.count])

After selecting two rows and running the action, it created a file some.csv in the main project directory with the following content.

code,country,ip,url,count
B,BB,BBB,BBBB,22
C,CC,CCC,CCCC,33

That’s OK, generating the CSV file works, however it shouldn’t be stored on the disk.

Return the file directly into the browser.

I want to send the file to the client right after the OK button is clicked. This is fairly easy: the whole magic is in using the proper HTTP headers. I will modify the method to look like this:


def download_csv(self, request, queryset):
    import csv
    from django.http import HttpResponse

    f = open('some.csv', 'wb')
    writer = csv.writer(f)
    writer.writerow(["code", "country", "ip", "url", "count"])

    for s in queryset:
        writer.writerow([s.code, s.country, s.ip, s.url, s.count])

    f.close()

    f = open('some.csv', 'r')
    response = HttpResponse(f, content_type='text/csv')
    response['Content-Disposition'] = 'attachment; filename=stat-info.csv'
    return response

So the main changes are:

- close the file and reopen it for reading
- add headers for the proper content type and file name

The result is that when someone clicks on the OK button, the browser automatically starts downloading the stat-info.csv file.

Don’t use disk.

The only thing left is to create the file in memory only. For this I will use the StringIO module. It is a nice module implementing the same interface as a file, so I can use it instead of the file. StringIO operates only in memory, without any disk operations.

def download_csv(self, request, queryset):
    import csv
    from django.http import HttpResponse
    import StringIO

    f = StringIO.StringIO()
    writer = csv.writer(f)
    writer.writerow(["code", "country", "ip", "url", "count"])

    for s in queryset:
        writer.writerow([s.code, s.country, s.ip, s.url, s.count])

    f.seek(0)
    response = HttpResponse(f, content_type='text/csv')
    response['Content-Disposition'] = 'attachment; filename=stat-info.csv'
    return response

As you can see, the changes are:

- added import StringIO
- replaced opening a file with creating a new StringIO object
- the file is no longer reopened; seek(0) just moves the position back to the beginning
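The code above targets Python 2, where the StringIO module exists. On Python 3 the same idea uses io.StringIO. Here is a minimal sketch of just the in-memory CSV step, independent of Django; the helper name rows_to_csv and the namedtuple standing in for model instances are assumptions for illustration:

```python
import csv
import io
from collections import namedtuple

FIELDS = ["code", "country", "ip", "url", "count"]

# Stand-in for the Stat model so the example runs without Django.
Stat = namedtuple("Stat", FIELDS)


def rows_to_csv(rows):
    """Serialize objects that have the FIELDS attributes to CSV text in memory."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(FIELDS)
    for row in rows:
        writer.writerow([getattr(row, name) for name in FIELDS])
    # getvalue returns everything written so far; no seek is needed.
    return buffer.getvalue()


text = rows_to_csv([
    Stat("B", "BB", "BBB", "BBBB", 22),
    Stat("C", "CC", "CCC", "CCCC", 33),
])
print(text)
```

Inside the admin action, the returned text could be passed to HttpResponse with content_type='text/csv' and the same Content-Disposition header as above.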

Everything is finished now. There is a new action in the admin panel which generates a new CSV file with information about chosen rows and it doesn’t do any browser redirection.

Django and Virtual Environments

When you have to work with a bunch of different Python applications, the usual problem is that you have to deal with plenty of different packages in different versions. Each application needs its own set of libraries. Usually the versions of the libraries vary between the applications.

To solve these problems you can create Python virtual environments. There is a great tool for this: virtualenv. Using it directly is not very convenient, but there is a wrapper for it called virtualenvwrapper, which wraps the virtualenv commands into a couple of shell commands.

Let's assume that I need to work on two applications, one written for Django 1.2 and one for Django 1.3. Each of the applications needs a different set of packages in different versions. I will create two virtual environments.

Installing virtualenvwrapper on Ubuntu is pretty easy:

$ sudo apt-get install virtualenvwrapper

After the installation there are a couple of new commands. The basic one is mkvirtualenv, which creates a new environment. Let’s create one.

$ mkvirtualenv django_demo_12

This command automatically switches to the new environment, so you might notice that the prompt changed. The prompt always starts with the name of the current virtual environment.

Let’s create another one, called django_demo_13 (to use Django 1.3 there).

(django_demo_12)$ mkvirtualenv django_demo_13

The list of environments is printed by the command workon, when called without arguments.

$ workon
django_demo_12
django_demo_13

As you can see, there are two environments ready to use. You can pass the name of a virtual environment as a parameter to the workon command. Now let’s install Django 1.2 in the environment django_demo_12.

First of all switch to the new environment:

$ workon django_demo_12

Now the prompt changed, so you can always be sure which Python virtual environment you are using.

(django_demo_12)$
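The prompt is a shell convenience; the same check can be made from inside Python. Here is a small sketch, assuming either Python 3.3 or later (where sys.base_prefix exists) or an older virtualenv that sets sys.real_prefix. The function name in_virtualenv is hypothetical:

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; venv and newer
    # virtualenv releases make sys.prefix differ from sys.base_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


# sys.prefix points inside the environment directory when one is active.
print(in_virtualenv(), sys.prefix)
```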

Now Django needs to be installed. There are a couple of ways to do that. The one I prefer is to create a text file with the names and versions of all needed packages. This file will be named requirements.txt and will contain only this one line so far (other packages will be added later):

Django==1.2.7

To install the packages listed in the file, I will use the command "pip install -r requirements.txt":

(django_demo_12)$ pip install -r requirements.txt
Downloading/unpacking Django==1.2.7 (from -r requirements.txt (line 1))
Downloading Django-1.2.7.tar.gz (6.4Mb): 6.4Mb downloaded
Running setup.py egg_info for package Django

Installing collected packages: Django
Running setup.py install for Django
changing mode of build/scripts-2.7/django-admin.py from 664 to 775

changing mode of /home/szymon/.virtualenvs/django_demo_12/bin/django-admin.py to 775
Successfully installed Django
Cleaning up...

Now I can check which Django version is installed:

(django_demo_12)$ django-admin.py --version
1.2.7

Now I will create a standard Django project:

(django_demo_12)$ django-admin.py startproject django_demo_12

The only additional thing here is to move the requirements.txt file into the Django project:

(django_demo_12)$ mv requirements.txt django_demo_12

To create an application using Django 1.3 the steps are similar. The first thing is to switch to the other virtual environment:

(django_demo_12)$ workon django_demo_13

From this moment it will be almost the same as in the previous environment, with the change that the requirements file should contain:

Django==1.3.1

The commands are:

(django_demo_13)$ pip install -r requirements.txt
(django_demo_13)$ django-admin.py startproject django_demo_13
(django_demo_13)$ mv requirements.txt django_demo_13

So now there are two different Python environments, totally separated from each other. When I install something in one of them, it is not installed in the other, so I can have different packages for different Django versions.

The best way to install a package here is to add it to the requirements.txt file and run "pip install -r requirements.txt" once again. Later it will be easier to hand the whole code to another programmer, who can run the same command on their computer to install all the needed packages automatically (each in exactly the needed version).
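Because every line pins an exact version, the file is easy to check programmatically. Here is a minimal sketch that parses name==version lines, assuming the simple pinned format used in this article (real requirements files allow more syntax, which this ignores). The helper name parse_pins is hypothetical:

```python
def parse_pins(text):
    """Map package name to pinned version for lines of the form name==version."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or "==" not in line:
            continue  # skip blanks and anything that is not an exact pin
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


print(parse_pins("Django==1.2.7\n"))
```

Comparing this mapping against the versions actually installed in an environment shows whether the environment has drifted from the file.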

There is one simple command left. Sometimes you just want to remove the virtual environment from the path and use the standard Python libraries installed on the system. It can be done using this command:

(django_demo_13)$ deactivate

Red Hat SELinux policy for mod_wsgi

Using SELinux, you can safely grant a process only the permissions it needs to perform its function, and no more. Linux distributions provide policies to enforce these limits on most software they package, but many aren't covered. We've made allowances for mod_wsgi on RHEL and CentOS 5 by extending Apache httpd's SELinux policy.

It seems the SELinux policy for Apache httpd is twice as large as any other package's. The folks at Red Hat have put a lot of work into making sure that attackers who manage to exploit httpd can't break out to the rest of your system, while still allowing the flexibility to serve most applications. Consult the httpd_selinux man page if messages in audit.log coincide with your error.

File Contexts

If you've created files and/or directories in /etc/httpd, make sure they have the proper file contexts so the daemon can read them:

  # restorecon -vR /etc/httpd

httpd can only serve files with an explicitly allowed file context. Configure the context of files and directories within your production code base using the semanage command:

  # semanage fcontext --add --ftype -- --type httpd_sys_content_t "/home/projectname/live(/.*)?"
  # semanage fcontext --add --ftype -d --type httpd_sys_content_t "/home/projectname/live(/.*)?"
  # restorecon -vR /home/projectname/live

View file contexts with ls -Z. Changes should generally be made with semanage and restorecon -vR.
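The path expression given to semanage is a regular expression that has to match the whole path. To see which paths the pattern above covers, it can be tried out in Python. This is only an approximation: Python's re module is not the regex engine SELinux uses, though for a pattern this simple the two should agree. The sample paths are made up:

```python
import re

# Same expression as in the semanage commands above.
pattern = re.compile(r"/home/projectname/live(/.*)?")

samples = [
    "/home/projectname/live",
    "/home/projectname/live/app/views.py",
    "/home/projectname/livestock",
    "/home/projectname",
]

for path in samples:
    # fullmatch mirrors the requirement that the whole path must match.
    print(path, bool(pattern.fullmatch(path)))
```

The optional (/.*)? group is what lets one expression cover both the directory itself and everything beneath it, without also matching sibling paths that merely share the prefix.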

Booleans

The httpd policy provides several boolean options for easy run-time configuration:

  • httpd_can_network_connect - Allows httpd to make network connections, including the local ones you'll be making to a database
  • httpd_enable_homedirs - Allows httpd to access /home/

Booleans are persistently set using the setsebool command with the -P flag:

  # setsebool -P httpd_can_network_connect on

WSGI Socket

When running in daemon mode, httpd and the mod_wsgi daemon communicate via a UNIX socket file. This should usually have a context of httpd_var_run_t. The standard Red Hat SELinux policy includes an entry for /var/run/wsgi.* to use this context, so it makes sense to put the socket there using the WSGISocketPrefix directive within your httpd configuration:

  WSGISocketPrefix run/wsgi

(Note that run/wsgi translates to /etc/httpd/run/wsgi which is symlinked to /var/run/wsgi.)

If socket communication fails, httpd returns a 503 "Temporarily Unavailable" error response.

SELinux Policy Module

In the course of our testing, SELinux denials like the following appeared:

  host=example.com type=AVC msg=audit(1262803154.315:1851): avc:  denied  { execmem } for  pid=5337 comm="httpd" scontext=root:system_r:httpd_t:s0 tcontext=root:system_r:httpd_t:s0 tclass=process
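When there are many such lines, it helps to pull out the fields that matter before deciding how to respond. Here is a rough sketch that extracts the denied permissions, process name, contexts, and class from an AVC line in the format shown above. The function name parse_avc is hypothetical, and the regular expressions assume that format:

```python
import re


def parse_avc(line):
    """Extract key fields from an SELinux AVC denial line, or return None."""
    perms = re.search(r"avc:\s+denied\s+\{([^}]*)\}", line)
    if perms is None:
        return None
    # Collect key=value pairs; values are either quoted or run to whitespace.
    fields = dict(re.findall(r'(\w+)=("[^"]*"|\S+)', line))
    return {
        "permissions": perms.group(1).split(),
        "comm": fields.get("comm", "").strip('"'),
        "scontext": fields.get("scontext"),
        "tcontext": fields.get("tcontext"),
        "tclass": fields.get("tclass"),
    }


sample = (
    'host=example.com type=AVC msg=audit(1262803154.315:1851): avc:  denied  '
    '{ execmem } for  pid=5337 comm="httpd" scontext=root:system_r:httpd_t:s0 '
    'tcontext=root:system_r:httpd_t:s0 tclass=process'
)
print(parse_avc(sample))
```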

Unusual behavior like this is usually best allowed by creating application-specific SELinux policy modules. If you cannot resolve these AVC errors by manipulating file contexts or booleans, collect all the errors into a single file and feed that into the audit2allow utility:

  # yum install policycoreutils
  # mkdir ~/tmp  # if this doesn't exist already
  # audit2allow --module wsgi < ~/tmp/pile_of_auditd_output > ~/tmp/wsgi.te

This will output the source for a new policy module. Review the .te file before compiling it. Ours looks like this:

module wsgi 1.0;

require {
      type httpd_t;
      class process execmem;
}

#============= httpd_t ==============
allow httpd_t self:process execmem;

Compile this source into a new policy module and package it:

  # checkmodule -M -m -o ~/tmp/wsgi.mod ~/tmp/wsgi.te
  # semodule_package --outfile ~/tmp/wsgi.pp --module ~/tmp/wsgi.mod

Once created, the module may be installed permanently into any compatible system's SELinux configuration:

  # semodule --install ~/tmp/wsgi.pp

There's plenty of room for improvement here. The file contexts we assigned with semanage should be defined in a .fc source file and included within the policy module. And creating a new context just for the WSGI daemon to transition into would restrict it further, allowing only a subset of Apache httpd's abilities. Writing your own policy like this allows you much finer tuning of your processes' limits, while allowing their needed functionality.