Day 4: OAuth Flows

August 08, 2016

On Friday I finished off the initial implementation of obtaining an OAuth token from GitHub. Their documentation makes it pretty straightforward, but for anyone unfamiliar with the OAuth2 flow, it goes a little something like this:

  1. The application generates an authorization URL for GitHub from several parameters, including the CLIENT_ID (obtained from the OAuth application on GitHub's side) and a redirect_uri that users will be sent back to after authorizing.
  2. The user follows this URL, which takes them to an authorization page on GitHub where they can choose to authorize the application to use the scopes (essentially permissions) that it has requested. More on scopes in a moment.
  3. If the user authorizes the application, they are redirected back to the previously supplied redirect_uri. GitHub passes a short-lived code with this redirect, which can be exchanged for a full access token for the user.
  4. Behind the scenes, the application makes a POST request to GitHub, with more client information and the temporary code. In return it receives the OAuth access token.
  5. The application can now use the access token to make authenticated calls on behalf of the user.

Steps 1 to 4 can be implemented in two fairly simple Django views. Let's go through the code for the initial generation of the authorization URL first. Note: the application code is now on GitHub so you can follow along there!

Generating an authorization URL

# models.py
from django.db import models
from django.contrib.postgres.fields import ArrayField
from django.utils import timezone

import uuid

class OauthToken(models.Model):
    """
    Stores OAuth credentials for contacting GitHub.
    """
    token = models.CharField(max_length=255, blank=True, null=True)
    scopes = ArrayField(base_field=models.CharField(max_length=255))
    state = models.CharField(max_length=255)
    timestamp = models.DateTimeField(default=timezone.now)

    def save(self, *args, **kwargs):
        # Generate a random state on first save; store it as a string to match the CharField
        self.state = self.state or str(uuid.uuid4())
        super(OauthToken, self).save(*args, **kwargs)

This is our database model to hold data about the OAuth token. The token itself will live in the token field, and the scopes that the token has access to will live in a fancy Postgres ArrayField. The state field allows us to preserve state between generating the authorization URL and receiving the callback - after this it is unused. Finally, we have a timestamp because if there's one thing I've learned from working with Django for years, it's that everything is easier when your models are timestamped.

# views.py
from django.conf import settings
from django.shortcuts import get_object_or_404, reverse

import json
import requests
from rest_framework.views import APIView
from rest_framework.response import Response

from .models import OauthToken, User

OAUTH_URL = "https://github.com/login/oauth/"
API_URL = "https://api.github.com/"

class GitHubAuthorize(APIView):

    def get(self, request):

        # build an empty OauthToken
        desired_scopes = [
            'user:email',
            'write:repo_hook',
            'public_repo'
        ]

        token = OauthToken.objects.create(scopes=desired_scopes)

        # construct oauth URL for GitHub
        oauth_url = self._get_oauth_url(request, token)

        data = {'oauth_url': oauth_url}

        return Response({'data': data})

    def _get_oauth_url(self, request, token):

        redirect_uri = request.build_absolute_uri(
            reverse('oauth:callback'))

        params = {
            'state': token.state,
            'client_id': settings.GITHUB_CLIENT_ID,
            'scope': ' '.join(token.scopes),
            'redirect_uri': redirect_uri,
        }

        uri = OAUTH_URL + "authorize"

        oauth_url = self._add_params(uri, params)
        return oauth_url

    def _add_params(self, uri, params):
        # Let requests build the query string so that values such as the
        # redirect URI and the space-separated scopes are URL-encoded
        return requests.Request('GET', uri, params=params).prepare().url

Our view is a subclass of DRF's APIView - this allows us to easily generate JSON responses and do a bunch of other awesome things like have pluggable authentication and permissions. In the main get method, we build a list of desired scopes to request from GitHub. We then make an empty OauthToken, with a randomly generated state field, to make sure that we can check that any callbacks we receive were actually initiated on our site. When we receive a callback, we will check for the existence of an OauthToken object with the same state passed in the callback. If one exists, we can be confident that it's a real authorization response, not a spoofed call from another site.

Our _get_oauth_url method just adds this state, our application's client ID, the desired scopes and our redirect URI as URL parameters to the base GitHub OAuth URL, https://github.com/login/oauth/.
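As a rough sketch of that URL construction outside of Django, the standard library's urlencode can assemble and escape the query string. The build_authorize_url helper is hypothetical, and the client ID, state and callback URL below are made-up placeholder values rather than anything from the real application.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

OAUTH_URL = "https://github.com/login/oauth/"


def build_authorize_url(client_id, state, scopes, redirect_uri):
    """Return a GitHub authorization URL with an escaped query string."""
    params = {
        "client_id": client_id,
        "state": state,
        # Scopes are sent as a single space-delimited value
        "scope": " ".join(scopes),
        "redirect_uri": redirect_uri,
    }
    return OAUTH_URL + "authorize?" + urlencode(params)


# Placeholder values for illustration only
url = build_authorize_url(
    client_id="example-client-id",
    state="example-state",
    scopes=["user:email", "public_repo"],
    redirect_uri="https://example.com/oauth/callback/?next=/home",
)

# Parsing the URL back shows that the values survive the round trip
query = parse_qs(urlsplit(url).query)
```

The point of escaping is that a redirect URI containing its own ? or & characters would otherwise be split into separate parameters on GitHub's side.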

Since the application will eventually have a separate client-side app, rather than redirecting the user directly to the authorization URL, we're returning it in a JSON response. The client application will then use this to redirect the user, but for the purposes of testing we can just get the resulting URL and plug it into our browser to authorize our application.

[Screenshot: GitHub's authorize screen]

Success!

Receiving the OAuth callback and obtaining an access token

Once the user has authorized our GitHub app, we need to listen for a callback from GitHub and exchange the temporary code we receive for a full access token.

# models.py
class User(models.Model):
    """
    A basic user model that is used for authentication.
    """

    email = models.EmailField(unique=True)
    github_id = models.IntegerField()
    # on_delete is spelled out because newer Django versions require it
    token = models.ForeignKey(OauthToken, on_delete=models.CASCADE)

This model will hold basic user information. It would be totally fine to keep the user and OAuth token in a single model - I believe TravisCI does this - but I chose to split them so that I don't have to create empty user rows just to verify the legitimacy of a callback based on the state field as described above. A more mature implementation would probably handle this state flow separately, using expiring keys in Redis or something similar, and only create a unified token-user model once the callback had been confirmed valid. That's something we can always build later if we so desire.
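To give a feel for what handling the state flow separately might look like, here is a minimal sketch that uses a plain dictionary as a stand-in for an expiring key store such as Redis. StateStore and its method names are hypothetical, the ten-minute lifetime is an arbitrary assumption, and none of this is part of the application's code.

```python
import secrets
import time


class StateStore:
    """In-memory stand-in for an expiring key store such as Redis."""

    def __init__(self, ttl_seconds=600, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so that expiry can be tested
        self._expiry_by_state = {}

    def issue(self):
        """Create a random state value and remember when it expires."""
        state = secrets.token_urlsafe(32)
        self._expiry_by_state[state] = self._clock() + self._ttl
        return state

    def consume(self, state):
        """Return True once for a known, unexpired state, else False."""
        # pop() removes the entry, so a state cannot be replayed
        expires_at = self._expiry_by_state.pop(state, None)
        return expires_at is not None and self._clock() < expires_at
```

The authorize view would call issue() when building the URL and the callback view would call consume() on the state it receives, so no database row needs to exist until the callback has been validated.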

# views.py
class GitHubCallback(APIView):

    def get(self, request):

        state = request.query_params.get('state')
        code = request.query_params.get('code')

        # Check that this is a legitimate callback for a request made from the site
        token = get_object_or_404(OauthToken, state=state)

        try:
            data = self._get_access_token(code)
        except requests.HTTPError as e:
            return self._generate_error_response(e)

        if set(token.scopes) == set(data['scope'].split(',')):
            token.token = data.get('access_token')
            token.save()

            user = self._get_or_create_user(token=token)

            return Response({'data': {
                'user_email': user.email,
                'access_token': token.token
            }})

        else:
            # Set the HTTP status as well, so that clients don't see a 200
            return Response({'errors': [
                {'status': 403, 'detail': "User has not authorized the correct scopes"}
            ]}, status=403)

From our callback, we get the state and the code out of the URL parameters, and look for the OauthToken instance corresponding to the state to make sure it's the real deal. Once we find it, we exchange the code for the access_token. Let's look at that in more detail:

def _get_access_token(self, code):

    uri = OAUTH_URL + "access_token"
    params = {
        'code': code,
        'client_id': settings.GITHUB_CLIENT_ID,
        'client_secret': settings.GITHUB_CLIENT_SECRET,
    }
    # Ask GitHub for a JSON response rather than a form-encoded one
    headers = {'Accept': 'application/json'}

    # Send the parameters in the POST body so that the client secret
    # does not end up in the request URL
    response = requests.post(uri, headers=headers, data=params)
    response.raise_for_status()

    return response.json()

This function is very simple: we send the code to GitHub along with our application's client ID and secret, and get back a response including the access token and the scopes that the user has authorized.
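One thing to watch for: GitHub can report some failures, such as an incorrect or expired code, in the JSON body of an otherwise successful HTTP response, so raise_for_status alone may not catch them. A minimal sketch of a check on the decoded response could look like this. OAuthError and extract_access_token are hypothetical names, not part of the application's code.

```python
class OAuthError(Exception):
    """Raised when the token response does not contain an access token."""


def extract_access_token(payload):
    """Return the access token from a decoded token response, or raise."""
    if "error" in payload:
        # Prefer the human-readable description when one is present
        raise OAuthError(
            payload.get("error_description") or payload["error"])

    token = payload.get("access_token")
    if not token:
        raise OAuthError("response did not include an access token")

    return token
```

In the callback view, this could run on the result of _get_access_token before the scopes are compared, which would also avoid a KeyError on data['scope'] when the exchange has failed.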

It's probably a good idea to check at this point that the user-authorized scopes are the ones we requested - without those we can't do the things the service needs to do.

if set(token.scopes) == set(data['scope'].split(',')):
    token.token = data.get('access_token')
    token.save()

Currently, we'll just fail if the scopes are incorrect. In future, we could warn the user or redirect them back to the authorization page. If the scopes are right, we save the token object with the access token we got back. Finally, since we successfully have a token, we create a user to store GitHub-specific data like email and ID.
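If we did want to tell the user what went wrong, a small helper could work out which of the requested scopes are missing from GitHub's comma-separated list. This is only a sketch: missing_scopes is a hypothetical name, and unlike the equality check in the view it deliberately tolerates extra granted scopes, which is a design choice rather than what the application currently does.

```python
def missing_scopes(requested, granted_csv):
    """Return the requested scopes absent from a comma-separated grant."""
    # Ignore stray whitespace and empty entries, e.g. from an empty string
    granted = {s.strip() for s in granted_csv.split(",") if s.strip()}
    return sorted(set(requested) - granted)


requested = ["user:email", "write:repo_hook", "public_repo"]

# Everything granted: nothing is missing
all_granted = missing_scopes(requested, "public_repo,user:email,write:repo_hook")

# Only one scope granted: the other two are reported back
some_missing = missing_scopes(requested, "user:email")
```

An empty result means the token can do everything the service needs, and a non-empty one could be echoed back in the 403 detail message.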

def _get_or_create_user(self, token):

    user_url = API_URL + 'user'

    headers = {'Authorization': 'token {}'.format(token.token)}

    response = requests.get(user_url, headers=headers)
    # Fail loudly on a rejected token instead of parsing an error body
    response.raise_for_status()
    data = response.json()

    try:

        user = User.objects.get(email=data['email'], github_id=data['id'])
        old_token = user.token
        user.token = token
        user.save()
        old_token.delete()

    except User.DoesNotExist:

        user = User.objects.create(
            email=data['email'], github_id=data['id'], token=token)

    return user

Here we're calling GitHub's user API to get details about the authorized user. If the user has been through the authorization flow before - e.g. if our application's authentication has expired because they haven't visited the site for a long time - we don't want to create a second user for them. In this case we'll just delete the old token and replace it with the new one. If they're new to the flow, we'll make a user object for them and attach the OAuth token.
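One caveat is that the email field in the user response can be null when a user has no public email address. Since we request the user:email scope, one option would be to fall back to GitHub's user/emails endpoint, which returns a list of addresses with primary and verified flags. The sketch below only covers picking an address out of such a list. pick_primary_email is a hypothetical helper and the sample records are made up.

```python
def pick_primary_email(emails):
    """Return the primary, verified address from a list of email records."""
    for record in emails:
        if record.get("primary") and record.get("verified"):
            return record["email"]
    # No suitable address found; the caller decides how to handle this
    return None


# Made-up records in the shape of an emails response
sample = [
    {"email": "old@example.com", "primary": False, "verified": True},
    {"email": "me@example.com", "primary": True, "verified": True},
]
chosen = pick_primary_email(sample)
```

Looking users up by github_id alone, and treating the email as data that can change, would also avoid creating a duplicate user when someone updates their address on GitHub.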

Later on we'll use this user for our own authentication purposes - we want to remember that the user has authorized with GitHub so they don't have to go through the full flow every time they visit the site.

What's next?

In the current implementation, we're just spitting out the user's email and access token at the end of the callback function, to prove that authorization has been successful. In reality, though, we want to generate our own authentication token for the user that they can use to take actions on our service, and probably redirect them to some client application with this token as a header. We'll use JWTs for our authentication, so tune in next time for a discussion of generating JWTs and making authenticated calls with them!

As always, if you have any questions or criticism, hit up the comments!