Notes on S3 backed File Downloads

Uclusion recently introduced arbitrary file uploads to our Workspaces or Stories (we previously allowed only images), and there are a few implementation notes we want to share with you.

Firstly, browsers do not always provide a content type when uploading a file. If your backend demands content types the code something like the below is needed when asking for a presigned post

export function uploadFileToS3 (marketId, file) {
  const clientPromise = _.isEmpty(marketId) ? getAccountClient() : getMarketClient(marketId);
  const { type, size, name } = file;
  // if we don't have a content type, use generic octet stream
  const usedType = _.isEmpty(type)? 'application/octet-stream' : type;
  return client.getFileUploadData(usedType, size, name))
    .then((data) => {
      const { metadata, presigned_post } = data;
      const { url, fields } = presigned_post; [See production file with this code](https://github.com/Uclusion/uclusion_web_ui/blob/master/src/api/files.js). Second, unless your served JS or HTML is coming from the same domain as your CDN, browsers will happily ignore the [download](https://www.w3schools.com/TAGS/att_a_download.asp) attribute of S3 links. They will either open it in a new tab with their viewer, or, for security reasons, use the S3 bucket’s file’s name. This means that if you machine generate paths such as [https://dev.imagecdn.uclusion.com/8432f156-f452-4031-8e7d-728deb25f7cd/35fed849-1b69-4f05-9cea-99f57d5f71e9](https://dev.imagecdn.uclusion.com/8432f156-f452-4031-8e7d-728deb25f7cd/35fed849-1b69-4f05-9cea-99f57d5f71e9) the user will end up with a file of [35fed849–1b69–4f05–9cea-99f57d5f71e9](https://dev.imagecdn.uclusion.com/8432f156-f452-4031-8e7d-728deb25f7cd/35fed849-1b69-4f05-9cea-99f57d5f71e9).

To avoid this, set your content distribution on your presigned post. This won’t effect the downloaded path, but is a method of telling the browser what you want the downloaded filename to be. Here’s code that does that, and also attempts to set the extension on the path appropriately:

def post_validation_function(event, data, context, validation_context):
    content_length = data['content_length']
    original_name = data.get('original_name', None)
    content_type = data['content_type']
    file_path = str(uuid.uuid4())
    # See if they provided the original name,
    # and if so use it, and try to put a good extension on
    fields = {"Content-Type": content_type}
    if original_name is not None:
        fields["Content-Disposition"] = 'attachment; filename="' + original_name + '"'
    extension = guess_file_extension(content_type)
    if extension is not None:
        file_path = file_path + extension

    used_id = validation_context['market_id'] if validation_context['market_id'] is not None else validation_context['account_id']
    path = used_id + '/' + file_path
    return create_post(path, fields, content_type, content_length, bucket)


def create_post(path, fields, content_type, content_length,  bucket):
    expiration = 60  60  24  # Upload must occur within 24 hours
    conditions = [
        ['content-length-range', content_length, content_length],
        {'Content-Type': fields['Content-Type']},
        {'Content-Disposition': fields['Content-Disposition']}
    ]
    post = s3_client.generate_presigned_post(bucket,
                                             path,
                                             Fields=fields,
                                             Conditions=conditions,
                                             ExpiresIn=expiration)
    return {
        'metadata': {
            'path': path,
            'content_length': content_length,
            'content_type': content_type
        },
        'presigned_post': post
    }

Notice that I set the constraints to include all Fields. If you don’t do that your users will get Access Denied whenever they try to upload.

If you don’t mind the users getting generated names, then I would at least attempt to set the file extensions properly. The code above invokes the following function to do so, and extension setting is one of the reasons why we demand content-type to be passed in when doing a presigned post.

import mimetypes

# Constants to give the file upload some additional information on extensions, especially on MS mimetypes
#
MS_TYPES = {
    'application/msword': '.doc',
    'application/vnd.openxmlformats-officedocument.wordprocessingml.document': '.docx',
    'application/mspowerpoint': '.ppt',
    'application/vnd.openxmlformats-officedocument.presentationml.presentation': '.pptx',
    'application/vnd.openxmlformats-officedocument.presentationml.template': '.potx',
    'application/msexcel': '.xls',
    'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet': '.xlsx',
    'application/vnd.openxmlformats-officedocument.spreadsheetml.template': '.xltx',
    'application/msproject': '.mpp',
}


def guess_file_extension(mime_type):
    lowercase = mime_type.lower()
    system = mimetypes.guess_extension(lowercase)
    if system is not None:
        return system
    # the system doesn't know, try our MS map
    return MS_TYPES.get(lowercase, None)

Ben Follis
Ben Follis Co-Founder of Uclusion