I just posted a snippet for a Django file storage backend that transparently compresses files using the gzip python module. Nothing fancy, but it gets the job done, with a few caveats:

  • Since the data returned by GzipFile.read is not necessarily of the length you requested, File.chunks would sometimes leave off the last bits. I wrote DjangoGzipFile to return just a single decompressed chunk to get around this, but a better approach would check to see if any data remains and yield it at the end.
  • For compressed files, the path has .gz appended to it, so the path you request via the storage system may not exist on the filesystem. Just something to note if you access the files without using the storage system.

Writing this also gave me a few ideas for some small refactorings to make FileSystemStorage more easily subclass-able:

  • Overriding the path method made it easy to append .gz transparently, but I had to write my own listdir since I didn't want to add .gz when resolving a directory path. It might be nice to have a path transformation helper method that FileSystemStorage.path calls.
  • It would be nice to move the directory creation and permission "post-processing" out of FileSystemStorage._save and into helper methods to avoid having to copy/paste the code. Alternatively, there could be another method to actually write the file data that _save could call by default.

Not that any of this is difficult to deal with, but I'd imagine file system storage backends could almost be considered a class of their own when you start thinking about things like compression, sharding, etc. Making it just a bit cleaner to create them would be a nice touch.