We recently implemented a new feature that required two interesting pieces: storing a file in MongoDB using GridFS (think traditional BLOB) and then pushing that file down to a browser. Along the way, I came across a lot of questions on Stack Overflow and the like about various aspects of GridFS and pushing files with Sinatra, so I thought I’d explain what we did.
First off, GridFS is a specification for storing and retrieving large files. In essence, Mongo breaks a large file into smaller documents (called chunks), stores them in one collection, and stores the metadata describing the file as a whole in another collection. It’s a fairly handy feature, and in our case it made sense to use GridFS rather than S3 (or the underlying file system).
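To make the two-collection layout concrete, here’s a toy, in-memory sketch in plain Ruby with no driver or server involved. The `ToyGridFS` class and its two arrays are hypothetical stand-ins for the `fs.files` and `fs.chunks` collections; the document field names follow the GridFS spec, and the 255 KB default chunk size is the spec’s default (older drivers used 256 KB).

```ruby
require "securerandom"

# A toy, in-memory model of GridFS's two-collection layout (hypothetical;
# no driver or server involved). Document field names follow the GridFS spec.
class ToyGridFS
  # 255 KB: the spec's default chunk size (older drivers used 256 KB).
  DEFAULT_CHUNK_SIZE = 255 * 1024

  attr_reader :files, :chunks

  def initialize(chunk_size: DEFAULT_CHUNK_SIZE)
    @chunk_size = chunk_size
    @files = []  # stands in for the fs.files collection
    @chunks = [] # stands in for the fs.chunks collection
  end

  # Split +data+ into fixed-size chunks, store one chunk document per piece,
  # then store a single files document describing the whole file.
  def put(data, filename:, metadata: {})
    id = SecureRandom.hex(12) # placeholder for a BSON ObjectId
    bytes = data.b            # work on raw bytes, not characters
    (0...bytes.bytesize).step(@chunk_size).with_index do |offset, n|
      @chunks << { "files_id" => id, "n" => n,
                   "data" => bytes.byteslice(offset, @chunk_size) }
    end
    @files << { "_id" => id, "length" => bytes.bytesize,
                "chunkSize" => @chunk_size, "uploadDate" => Time.now.utc,
                "filename" => filename, "metadata" => metadata }
    id
  end
end
```

With a 4-byte chunk size, storing `"hello world"` (11 bytes) yields one files document and three chunk documents numbered 0 to 2.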
The MongoDB drivers implement a similar access pattern for GridFS, and they hide the complexity of working with chunks. The Ruby driver, for example, provides a single interface for managing the two GridFS collections, and that interface basically offers two methods: `put` and `get`. This might seem somewhat limiting, especially because `get` takes an `_id`! What’s more, you can attach rich metadata to a GridFS file (via the `put` call), data you’d conceivably want to query by, yet at first glance you can’t query by it via `get`. Nevertheless, because GridFS is ultimately just two collections in Mongo, you can query them as you would any other collection.
So finding files by some attribute other than their `_id` is as easy as writing the corresponding query against the files collection (named `fs.files` by default).
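Since `fs.files` is an ordinary collection, such a query is just a match against its documents. Here’s a toy, plain-Ruby sketch of that idea with no driver involved: `FILES`, `dig_path` and `find_one` are hypothetical names, the sample filenames and metadata are made up, and only equality matching on (possibly dotted) field paths is modeled, loosely mirroring MongoDB’s dot notation.

```ruby
# Toy stand-in for the fs.files collection: hashes shaped like GridFS files
# documents. The filenames and metadata values are made up.
FILES = [
  { "_id" => "f1", "filename" => "app-1.0.apk", "length" => 2048,
    "metadata" => { "version" => "1.0", "platform" => "android" } },
  { "_id" => "f2", "filename" => "app-1.1.apk", "length" => 4096,
    "metadata" => { "version" => "1.1", "platform" => "android" } }
].freeze

# Resolve a dotted path such as "metadata.version" inside a document,
# returning nil if any step along the way is missing.
def dig_path(doc, path)
  path.split(".").reduce(doc) { |node, key| node.is_a?(Hash) ? node[key] : nil }
end

# Return the first document whose fields equal every value in +criteria+
# (equality only), or nil. The result is a plain Hash describing the file,
# not a readable I/O object.
def find_one(collection, criteria)
  collection.find do |doc|
    criteria.all? { |path, value| dig_path(doc, path) == value }
  end
end
```

For example, `find_one(FILES, { "metadata.version" => "1.1" })` returns the hash whose `"_id"` is `"f2"`; to read the file’s bytes you’d still need to hand that `_id` to `get`.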
Keep in mind, however, that such a query returns a document describing the file (a hash, in Ruby), not a handle to an actual I/O instance. To get the actual file, use the `get` method provided by the driver’s GridFS facade; it handles the details of pulling the file’s chunks out of GridFS. `get` takes an `_id`, which you can read from the document your query returned, so getting a readable file instance is a matter of passing that `_id` to `get`.
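Under the hood, fetching a file by `_id` amounts to collecting that file’s chunks and concatenating them in order of their chunk index `n`. Here’s a toy sketch of that reassembly, assuming in-memory chunk documents shaped like the spec’s (`files_id`, `n`, `data`); `CHUNKS` and `toy_get` are hypothetical, and a `StringIO` stands in for whatever readable object your driver version actually returns.

```ruby
require "stringio"

# Toy stand-in for fs.chunks: chunks for two files, deliberately stored
# out of order to show that reassembly must sort by the chunk index "n".
CHUNKS = [
  { "files_id" => "f1", "n" => 1, "data" => "lo, " },
  { "files_id" => "f1", "n" => 0, "data" => "hel" },
  { "files_id" => "f2", "n" => 0, "data" => "other file" },
  { "files_id" => "f1", "n" => 2, "data" => "GridFS" }
].freeze

# Rebuild the file whose _id is +file_id+ by concatenating its chunks in
# order of "n", and return a readable IO rather than a descriptive document.
def toy_get(chunks, file_id)
  data = chunks
         .select { |c| c["files_id"] == file_id }
         .sort_by { |c| c["n"] }
         .map { |c| c["data"] }
         .join
  StringIO.new(data)
end
```

Calling `toy_get(CHUNKS, "f1").read` yields `"hello, GridFS"`, with the chunks stitched back together in index order.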
With the `get` call, then, you actually get hold of a file instance rather than a document describing it.
To push this instance down to a browser using Sinatra, you need to do three things:
- set the content type
- set the attachment name
- write the file to the response
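The first two steps come down to two HTTP response headers. Here’s a minimal sketch of the headers involved, using a hypothetical `download_headers` helper and a made-up `app.apk` filename; Sinatra’s `content_type` and `attachment` helpers set essentially these headers for you, though exact formatting may vary by Sinatra version.

```ruby
# Hypothetical helper: the headers that "set the content type" and
# "set the attachment name" boil down to.
def download_headers(filename, mime_type)
  {
    # Step 1: tell the browser what kind of file it is receiving.
    "Content-Type" => mime_type,
    # Step 2: "attachment" asks the browser to download rather than render
    # inline; filename suggests the name to save it under (basename only).
    "Content-Disposition" => %(attachment; filename="#{File.basename(filename)}")
  }
end
```

For an Android package, `download_headers("app.apk", "application/vnd.android.package-archive")` produces a `Content-Disposition` of `attachment; filename="app.apk"`.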
In our case, the file is an Android app, so the content type is `application/vnd.android.package-archive`, and the attachment name is simply the name of the `.apk` file. Finally, the response body is filled by reading from the corresponding GridFS file.
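If the file is large, you might prefer to stream it in pieces rather than read it all into memory at once. Here’s a minimal sketch of that alternative (not necessarily what we shipped): `each_piece` is a hypothetical helper, and the 16 KB piece size is an arbitrary assumption. Sinatra accepts a route return value that responds to `each` and yields strings as the response body, so an Enumerator like this one could serve that role.

```ruby
require "stringio"

# Wrap +io+ in an Enumerator that reads it in fixed-size pieces on demand,
# so the whole file never has to sit in memory at once.
def each_piece(io, piece_size = 16 * 1024)
  Enumerator.new do |yielder|
    # IO#read(n) returns nil at end of file, which ends the loop.
    while (piece = io.read(piece_size))
      yielder << piece
    end
  end
end
```

Because the Enumerator reads from the IO as it’s consumed, it can only be iterated once per opened file.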
As you can see, it’s all quite easy: GridFS is simply a facade over two collections, and forcing a download in Sinatra takes three straightforward steps. Can you dig it, man?