We recently implemented a new feature that required two interesting pieces: storing a file in MongoDB using GridFS (think of a traditional BLOB) and then pushing that file down to a browser. Along the way, I came across a lot of questions on Stack Overflow and elsewhere about various aspects of GridFS and of pushing files with Sinatra, so I thought I’d explain what we did.
First off, GridFS is a specification for storing and retrieving large files, in particular files bigger than MongoDB’s 16 MB BSON document size limit. In essence, Mongo breaks a large file into smaller documents (called chunks), stores those chunks in one collection (`fs.chunks` by default), and stores the metadata describing the file as a whole in another collection (`fs.files` by default). It’s a fairly handy feature, and in our case it made sense to use GridFS rather than S3 (or the underlying file system).
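To make the chunking idea concrete, here is a toy, in-memory Ruby sketch of what GridFS does conceptually. It is not the driver’s code: `split_into_chunks` and `reassemble` are hypothetical helpers, the random hex string merely stands in for an ObjectId, and the tiny chunk size is chosen only so the output stays readable (real GridFS chunks are far larger). The field names (`files_id`, `n`, `data`, `length`, `chunkSize`) do follow the GridFS spec.

```ruby
require 'securerandom'

# Split raw bytes into a GridFS-style "files" document plus "chunks" documents.
# Toy model only: the real driver does this for you and writes to MongoDB.
def split_into_chunks(data, filename, chunk_size)
  data = data.b                  # treat the payload as raw bytes
  file_id = SecureRandom.hex(12) # 24 hex chars, standing in for an ObjectId
  chunks = []
  (0...data.bytesize).step(chunk_size).each_with_index do |offset, n|
    chunks << { 'files_id' => file_id, 'n' => n,
                'data' => data.byteslice(offset, chunk_size) }
  end
  files_doc = { '_id' => file_id, 'filename' => filename,
                'length' => data.bytesize, 'chunkSize' => chunk_size }
  [files_doc, chunks]
end

# Rebuild the file: take this file's chunks, order them by n, concatenate.
def reassemble(files_doc, chunks)
  chunks.select { |c| c['files_id'] == files_doc['_id'] }
        .sort_by { |c| c['n'] }
        .map { |c| c['data'] }
        .join
end

files_doc, chunks = split_into_chunks('hello, gridfs!', 'greeting.txt', 4)
puts chunks.size                           # prints 4
puts reassemble(files_doc, chunks.shuffle) # prints hello, gridfs!
```

Because every chunk carries its file’s `_id` and a sequence number `n`, the file can be rebuilt no matter what order the chunks come back in.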
All MongoDB drivers implement a similar access pattern for GridFS, and they hide the complexity of working with chunks. The Ruby driver, for example, provides a single interface for managing the two GridFS collections, and that interface essentially offers two methods: `get` and `put`. That might seem somewhat limiting, especially because `get` takes an `_id`! What’s more, you can attach rich metadata to a GridFS file (via the `put` call), data you’d conceivably want to query by, yet at first glance you can’t query by it through the `get` method. Nevertheless, because GridFS is ultimately just two collections in Mongo, you can query them as you would any other collection.
Thus, finding files by some attribute other than their `_id` is as easy as writing an ordinary query against the files collection (`fs.files` by default) that matches on that attribute.
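Here is a toy sketch of that idea, with plain Ruby arrays and hashes standing in for the `fs.files` collection. The documents, the `find_one` helper, and the `metadata.platform`/`metadata.version` keys are all hypothetical, and the sketch assumes the custom attributes were stored under the files document’s `metadata` field. With the 1.x Ruby driver, the real query would be roughly `db['fs.files'].find_one('metadata.version' => '1.1')`.

```ruby
# Toy stand-in for the fs.files collection: plain hashes, hypothetical values.
FS_FILES = [
  { '_id' => 'a1', 'filename' => 'app-1.0.apk', 'length' => 1024,
    'metadata' => { 'platform' => 'android', 'version' => '1.0' } },
  { '_id' => 'b2', 'filename' => 'app-1.1.apk', 'length' => 2048,
    'metadata' => { 'platform' => 'android', 'version' => '1.1' } },
  { '_id' => 'c3', 'filename' => 'app-1.1.ipa', 'length' => 4096,
    'metadata' => { 'platform' => 'ios', 'version' => '1.1' } }
].freeze

# Return the first document whose dotted-path fields equal the given values.
# This mimics only the simple equality-match subset of a MongoDB selector.
def find_one(collection, selector)
  collection.find do |doc|
    selector.all? { |path, value| doc.dig(*path.split('.')) == value }
  end
end

file = find_one(FS_FILES, { 'metadata.platform' => 'android', 'metadata.version' => '1.1' })
puts file['_id'] # prints b2
puts file.class  # prints Hash
```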
Keep in mind, however, that such a query returns a document, not a file: the result is a plain document (in Ruby, a hash) describing the file’s details, such as its `_id`, filename, length, and metadata, rather than an I/O object you can read from. To get the actual file, use the `get` method provided by the driver’s GridFS facade, which handles the details of pulling the file’s chunks out of GridFS. Since `get` takes an `_id`, pass it the `_id` taken from the document your query returned.
In other words, the `get` call gives you an actual file instance you can read from, rather than a document describing it.
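The difference between a files document and what `get` returns can be sketched with another toy model. `toy_get` is hypothetical: it rebuilds the bytes from out-of-order chunk documents and wraps them in a `StringIO`, whereas the real driver reads the chunks from MongoDB and returns its own I/O-like object.

```ruby
require 'stringio'

# Hypothetical in-memory stand-ins for fs.files and fs.chunks.
FILES = [{ '_id' => 'b2', 'filename' => 'app.apk', 'length' => 10, 'chunkSize' => 4 }].freeze
CHUNKS = [
  { 'files_id' => 'b2', 'n' => 1, 'data' => 'EFGH' },
  { 'files_id' => 'b2', 'n' => 0, 'data' => 'ABCD' },
  { 'files_id' => 'b2', 'n' => 2, 'data' => 'IJ' }
].freeze

# Toy version of a GridFS get: look up by _id, return something readable.
def toy_get(id)
  raise ArgumentError, "no file with _id #{id}" unless FILES.any? { |f| f['_id'] == id }

  bytes = CHUNKS.select { |c| c['files_id'] == id }
                .sort_by { |c| c['n'] }
                .map { |c| c['data'] }
                .join
  StringIO.new(bytes)
end

doc = FILES.first            # what a query on fs.files gives you: a Hash
io  = toy_get(doc['_id'])    # what get gives you: an object you can read
puts doc.respond_to?(:read)  # prints false
puts io.read                 # prints ABCDEFGHIJ
```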
To push this instance down to a browser using Sinatra, you need to do three things:
- set the content type
- set the attachment name
- write the file to the response
In our case, the file is an Android app, so the content type is `application/vnd.android.package-archive` and the attachment name is simply the name of the `.apk` file. Finally, the response body is written by reading from the corresponding GridFS file.
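The three steps can be sketched without Sinatra as a Rack-style `[status, headers, body]` triple, which is the shape a Sinatra route ultimately produces; in Sinatra itself, the `content_type` and `attachment` helpers set these same two headers. In this sketch a `StringIO` stands in for the file returned by `get`, and `FileBody`, `download_response`, the filename, and the piece size are all hypothetical.

```ruby
require 'stringio'

APK_TYPE = 'application/vnd.android.package-archive'

# Response body that streams the file in fixed-size pieces instead of
# reading it all into memory; Rack only requires #each yielding Strings.
class FileBody
  def initialize(io, piece_size = 4096)
    @io = io
    @piece_size = piece_size
  end

  def each
    while (piece = @io.read(@piece_size)) # read returns nil at EOF
      yield piece
    end
  end
end

# Header names are lower-case, as Rack 3 requires.
def download_response(io, filename)
  headers = {
    'content-type' => APK_TYPE,                                    # step 1
    'content-disposition' => %(attachment; filename="#{filename}") # step 2
  }
  [200, headers, FileBody.new(io)]                                 # step 3
end

status, headers, body = download_response(StringIO.new('fake apk bytes'), 'app.apk')
puts status                         # prints 200
puts headers['content-disposition'] # prints attachment; filename="app.apk"
body.each { |piece| print piece }   # prints fake apk bytes
```

Streaming the body in pieces keeps a large file from sitting in memory all at once; for a small `.apk`, reading the whole file in one go would also be reasonable.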
As you can see, it’s all quite easy: GridFS is simply a facade over two collections, and forcing a download in Sinatra takes three straightforward steps. Can you dig it, man?