Understanding GridFS in MongoDB
GridFS is a specification for storing and retrieving files in MongoDB that exceed the 16 megabyte (MB) BSON document size limit. Rather than keeping a large file in a single document, GridFS splits it into chunks and stores each chunk as a separate document. This lets MongoDB handle files of almost any size.
By default, GridFS uses 255 KB chunks; versions before 2.4.10 used 256 KB. Because each chunk is stored as its own document, every chunk stays well within the 16 MB document size limit regardless of how large the original file is.
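As a rough illustration of the chunking arithmetic, the sketch below estimates how many chunk documents a file of a given size would produce at the 255 KB default. The helper name chunkCount is hypothetical and is not part of any MongoDB driver API.

```javascript
// Default GridFS chunk size: 255 KB, i.e. 255 * 1024 bytes.
const DEFAULT_CHUNK_SIZE = 255 * 1024;

// Hypothetical helper: number of chunk documents needed for a file.
function chunkCount(fileLength, chunkSize = DEFAULT_CHUNK_SIZE) {
  if (!Number.isInteger(fileLength) || fileLength < 0) {
    throw new RangeError("fileLength must be a non-negative integer");
  }
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  // Every chunk is full except possibly the last one.
  return Math.ceil(fileLength / chunkSize);
}

// A 20 MB file is larger than the 16 MB document limit,
// yet each of its chunks is at most 255 KB.
const twentyMB = 20 * 1024 * 1024;
console.log(chunkCount(twentyMB)); // 81
```

The last chunk simply holds whatever bytes remain, so it is usually smaller than the others.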
GridFS stores files in two collections whose names share a bucket prefix, fs by default:
- fs.files: This collection stores each file's metadata, such as filename, chunkSize, uploadDate, an md5 checksum, and length (the file size in bytes). Applications can add custom metadata fields to this document.
- fs.chunks: This collection contains the file’s actual binary data, divided into chunks. Each chunk document has the fields _id (unique to the chunk), files_id (which links it to its parent document in fs.files), n (the chunk’s sequence number, starting at 0), and data (the binary payload).
GridFS uses the files_id and n fields in the fs.chunks collection to create a unique compound index for effective retrieval.
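To make the two-collection layout concrete, here is a minimal in-memory sketch that builds plain objects shaped like fs.files and fs.chunks documents and then reassembles the file by ordering chunks on n. It does not talk to MongoDB; the function names, the string id, and the tiny chunk size are assumptions chosen for illustration, and the per-chunk _id that a driver would generate is omitted.

```javascript
// In-memory sketch only: builds objects shaped like GridFS documents.
function toGridFsDocuments(filesId, filename, contents, chunkSize = 255 * 1024) {
  const fileDoc = {
    _id: filesId,            // a real driver would generate an ObjectId
    filename,
    length: contents.length, // file size in bytes
    chunkSize,
    uploadDate: new Date(),
  };
  const chunkDocs = [];
  for (let offset = 0, n = 0; offset < contents.length; offset += chunkSize, n++) {
    chunkDocs.push({
      files_id: filesId,     // links the chunk to its fs.files document
      n,                     // sequence number, starting at 0
      data: contents.subarray(offset, offset + chunkSize),
    });
  }
  return { fileDoc, chunkDocs };
}

// Reassemble by sorting on n, mirroring the order of the { files_id, n } index.
function reassemble(chunkDocs) {
  const ordered = [...chunkDocs].sort((a, b) => a.n - b.n);
  return Buffer.concat(ordered.map((chunk) => chunk.data));
}

// Usage with a 5-byte chunk size so the split is easy to see.
const demo = toGridFsDocuments("demo-id", "foo.txt", Buffer.from("hello gridfs"), 5);
console.log(demo.chunkDocs.map((chunk) => chunk.data.toString()));
console.log(reassemble(demo.chunkDocs).toString());
```

Because every chunk carries files_id and n, the chunks can be stored and fetched in any order and still be put back together correctly.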
There are two ways to work with GridFS: MongoDB’s official drivers, which implement the GridFS specification, and the command-line tool mongofiles.
Advantages of using GridFS include:
- Simplifying your application stack by doing away with the requirement for an independent file storage system.
- Using the scalability and automatic failover offered by MongoDB’s built-in replication and auto-sharding features for your file storage.
- Reducing filesystem constraints, like problems with storing a lot of files in one directory.
- Accessing specific parts of a huge file without loading the whole file into memory.
There are trade-offs to consider as well:
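Partial access works because a byte offset maps directly to a chunk sequence number. The sketch below shows that mapping for a half-open byte range; the helper name chunksForRange and its return shape are hypothetical, not a driver API.

```javascript
// Hypothetical helper: which chunk sequence numbers (n) cover the
// half-open byte range [start, end) of a file.
function chunksForRange(start, end, chunkSize = 255 * 1024) {
  if (!Number.isInteger(start) || !Number.isInteger(end) || start < 0 || end <= start) {
    throw new RangeError("expected integers with 0 <= start < end");
  }
  return {
    first: Math.floor(start / chunkSize),    // chunk holding the first byte
    last: Math.floor((end - 1) / chunkSize), // chunk holding the last byte
    skipInFirst: start % chunkSize,          // bytes to skip inside the first chunk
  };
}

// Bytes 1,000,000 up to (not including) 1,500,000 at the default chunk size.
// Only chunks with first <= n <= last would need to be read from fs.chunks.
console.log(chunksForRange(1000000, 1500000));
// { first: 3, last: 5, skipInFirst: 216640 }
```

Only three chunks are touched here, however large the file is.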
- In general, performance lags behind direct filesystem access.
- Atomically updating file content is not supported directly; instead, the old file is usually deleted and the updated version is saved again.
- It’s frequently more efficient to store files under 1MB as binary data inside a single document instead of utilising GridFS.
Code Example for GridFS using mongofiles:
To upload a file:
$ mongofiles put foo.txt
To list files:
$ mongofiles list
MongoDB’s Atomicity and Transactions
The basic data integrity guarantee of MongoDB is the atomicity of operations on a single document. All modifications that one operation makes to a document are applied in full or not at all, which guarantees document-level consistency. When an update statement changes several fields in a single document, for instance, either all of the fields are updated or none are.
Earlier versions of MongoDB did not support multi-document atomic transactions. An operation affecting several documents could therefore be interleaved with other operations, causing inconsistencies if this was not handled properly at the application level.
Common tactics used to handle situations needing several document modifications in earlier versions were as follows:
- Embedding related data: The suggested method was to consolidate related data that is regularly updated together into a single document. This lets complex logical entities rely on single-document atomicity.
- Two-Phase Commit: Applications may use a two-phase commit pattern for operations that logically depend on atomicity across several documents or collections. The application controls the transaction’s state and handles possible rollbacks in the event that an operation fails with this programmatic solution. Although it guarantees data consistency, during the procedure, documents may momentarily reflect pending data states.
- findAndModify command: The findAndModify command can be used to change a single document atomically and return it.
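The two-phase commit pattern described above can be sketched as a small in-memory simulation. Plain objects stand in for documents, and each helper call represents what would be one single-document update in MongoDB. The account and transaction shapes, field names, and function names are assumptions made for illustration, not a prescribed schema.

```javascript
// In-memory simulation of the two-phase commit pattern (no MongoDB involved).
const accounts = new Map([
  ["A", { _id: "A", balance: 1000, pendingTransactions: [] }],
  ["B", { _id: "B", balance: 1000, pendingTransactions: [] }],
]);

// Apply the change to one account only if this transaction is not already
// recorded there, so the step is safe to repeat during recovery.
function applyToAccount(accountId, txn, delta) {
  const account = accounts.get(accountId);
  if (account.pendingTransactions.includes(txn._id)) return;
  account.balance += delta;
  account.pendingTransactions.push(txn._id);
}

// Remove the pending marker once the transaction is fully applied.
function clearPending(accountId, txn) {
  const account = accounts.get(accountId);
  account.pendingTransactions = account.pendingTransactions.filter((id) => id !== txn._id);
}

function transfer(txn) {
  txn.state = "pending";                           // mark the transaction as in progress
  applyToAccount(txn.source, txn, -txn.value);     // documents now show pending state
  applyToAccount(txn.destination, txn, txn.value);
  txn.state = "applied";                           // both accounts reflect the change
  clearPending(txn.source, txn);
  clearPending(txn.destination, txn);
  txn.state = "done";
}

const txn = { _id: 1, source: "A", destination: "B", value: 100, state: "initial" };
transfer(txn);
console.log(accounts.get("A").balance, accounts.get("B").balance, txn.state);
// 900 1100 done
```

The state field is what lets a recovery process find a half-finished transfer and either resume it or roll it back; between the "pending" and "done" states, readers can observe the intermediate data, which is the trade-off noted above.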
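Code Example for Per-Document Atomic Updates using $set:
db.users.update({ age: { $gt: 18 } }, { $set: { status: "A" } }, { multi: true })
This operation sets the status field to “A” for all documents in the users collection where age is greater than 18. Each individual document is modified atomically, but the operation as a whole is not atomic: other writes may interleave between the updates to different documents.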
Code Example for Batch Insert:
db.post.insert([
  { title: 'MongoDB Overview', description: 'MongoDB is no sql database', by: 'tutorials point' }
  // ... more documents
])
Newer MongoDB Versions: MongoDB 4.0 introduced multi-document ACID transactions on replica sets, and MongoDB 4.2 extended them to sharded clusters. For operations that span multiple documents, collections, and databases, this important development offers the complete guarantees of Atomicity, Consistency, Isolation, and Durability (ACID).
To use these transactions, you need a deployment running MongoDB 4.0 or later for replica sets, or 4.2 or later for sharded clusters, along with drivers updated for the corresponding server version.
Key aspects of multi-document transactions include:
- ACID Compliance: Ensures that the entire group of operations either commits or aborts, preserving data validity.
- APIs: MongoDB drivers offer a straightforward Callback API as well as a more detailed Core API for transaction management.
- Operational Limitations: In MongoDB 4.2 and earlier, only read/write (CRUD) operations on pre-existing collections are allowed during a transaction; creating or dropping collections and building indexes are prohibited. MongoDB 4.4 relaxed this to permit creating collections and indexes inside a transaction under certain conditions.
- Limits and Tuning: Each oplog entry written by a transaction must fit within the 16MB BSON document size limit. Timing constraints are configurable, such as the maximum runtime (transactionLifetimeLimitSeconds) and the time a transaction waits to acquire locks (maxTransactionLockRequestTimeoutMillis).
Code Example for a Multi-Document Transaction:
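// This example requires MongoDB 4.0+ and a compatible Node.js driver.
// It is illustrative; a real application would add full error handling and retry logic.
// Assumes `client` is a connected MongoClient, that employeesCollection and
// eventsCollection were obtained from it, and that this runs in an async context.
const session = client.startSession();
try {
  session.startTransaction();
  // Both writes pass the session so they join the same transaction.
  await employeesCollection.updateOne(
    { employee: 3 },
    { $set: { status: "Inactive" } },
    { session }
  );
  await eventsCollection.insertOne(
    { employee: 3, status: { new: "Inactive", old: "Active" } },
    { session }
  );
  await session.commitTransaction();
  console.log("Transaction committed successfully.");
} catch (error) {
  console.log("Caught exception during transaction, aborting: " + error);
  // Abort only if the transaction is still open, so the original error is not masked.
  if (session.inTransaction()) {
    await session.abortTransaction();
  }
  throw error;
} finally {
  await session.endSession();
}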
This shows how to combine operations on employeesCollection and eventsCollection into one atomic transaction.
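The example above leaves out retry logic. One possible shape for it is sketched below: a wrapper that reruns a whole transaction attempt when the error carries the TransientTransactionError label, which driver errors expose through hasErrorLabel. The helper name retryTransient, the attempt limit, and the stubbed error are assumptions for illustration; the drivers' Callback API already performs this kind of retry for you.

```javascript
// Hypothetical helper: rerun a transaction attempt on transient errors.
// `runTxn` is any async function that starts, runs, and commits one attempt.
async function retryTransient(runTxn, maxAttempts = 3) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await runTxn(attempt);
    } catch (error) {
      const transient =
        error != null &&
        typeof error.hasErrorLabel === "function" &&
        error.hasErrorLabel("TransientTransactionError");
      // Give up on non-transient errors or once the attempt budget is spent.
      if (!transient || attempt >= maxAttempts) throw error;
    }
  }
}

// Stub standing in for a driver error that carries the transient label.
function makeTransientError() {
  const error = new Error("simulated transient failure");
  error.hasErrorLabel = (label) => label === "TransientTransactionError";
  return error;
}

// Usage: the first attempt fails transiently, the second succeeds.
retryTransient(async (attempt) => {
  if (attempt === 1) throw makeTransientError();
  return "committed on attempt " + attempt;
}).then((result) => console.log(result)); // committed on attempt 2
```

Retrying the whole attempt, rather than a single statement, matters because an aborted transaction discards every write made inside it.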
In conclusion, GridFS splits large files into manageable chunks, while MongoDB’s atomic operations, which include multi-document transactions in later versions, ensure data consistency and integrity even in complex scenarios.