Schema Modelling & Relationships in MongoDB
MongoDB is a document-oriented database, and its flexible structure makes deliberate schema modelling important. Unlike tables in an RDBMS, MongoDB collections do not enforce a fixed document structure, so documents in the same collection can have different fields and shapes.
MongoDB schema design is driven by how your application queries, updates, and processes data. Combine objects into one document if they are used together; otherwise separate them, while keeping the need for join-like lookups low; and optimise for the most common use cases. Before v3.2, MongoDB offered no join operation; v3.2 added the $lookup aggregation stage, which performs a left outer join. As a result, relationships, the counterpart of table joins in an RDBMS, are usually modelled with embedded documents or references.
Relationships Using Embedded Documents
Concept: Embedding keeps related information in a single document structure. Grouping related pieces of information into one database record produces a “denormalised” data model.
When to Apply:
- “Contains” relationships: When one entity naturally contains another, such as a customer and their address.
- One-to-many (particularly “one-to-few”) relationships: When the “many” side of the relationship is small and the child documents are nearly always viewed in the context of their parent. For instance, a user’s several addresses or the comments on a blog post.
- Optimised for reads: Instead of running several queries to resolve references, the application retrieves all related data in a single query, which reduces the number of database operations.
- Atomic updates: All related data within the document can be inserted or updated atomically with a single write operation, which makes it easier to keep related fields consistent.
Drawbacks:
- Document growth: If embedded arrays or fields grow substantially, a document can outgrow the space allocated to it, forcing a reallocation and possibly hurting write performance. This mainly affects the legacy MMAPv1 storage engine (removed in MongoDB 4.2), for which MongoDB 3.0+ uses Power of 2 Sized Allocations to reduce the problem.
- Document size limit: No document may exceed 16 MB, which caps the amount of data that can be embedded.
- Data repetition: Denormalisation can duplicate shared data across documents, so a change to that data has to be applied to every copy.
Example: Comments within a Post Document
A classic example is embedding comments within a blog post document.
{
  _id: ObjectId("7df78ad8902c"),
  title: 'MongoDB Overview',
  description: 'MongoDB is no sql database',
  by: 'tutorials point',
  url: 'http://www.tutorialspoint.com',
  tags: ['mongodb', 'database', 'NoSQL'],
  likes: 100,
  comments: [
    {
      user: 'user1',
      message: 'My first comment',
      dateCreated: new Date(2011,1,20,2,15),
      like: 0
    },
    {
      user: 'user2',
      message: 'My second comments',
      dateCreated: new Date(2011,1,25,7,45),
      like: 5
    }
  ]
}
With this layout, all of a post’s comments live in the post document itself, so a single query against the posts collection returns the post together with its comments. Dot notation (for example, comments.user or comments.message) addresses specific fields inside the embedded documents.
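The way dot notation reaches into an array of embedded documents can be sketched in plain JavaScript. This is only an illustration under stated assumptions: the posts array is an in-memory stand-in for the collection, the ids are plain strings, and valuesAtPath and findByPath are hypothetical helpers rather than MongoDB APIs.

```javascript
// In-memory stand-in for the posts collection (assumption: no real database is used).
const posts = [
  {
    _id: "post1", // hypothetical id; a stored document would hold an ObjectId
    title: "MongoDB Overview",
    comments: [
      { user: "user1", message: "My first comment", like: 0 },
      { user: "user2", message: "My second comments", like: 5 },
    ],
  },
  {
    _id: "post2",
    title: "Another Post", // hypothetical second post
    comments: [{ user: "user3", message: "Hello", like: 1 }],
  },
];

// Hypothetical helper: collect every value found at a dot-notation path.
// When a step lands on an array, the next key is applied to each element,
// which is how "comments.user" reaches into embedded documents.
function valuesAtPath(doc, path) {
  let current = [doc];
  for (const key of path.split(".")) {
    current = current
      .flat() // step into arrays of embedded documents
      .filter((value) => value !== null && typeof value === "object")
      .map((value) => value[key])
      .filter((value) => value !== undefined);
  }
  return current.flat();
}

// Rough in-memory counterpart of the mongosh query
// db.posts.find({ "comments.user": "user1" })
function findByPath(collection, path, expected) {
  return collection.filter((doc) => valuesAtPath(doc, path).includes(expected));
}

const matches = findByPath(posts, "comments.user", "user1");
console.log(matches.map((post) => post.title)); // [ 'MongoDB Overview' ]
```

The match returns the whole post, comments included, which is the single-query read that embedding is meant to provide.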
Relationships Using Manual References
Concept: Manual referencing stores one document’s _id value as a field in another document. This keeps the data model “normalised,” with related data held in separate collections.
When to Apply:
- Many-to-many relationships: For connections in which documents on both sides can link to several documents on the other. For example, a product can fall under more than one category, and a category can contain more than one product.
- Less frequently accessed data: When the primary document does not always need the linked data.
- Volatile data: If the related data changes frequently and independently, keeping it normalised means each change is written once instead of being repeated across many documents.
- Large “many” side: When the “many” side of a one-to-many relationship is large or unbounded (for example, the comments on a highly popular article or the followers of a celebrity user). In such cases, embedding could push a document past the 16 MB size limit.
- Independent access: When related documents must be accessed and queried separately from their parent documents. For example, showing a list of the most recent comments across all posts.
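The “large many side” and “independent access” cases can be sketched with comments that reference their parent post instead of being embedded in it. The arrays below are in-memory stand-ins for two collections, and the postId field name, the plain-string ids, and the second post are assumptions made for illustration.

```javascript
// In-memory stand-ins for two collections (assumption: no real database is used).
const posts = [
  { _id: "p1", title: "MongoDB Overview" },
  { _id: "p2", title: "Schema Design" }, // hypothetical second post
];

// Each comment stores the _id of its parent post (hypothetical field: postId),
// so a post document does not grow as comments accumulate.
const comments = [
  { _id: "c1", postId: "p1", user: "user1", message: "My first comment", dateCreated: new Date(2011, 1, 20, 2, 15) },
  { _id: "c2", postId: "p1", user: "user2", message: "My second comments", dateCreated: new Date(2011, 1, 25, 7, 45) },
  { _id: "c3", postId: "p2", user: "user3", message: "Hello", dateCreated: new Date(2011, 2, 1, 9, 0) },
];

// Independent access: newest comments across all posts, without loading any post.
// Rough mongosh counterpart: db.comments.find().sort({ dateCreated: -1 }).limit(limit)
function recentComments(allComments, limit) {
  return [...allComments]
    .sort((a, b) => b.dateCreated - a.dateCreated)
    .slice(0, limit);
}

// Parent-scoped access: all comments that reference one post.
// Rough mongosh counterpart: db.comments.find({ postId: postId })
function commentsForPost(allComments, postId) {
  return allComments.filter((comment) => comment.postId === postId);
}

console.log(recentComments(comments, 2).map((c) => c._id)); // [ 'c3', 'c2' ]
console.log(commentsForPost(comments, "p1").length); // 2
```

The trade-off is that showing a post with its comments now takes two queries, one per collection.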
Drawbacks:
- Multiple queries: The client application must issue additional queries (several round trips to the server) to resolve manual references. This can be less efficient than embedding, which retrieves all the data in a single query.
- No automatic joins: MongoDB does not resolve manual references on its own. The application code must perform the “join” itself, or ask the server to do it explicitly with the $lookup aggregation stage.
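As a rough sketch of a server-side join, the pipeline below uses the $lookup aggregation stage. It assumes a products collection whose documents keep an array of category _id values in a category_ids field, a categories collection, and a MongoDB version in which $lookup matches the elements of an array localField directly (older releases may need an $unwind stage first). The pipeline is built as a plain JavaScript value so that it can be inspected without a database connection.

```javascript
// Aggregation pipeline that attaches category documents to a product.
// Collection and field names are assumptions, not a fixed schema.
const lookupPipeline = [
  // Select the product of interest.
  { $match: { title: "Extra Large Wheelbarrow" } },
  {
    $lookup: {
      from: "categories",         // collection to join with
      localField: "category_ids", // array of referenced _id values on the product
      foreignField: "_id",        // field matched in the categories collection
      as: "categories",           // output array holding the joined documents
    },
  },
];

// In mongosh the pipeline would be run as:
//   db.products.aggregate(lookupPipeline)
console.log(JSON.stringify(lookupPipeline, null, 2));
```

This replaces the extra round trip with one aggregation, at the cost of doing the join work on the server for every call.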
Example: Products and Categories (Many-to-Many)
Consider products belonging to multiple categories. Instead of embedding all category details within each product, you can use references:
Product Document:
{
  _id: ObjectId("5df25d97d85242f436000001"),
  title: "Extra Large Wheelbarrow",
  description: "Heavy duty wheelbarrow...",
  // ... other product fields
  primary_category: ObjectId("6a5b1476238d3b4dd5000048"), // One-to-many reference
  category_ids: [ // Many-to-many references
    ObjectId("6a5b1476238d3b4dd5000048"), // Reference to "Gardening Tools"
    ObjectId("6a5b1476238d3b4dd5000049") // Reference to "Lawn Care"
  ]
}
Category Document:
{
  _id: ObjectId("6a5b1476238d3b4dd5000048"),
  name: "Gardening Tools",
  parent: ObjectId("6a5b1476238d3b4dd5000047"), // Reference to parent category
  // Denormalized ancestors array can be added for faster queries on hierarchy
  ancestors: ["Home", "Outdoors"]
}
One way to get a product and its categories is to query the products collection first, then query the categories collection with $in on the values in the product’s category_ids array. The category_ids and parent fields should be indexed for faster lookups.
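The two-step resolution described above can be sketched in plain JavaScript. The arrays are in-memory stand-ins for the collections, the ids are shortened plain strings instead of ObjectId values, the "Kitchen" category is a hypothetical extra, and the helper names are made up for this example.

```javascript
// In-memory stand-ins for the products and categories collections.
const products = [
  { _id: "prod1", title: "Extra Large Wheelbarrow", category_ids: ["cat48", "cat49"] },
];
const categories = [
  { _id: "cat48", name: "Gardening Tools" },
  { _id: "cat49", name: "Lawn Care" },
  { _id: "cat50", name: "Kitchen" }, // hypothetical, not referenced by the product
];

// Step 1, mongosh counterpart: db.products.findOne({ _id: productId })
// Step 2, mongosh counterpart: db.categories.find({ _id: { $in: product.category_ids } })
function loadProductWithCategories(productId) {
  const product = products.find((p) => p._id === productId);
  if (!product) return null;
  const wanted = new Set(product.category_ids);
  const resolved = categories.filter((category) => wanted.has(category._id));
  return { ...product, categories: resolved };
}

// Reverse direction: every product in one category.
// mongosh counterpart: db.products.find({ category_ids: categoryId })
function productsInCategory(categoryId) {
  return products.filter((p) => p.category_ids.includes(categoryId));
}

const loaded = loadProductWithCategories("prod1");
console.log(loaded.categories.map((c) => c.name)); // [ 'Gardening Tools', 'Lawn Care' ]
console.log(productsInCategory("cat49").map((p) => p.title)); // [ 'Extra Large Wheelbarrow' ]
```

The reverse query filters on category_ids, which is why an index on that field helps.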
Relationships Using DBRefs
Concept: Compared to manual _id linking, DBRefs are a more formal convention for representing document references. A DBRef contains the collection name ($ref), the referenced document’s _id value ($id), and optionally the database name ($db), in that order.
When to Apply:
- Heterogeneous references: DBRefs suit references that can point to documents in different collections. A user document may need to link to addresses kept in separate collections (home, office, mailing), such as address_home and address_office.
- Standardised linking: If your database is used with several frameworks or tools that understand DBRefs, they provide a consistent format for document links.
- Driver support: Some drivers provide helper methods for working with DBRefs.
When Not to Use (Prefer Manual References):
- Compactness and simplicity: Manual _id references are easier to construct and more compact, so they are simpler to work with in the majority of situations, where the referenced collection is known and consistent.
- Performance: DBRefs bring no performance benefit. Drivers generally do not resolve DBRefs into documents automatically, so, as with manual references, the client application must still send extra queries to dereference them and fetch the related documents.
Example: Heterogeneous References in a User Document
{
  "_id": ObjectId("53402597d852426020000002"),
  "name": "Tom Benzamin",
  "contact": "987654321",
  "dob": "01-01-1991",
  "homeAddress": {
    "$ref": "address_home",
    "$id": ObjectId("52ffc4a5d85242602e000000"),
    "$db": "user_data" // Optional: if address_home is in a different database
  },
  "officeAddress": {
    "$ref": "address_office",
    "$id": ObjectId("53000000d85242602e000000"),
    "$db": "user_data"
  }
}
To get the home address, the application reads the $ref and $id fields of the homeAddress DBRef and queries the address_home collection for the document whose _id equals $id.
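The dereferencing step can be sketched as follows. Everything here is an assumption made for illustration: the databases object is an in-memory stand-in (database name, then collection name, then documents), the ids are plain strings instead of ObjectId values, the city fields are invented placeholder data, and dereference is a hypothetical helper rather than a driver method.

```javascript
// In-memory stand-in: database name -> collection name -> documents.
const databases = {
  user_data: {
    address_home: [{ _id: "52ffc4a5d85242602e000000", city: "Example City (home)" }],
    address_office: [{ _id: "53000000d85242602e000000", city: "Example City (office)" }],
  },
};

// User document holding two DBRef-shaped references.
const user = {
  name: "Tom Benzamin",
  homeAddress: { $ref: "address_home", $id: "52ffc4a5d85242602e000000", $db: "user_data" },
  officeAddress: { $ref: "address_office", $id: "53000000d85242602e000000", $db: "user_data" },
};

// Hypothetical helper: pick the collection named by $ref (inside the database
// named by $db) and return the document whose _id equals $id, or null.
// Rough mongosh counterpart within one database:
//   db.getCollection(ref.$ref).findOne({ _id: ref.$id })
function dereference(allDatabases, ref) {
  const database = allDatabases[ref.$db] ?? {};
  const collection = database[ref.$ref] ?? [];
  return collection.find((doc) => doc._id === ref.$id) ?? null;
}

console.log(dereference(databases, user.homeAddress).city); // Example City (home)
console.log(dereference(databases, user.officeAddress).city); // Example City (office)
```

Because the collection name travels with the reference, the same helper resolves both addresses even though they live in different collections.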
In conclusion, whether to embed, use manual references, or use DBRefs depends on your application’s data access patterns, relationship cardinality, and data volatility. If data is often accessed together, embedding prioritises read efficiency and single-document atomicity. For larger, more volatile, or independently accessed related data, manual references work best, particularly in many-to-many situations. DBRefs are a specialised solution for heterogeneous references that point into several collections.