Data Types in MongoDB
MongoDB, an open-source document database, stores data as BSON (Binary JSON), a binary serialisation of JSON-like documents designed to be lightweight, traversable, and efficient. BSON resembles JSON but supports more data types and is faster to scan and parse.
Here’s a detailed look at the BSON data types that MongoDB supports:
- String: The most common data type. Strings in MongoDB must be valid UTF-8.
- Integer: Stores whole numbers as either 32-bit or 64-bit values. Because the shell treats numbers as 64-bit floating-point by default, you must use NumberInt() or NumberLong() to store integers.
- Boolean: Stores true or false values.
- Double: Stores 64-bit floating-point values. This is the type that the shell uses by default for numbers.
- Arrays: This flexible type stores a list of values under a single key. Because MongoDB “understands” their structure, you can query arrays, create indexes on their contents, and even modify individual array elements atomically.
- Timestamp: A special BSON type used mostly for internal MongoDB purposes; ordinary applications should use the Date type for dates. It is a 64-bit value in which the most significant 32 bits hold the seconds since the Unix epoch and the least significant 32 bits hold an incrementing ordinal for operations within that second. Within a single mongod instance, timestamp values are always unique.
- Object (Embedded Documents): This type is used for documents nested inside other documents. Embedded documents let a single document represent rich, hierarchical data structures, which can simplify data retrieval by removing the need for the complex joins typical of relational databases. Because MongoDB is aware of their structure, it supports indexing, querying, and updating fields inside them using dot notation. For example, in a document such as { name: "Ada", address: { city: "London" } }, the nested field is addressed as "address.city".
- Null: Represents a null value. Note that a query for null also matches documents in which the field does not exist.
- Symbol: Used in the same way as a string, but mainly for languages that have a distinct symbol type. The most recent BSON specification deprecates it.
- Date: Stores a point in time as a 64-bit integer counting milliseconds since the Unix epoch. To save an actual Date object in the shell, use new Date() (or ISODate()); calling Date() without new returns a string rather than a Date.
- Binary Data (BinData): Stores arbitrary binary data. Documents that contain binary data are still subject to the 16 MB BSON document size limit. For larger binary objects, MongoDB offers GridFS, a convention for storing files of any size by splitting them into chunks and storing their metadata separately. Binary data is an arbitrary string of bytes and cannot be manipulated from the shell. It is also the only way to save non-UTF-8 strings to the database.
- Code: Stores JavaScript code directly in a document.
- Regular Expression: Stores regular expressions, which are used to match patterns.
- Min/Max Keys: Internal BSON types that represent the lowest and highest possible values. They are particularly useful for defining chunk boundaries in sharded collections.
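The 64-bit layout of the Timestamp type described in the list above can be sketched in plain JavaScript. This is an illustration only, not how any driver implements the type: packTimestamp and unpackTimestamp are hypothetical helper names, and BigInt is used because ordinary JavaScript numbers cannot hold every 64-bit integer exactly.

```javascript
// Illustrative sketch of the BSON Timestamp layout (hypothetical helpers,
// not a driver API): high 32 bits = seconds since the Unix epoch,
// low 32 bits = ordinal within that second.
function packTimestamp(seconds, ordinal) {
  return (BigInt(seconds) << 32n) | BigInt(ordinal);
}

function unpackTimestamp(value) {
  return {
    seconds: Number(value >> 32n),
    ordinal: Number(value & 0xffffffffn),
  };
}

const first = packTimestamp(1700000000, 1);
const second = packTimestamp(1700000000, 2);

console.log(unpackTimestamp(first)); // { seconds: 1700000000, ordinal: 1 }
console.log(second > first);         // true
```

Because the seconds occupy the high bits, comparing two packed values orders them first by second and then by ordinal.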
ObjectId as the Default Primary Key
Every document stored in MongoDB requires an _id field, which acts as its primary key. The _id value must be unique within a collection. If you don’t specify one when inserting a document, MongoDB automatically assigns an ObjectId to this field. The server always stores _id as the first field of a document, moving it there if necessary.
By default, _id values are ObjectIds. Because MongoDB is distributed, the identifier must be lightweight and generatable uniquely across machines, especially in sharded setups, where synchronising auto-incrementing primary keys across several servers would be difficult and slow. Although a mongod instance can generate ObjectIds, they are typically generated by the client-side drivers, in line with MongoDB’s principle of moving work from the server to the drivers wherever feasible so that scaling happens at the application layer.
As long as it is unique within the collection, you are free to supply your own _id value, which can be any BSON data type (apart from an array). If you create your own UUIDs for _id, it is advised to save them as BSON BinData types for efficiency.
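To give a sense of why binary storage is recommended for UUIDs, the following sketch compares the size of a UUID as text and as raw bytes. It assumes Node.js (for Buffer); uuidToBytes is a hypothetical helper, and in a real application the driver's own binary or UUID type would wrap these bytes before they are stored.

```javascript
// Hypothetical helper (not a driver API): convert a canonical UUID string
// into its raw 16 bytes.
function uuidToBytes(uuid) {
  const hex = uuid.replace(/-/g, "");
  if (!/^[0-9a-fA-F]{32}$/.test(hex)) {
    throw new Error("not a valid UUID string: " + uuid);
  }
  return Buffer.from(hex, "hex");
}

const text = "123e4567-e89b-12d3-a456-426614174000";
const bytes = uuidToBytes(text);

console.log(text.length);  // 36 characters as a string
console.log(bytes.length); // 16 bytes as binary data
```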
Structure of a 12-byte ObjectId
An ObjectId is a 12-byte identifier, usually displayed as a 24-character hexadecimal string, that ensures each document is unique. Its structure has four main components:
- First 4 bytes: Current Timestamp: These bytes represent the seconds since the Unix epoch (January 1, 1970). Every ObjectId therefore carries its own creation timestamp, which can be extracted later. Sorting documents by their _id (when using ObjectIds) is roughly equivalent to sorting them by creation time.
- Next 3 bytes: Machine Identifier: These bytes identify the machine on which the ObjectId was generated.
- Next 2 bytes: Process ID: These bytes hold the ID of the process that generated the ObjectId, which is usually the client driver rather than the MongoDB server. (Newer MongoDB versions replace the machine and process fields with a single 5-byte random value generated once per process.)
- Remaining 3 bytes: Simple Incremental Counter: To further ensure uniqueness, this counter starts at a random value and is incremented each time an ObjectId is generated within the same process. The counter and timestamp fields are stored in big-endian order so that ObjectIds compare in increasing order.
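As a rough illustration of the layout listed above, the sketch below splits a 24-character ObjectId string into those four fields using plain JavaScript. parseObjectId is a hypothetical helper, not a shell or driver function, and it assumes the four-part layout described here; in ObjectIds produced by newer MongoDB versions the middle five bytes are random and carry no machine or process meaning.

```javascript
// Sketch: split a 24-character ObjectId hex string into the four fields
// described above. parseObjectId is a hypothetical helper, not a driver API.
function parseObjectId(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected 24 hexadecimal characters");
  }
  return {
    // bytes 0-3: seconds since the Unix epoch (big-endian)
    timestamp: new Date(parseInt(hex.slice(0, 8), 16) * 1000),
    // bytes 4-6: machine identifier
    machine: hex.slice(8, 14),
    // bytes 7-8: process ID
    processId: parseInt(hex.slice(14, 18), 16),
    // bytes 9-11: counter (big-endian)
    counter: parseInt(hex.slice(18, 24), 16),
  };
}

const parts = parseObjectId("507f191e810c19729de860ea");
console.log(parts.timestamp.toISOString()); // 2012-10-17T20:46:22.000Z
console.log(parts.machine);                 // 810c19
console.log(parts.processId);               // 29341
console.log(parts.counter);                 // 15229162
```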
Although ObjectIds are strong unique identifiers, using them as a shard key (the field that determines how data is distributed across servers) can create “hotspots” when most writes insert newly generated ObjectIds. Because the timestamp component makes them increase monotonically, new documents all initially fall into the same chunk on a single shard, which limits insert throughput. Furthermore, although ObjectIds are designed to be globally unique, their ordering is not strict: clock skew between client systems, or several systems generating values within the same second, can produce ObjectIds that are out of creation order.
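The hotspot effect can be made concrete with a small simulation. This is a deliberately simplified sketch, not MongoDB's actual chunk routing or balancing: chunkIndexFor and fakeObjectId are hypothetical helpers, the chunk boundaries are invented, and the identifiers only imitate the timestamp-first shape of an ObjectId.

```javascript
// Simplified simulation (hypothetical helpers, not MongoDB internals):
// each chunk covers a range of shard key values, and a key is routed to
// the chunk whose range contains it.
function chunkIndexFor(key, upperBounds) {
  // upperBounds are sorted, exclusive upper bounds; the last chunk is open-ended.
  const i = upperBounds.findIndex((bound) => key < bound);
  return i === -1 ? upperBounds.length : i;
}

// Build an ObjectId-like hex string: 4-byte timestamp, 5 filler bytes,
// 3-byte counter. Equal-length lowercase hex strings sort like the numbers
// they encode.
function fakeObjectId(seconds, counter) {
  return seconds.toString(16).padStart(8, "0") +
         "0000000000" +
         counter.toString(16).padStart(6, "0");
}

// Chunk boundaries that were drawn from older data.
const bounds = [fakeObjectId(1600000000, 0), fakeObjectId(1650000000, 0)];

// Simulate 1000 inserts happening "now": timestamps only move forward.
const perChunk = [0, 0, 0];
for (let n = 0; n < 1000; n++) {
  const id = fakeObjectId(1700000000 + Math.floor(n / 100), n);
  perChunk[chunkIndexFor(id, bounds)]++;
}

console.log(perChunk); // [ 0, 0, 1000 ]
```

Every new key is larger than all existing boundaries, so all inserts land in the last chunk until it is split and rebalanced.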
Code Examples
Sample Document Insertion: When you insert a document without specifying the _id field, MongoDB automatically generates an ObjectId for it.
// Inserting a document without specifying _id
db.myCollection.insertOne({
  title: 'MongoDB Explained',
  description: 'A deep dive into data types',
  author: 'Expert AI'
});
// The inserted document will look something like this in the database:
// (Note: The ObjectId value will be unique and different in your environment)
{
  "_id": ObjectId("65c5d33f7c35f92a34b60a2d"), // MongoDB auto-generated this ObjectId
  "title": "MongoDB Explained",
  "description": "A deep dive into data types",
  "author": "Expert AI"
}
Explicitly Providing an ObjectId (or custom _id): You can provide your own _id.
// You can generate a new ObjectId manually in the shell:
var newObjectId = ObjectId();
print(newObjectId); // Example output: ObjectId("5349b4ddd2781d08c09890f3")
// Or build an ObjectId from a specific 24-character hexadecimal string (12 bytes):
var myObjectId = ObjectId("507f191e810c19729de860ea"); // An example specific ID
db.myCollection.insertOne({
  _id: myObjectId,
  title: 'Custom ID Document',
  message: 'This document has a manually assigned ObjectId.'
});
Extracting Information from an ObjectId:
// Assume 'doc' is a document retrieved from the collection
var doc = db.myCollection.findOne({ title: 'Custom ID Document' });
// Get the hexadecimal string representation:
print(doc._id.str); // Returns: "507f191e810c19729de860ea"
print(doc._id.valueOf()); // Also returns: "507f191e810c19729de860ea"
// Get the timestamp (creation time) of the ObjectId:
print(doc._id.getTimestamp()); // Returns: ISODate("2012-10-17T20:46:22Z")
You can also use ObjectId() to perform range queries based on the creation time, as the timestamp is part of the ObjectId structure.
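One way to build the bounds for such a range query is to turn a date into the smallest ObjectId string for that instant. The sketch below is plain JavaScript and makes no claim about built-in shell helpers: objectIdHexFromDate is a hypothetical function, and the commented query at the end assumes the myCollection collection used in the earlier examples.

```javascript
// Sketch: build the smallest ObjectId hex string for a given instant.
// objectIdHexFromDate is a hypothetical helper, not part of the shell.
function objectIdHexFromDate(date) {
  const seconds = Math.floor(date.getTime() / 1000);
  // 4-byte big-endian timestamp followed by 8 zero bytes
  return seconds.toString(16).padStart(8, "0") + "0000000000000000";
}

const start = objectIdHexFromDate(new Date("2012-10-17T00:00:00Z"));
const end = objectIdHexFromDate(new Date("2012-10-18T00:00:00Z"));

console.log(start); // 507df5000000000000000000
console.log(end);   // 507f46800000000000000000

// In the shell, these strings could then bound a query on _id, for example:
// db.myCollection.find({ _id: { $gte: ObjectId(start), $lt: ObjectId(end) } });
```

The example ObjectId used earlier, 507f191e810c19729de860ea, falls between these two bounds, which matches its creation time of 2012-10-17T20:46:22Z.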