MongoDB Atlas
- database as a service
- don't need to manage mongodb yourself
Replica Set
- stored on multiple servers
- redundant
- high availability
Register CLI w/ an account
atlas auth register
Create MongoDB Atlas cluster in Atlas project MDB_EDU
atlas setup --clusterName myAtlasClusterEDU --provider AWS --currentIp --skipSampleData --username myAtlasDBUser --password myatlas-001 | tee atlas_cluster_details.txt
# press enter or Y
Load sample data
atlas clusters loadSampleData myAtlasClusterEDU
MongoDB Clusters: replica set or sharded cluster
- Replication of a group of MongoDB servers
- Holds copies of the same data
- Production deployments: high availability & redundancy
How they work:
- Primary node: receives all write operations
- All changes are recorded in the operation log
oplog
- Secondary nodes replicate the primary node's operation log & ensure their data sets exactly match the primary node's
- Replication is asynchronous
MongoDB is:
- General Purpose Document Database
- Structures data in documents, similar to JSON objects
- Relational databases store data in tables: rows & columns
Document Model:
- Easier/faster to plan how application data corresponds to data in db
- Model data of any shape/structure
- Key-value pairs
- Text
- Geospatial indexes
- Time-series
- Graph Data
- 1 Format: Model & Query Data
Use cases for MongoDB:
- E-commerce
- Content management
- IoT & time-series data
- Trading & payments
- Gaming
- Mobile apps
- Real-time data & AI
Pros of MongoDB:
- Scalability
- Resilience
- Speed of development
- Privacy/security
Structure:
- Document: Basic unit of data in MongoDB
- Collection: A group of documents
- Database: Container for collections
Database > Collections > Documents
MongoDB => Core of Atlas
The document structure is:
- Displayed in JSON
- Stored in BSON
- extension of JSON, with additional features
- adds support for additional data types
- JSON: string, object, array, null, etc.
- BSON: adds dates, ObjectIds, numeric types & more
Every document requires an `_id` field, a unique identifier
- Acts as a primary key
- If a document is missing an `_id`, one is automatically generated (an ObjectId)
{
    "key": "value",
    "key": "value",
    "key": "value"
}
{
"_id": 1,
"name": "AC3 Phone",
"colors" : ["black", "silver"],
"price" : 200,
"available" : true
}
Documents can contain different fields.
Fields can contain different data types.
- Unlike relational databases, which require a table schema before inserting data
MongoDB has a flexible schema
- Iterate quickly & evolve
Flexible schema example:
<!-- original data structure -->
{
"_id": ObjectID("20832089348023480"),
"name": "chair",
"price": 25.00
}
<!-- later, we add description -->
{
"_id": ObjectID("20832089348023480"),
"name": "chair",
"price": 25.00,
"description": "classic wooden chair"
}
With a relational database, making these changes would:
- run into a complex chain of dependencies
- require schema change scripts
- require coordination across engineering teams
- cause downtime while the script executes
With MongoDB, schema changes can be made by:
- Updating our classes to include the new field
- Starting to insert new documents w/ the new schema
- To add constraints to the structure of the data:
    - Add optional schema validation
Field names:
- `_id` is reserved for the primary key
- field names cannot contain `null` characters
- field names can contain dots `.` and dollar signs `$`
    - added in MongoDB 5.0
Document limitations:
- maximum BSON document size = 16 MB
BSON field order is important!
- `{a: 1, b: 1}` is EQUAL to `{a: 1, b: 1}`
- `{a: 1, b: 1}` is NOT EQUAL to `{b: 1, a: 1}`
- query engine may reorder fields during processing
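One way to build intuition for order-sensitive comparison: serialize the documents and compare the results. The JavaScript sketch below is only an analogy (JSON text, not actual BSON bytes), but it shows why `{b: 1, a: 1}` compares unequal:

```javascript
// Analogy only: JSON.stringify preserves key insertion order,
// so comparing serialized forms is order-sensitive, like raw BSON.
const docA = { a: 1, b: 1 };
const docB = { a: 1, b: 1 };
const docC = { b: 1, a: 1 };

const sameOrder = JSON.stringify(docA) === JSON.stringify(docB); // same bytes
const reordered = JSON.stringify(docA) === JSON.stringify(docC); // different bytes

console.log(sameOrder, reordered); // true false
```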
Common practices for choosing `_id`:
- Use `ObjectId`
- Use a natural unique id if possible: saves space & avoids an additional index
- Generate an auto-incrementing number
- Generate a UUID
Query filter documents:
- specify conditions that determine which records to select
- for read, update & delete operations
- use `<field>: <value>` expressions to specify equality conditions or query operator expressions
{
<field1>: <value1>,
<field2>: { <operator>: <value> },
...
}
Documents are also used as:
- Update specification documents
- Index specification documents
BSON is:
- written in binary
- slower for humans to read / fast to build & scan
- can be larger than the equivalent JSON data
- encoded before storing / decoded before displaying
- not human readable: needs to be parsed (machine-generated)
- used by databases to store data
Advantages:
- lightweight & traversable
- BSON's binary structure encodes `type` & `length`
    - much faster to traverse compared to JSON
- has more data types compared to JSON
- `$type` operator: query fields by their BSON type
    - supports a `number` alias for integer, decimal, double & long
Indexes:
- support efficient execution of queries
- should be considered for fields which your app reads often
- data accessed together should be stored together
- documents in the same collection can contain different structures
- polymorphism
- embedded document model
- enables us to build complex relationships among data
- can normalize data by using database references
- how your app uses data > how it's stored
- opposite of relational database
- store, query & use resources optimally (for performance)
Relationship types:
- One-to-one
- a data entity in one set is connected to exactly one data entity in another set
- in relational dbs: directors table & movies table w/ a join
- One-to-many
- data entity in one set is connected to any number of data entities in another set
- movie with a cast of many actors
- nested array
- Many-to-many
- any number of data entities in one set are connected to any number of data entities in another set
Ways to model relationships:
- Embedding
- Related data is inserted into your document
- Referencing
- Refer to documents in another collection
Structure data to match the way your app queries/updates it
- data accessed together => stored together
Embedding or nested documents:
- 1-to-many
- many-to-many
{
"name": {
"first": "bryan",
"last": "miller"
}
}
Warning
- Embedding data in a single document can create LARGE documents
    - Excessive memory use & added latency for reads
    - Large docs must be read in FULL: slow performance
- Continuously adding data without limit creates unbounded documents
    - may exceed the max BSON document size: 16 MB
    - unbounded documents are a schema antipattern
References
- Relate 2 or more separate collections to one another
- Save the `_id` field of one document in another document as a link between the two
    - simple & sufficient for most use cases
- called "linking" or "data normalization"
Pros:
- no duplication of data
- smaller documents
Cons:
- querying from multiple documents costs extra resources & impacts read performance
| Embedding | Referencing |
| --- | --- |
| Single query to retrieve data | No duplication |
| Single operation to update/delete data | Smaller documents |
| Data duplication | Need to join data from multiple documents |
| Large documents | |
Optimum Efficiency:
- query result times
- memory usage
- cpu usage
- storage
Avoid unbounded documents, or documents that grow indefinitely
- cause: embedding
Comments array problem as it grows larger:
- will take up more space in memory
- may impact write performance
- as new comments are added, the entire document is rewritten into storage
- difficult to perform pagination
- comments can't be filtered for a single post
- all comments will have to be read
- maximum document size of 16 mb (storage problems)
Solution:
- create 2 collections:
    - blog_post
    - comments
- reference `blog_entry_id` in each comment
Schema design patterns: guidelines help devs plan, organize & model data
If best practices for schemas are not followed, schema anti-patterns result in:
- sub-optimal performance
- non-scalable solutions
Examples of schema anti-patterns:
- massive arrays
- massive number of collections
- bloated documents
- unnecessary indexes
- queries without indexes
- data accessed together, but stored in separate collections
Data Explorer in Atlas:
- Collections > Indexes: shows indexes & stats
- Order by usage, drop unused indexes
- Collections > Schema Anti-patterns:
- Gives you details on how to fix issues
Performance Advisor in Atlas:
- tells which indexes are redundant
- gives us suggestions
Connection strings allow us to connect to our cluster.
- host
- options for connecting
- shell
- compass
- applications
Two formats:
- standard format
    - standalone clusters, replica sets, sharded clusters
- DNS seed list format
    - provides a DNS server list to our connection string
    - more flexibility in deployment
    - ability to change servers in rotation without reconfiguring clients
Locating our connection string:
- database menu > connect button > drivers
const connectionString =
"mongodb+srv://myAtlasDBUser:<password>@cluster.usqsf.mongodb.net/?retryWrites=true&w=majority";
const format = `mongodb://[username:password@]host1[:port1][,...hostN[:portN]][/[defaultauthdb][?options]]`;
- `mongodb` = required prefix identifying a MongoDB connection string
- `+srv` = sets TLS to true & tells MongoDB to use the DNS seed list
- `username:password` = credentials of the database user
- `@cluster.usqsf.mongodb.net` = host & optional port (default: 27017)
- `?retryWrites=true&w=majority` = additional options
    - auto retry when writes fail
    - connection timeout
    - tls / ssl
    - connection pooling
    - etc.
- `: / ? # [ ] @` characters in the username or password must be converted using percent encoding
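For example, a password containing reserved characters can be escaped with JavaScript's built-in `encodeURIComponent` before it is embedded in the connection string (hypothetical credentials and hostname):

```javascript
// Reserved characters (: / ? # [ ] @) must be percent-encoded
// before being embedded in a connection string.
const username = 'myAtlasDBUser';
const rawPassword = 'p@ss/w:rd#1'; // hypothetical password

const encoded = encodeURIComponent(rawPassword);
// '@' -> %40, '/' -> %2F, ':' -> %3A, '#' -> %23

const connectionString =
  `mongodb+srv://${username}:${encoded}@cluster.example.mongodb.net/?retryWrites=true&w=majority`;

console.log(connectionString);
```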
Install mongosh
wget -qO - https://www.mongodb.org/static/pgp/server-6.0.asc | sudo apt-key add -
echo "deb [ arch=amd64,arm64 ] https://repo.mongodb.org/apt/ubuntu focal/mongodb-org/6.0 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-6.0.list
sudo apt-get update
sudo apt-get install -y mongodb-mongosh
Connect to the shell by using the connectionString.
- Create user/password first in Atlas under Security > Database Access
# get connection string
atlas clusters connectionStrings describe myAtlasClusterEDU
# get connection string & store in env variable
MY_ATLAS_CONNECTION_STRING=$(atlas clusters connectionStrings describe myAtlasClusterEDU | sed "1 d")
mongosh -u myAtlasDBUser -p myatlas-001 $MY_ATLAS_CONNECTION_STRING
# output a document describing the role of mongod instance using mongosh
db.hello()
MongoDB Compass: GUI connection
- Download/install
- Copy connection string via site
Features:
- Aggregations: can be exported to any language
- Schema: analyze structure & optimize schema
- Explain Plan: understand performance of queries
- Indexes: indexes that exist on a collection, understand performance
- Validation: enforce structure of docs on update/insert
MongoDB Drivers:
- Allow you to connect to MongoDB clusters in your language of choice
- NodeJS Drivers Documentation
- Developer Center: Code Examples
- User authentication errors
    - `MongoServerError: bad auth : Authentication failed.`
    - Fix: fill in the password in the connection string
- Network errors
    - `MongoServerSelectionError: connection <monitor> to 34.239.188.169:27017 closed`
    - Fix: add your IP address under Security > Network Access in Atlas
MongoDB automatically creates a collection on insert if it doesn't exist.
`insertOne()`
- syntax: `db.<collection>.insertOne(<document>)`
`insertMany()`
- syntax: `db.<collection>.insertMany([<document>, <document>, ...])`
# examples
db.grades.insertOne({student_id: ...});
db.grades.insertMany([
    {student_id: ...},
    {student_id: ...},
    {student_id: ...}
]);
Output:
{
"acknowledged": true,
"insertedId": ObjectId("...")
}
`find()` method: `db.collection.find()`
# select a collection
use training
# find zip codes in the us
db.zips.find();
# iterate through results
it
Retrieve a specific document
# documents containing a specific field
{ field: { $eq: <value> } }
{ field: <value> }
# example: docs w/ the state AZ
db.zips.find({ state: "AZ" })
`$in` operator: selects all documents that have a field value equal to any of the values specified in an array
# format
db.<collection>.find({
<field>: { $in:
[<value>, <value>, ...],
}
})
# example: find all documents w/ one of the following cities
db.zips.find({ city: { $in: ["PHOENIX", "CHICAGO"] } })
# more examples:
db.zips.find({ _id: ObjectId("5c8eccc1caa187d17ca6ed16") })
db.sales.find({ _id: ObjectId("5bd761dcae323e45a93ccff4") })
db.sales.find({ storeLocation: { $in: ["London","New York"] } })
- `$gt`: greater than
- `$lt`: less than
- `$lte`: less than or equal to
- `$gte`: greater than or equal to
Dot notation is used to reference object values
Examples:
db.sales.find({ "items.price": { $gt: 50}})
db.sales.find({ "items.price": { $lt: 50}})
db.sales.find({ "customer.age": { $lte: 65}})
db.sales.find({ "customer.age": { $gte: 65}})
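To build intuition for how these operator expressions and dot notation evaluate against a document, here is a minimal JavaScript sketch of the matching semantics (an illustration only, not the real query engine; it ignores array traversal and BSON type ordering):

```javascript
// Resolve dot notation like "customer.age" against a document.
function resolve(doc, path) {
  return path.split('.').reduce((v, key) => (v == null ? v : v[key]), doc);
}

// Evaluate a single { <operator>: <value> } expression.
function matchesOperator(fieldValue, op, operand) {
  switch (op) {
    case '$eq':  return fieldValue === operand;
    case '$gt':  return fieldValue > operand;
    case '$gte': return fieldValue >= operand;
    case '$lt':  return fieldValue < operand;
    case '$lte': return fieldValue <= operand;
    case '$in':  return operand.includes(fieldValue);
    default:     throw new Error(`unsupported operator: ${op}`);
  }
}

const doc = { customer: { age: 65 }, state: 'AZ' };

console.log(matchesOperator(resolve(doc, 'customer.age'), '$gte', 65)); // true
console.log(matchesOperator(resolve(doc, 'state'), '$in', ['AZ', 'CA'])); // true
```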
Get a single document in a collection: `findOne()`
Search for every document that contains a value (single value OR array)
# returns items with an array that contains the investmentStock
db.accounts.find( { "products": "investmentStock" } )
Find all documents where a value matches only when it is inside an array: `$elemMatch`
- use the `$elemMatch` operator to find all documents where an array element matches the specified subdocument
db.accounts.find({ "products": { $elemMatch: { $eq: "investmentStock" } } } )
db.sales.find({
items: {
$elemMatch: { name: "laptop", price: { $gt: 800 }, quantity: { $gte: 1 } },
},
})
db.transactions.find({
transactions: {
$elemMatch: { amount: { $lte: 4500 }, transaction_code: "sell" },
},
})
- `$and`: all criteria must match
    - implicit `$and`: `db.collection.find({ <exp1>, <exp2> })`
- `$or`: at least one criterion must match
Examples:
# implicit and
db.routes.find({ "airline.name": "Southwest Airlines", stops: { $gte: 1 } })
# or
db.routes.find({
$or: [{ dst_airport: "SEA" }, { src_airport: "SEA" }],
})
# and & or operators
db.routes.find({
$and: [
{ $or: [{ dst_airport: "SEA" }, { src_airport: "SEA" }] },
{ $or: [{ "airline.name": "American Airlines" }, { airplane: 320 }] },
]
})
# example
db.sales.find({
couponUsed: true,
purchaseMethod: "Online",
"customer.age": { $lte: 25 }
})
db.sales.find({
$or: [{ "items.name": "pens" }, { "items.tags": "writing" }],
})
`replaceOne`: replaces a single document
- retains the same `_id`
- syntax: `db.collection.replaceOne(filter, replacement, options)`
    - filter: a query that matches the document to replace (`_id` is unique, a good option)
    - replacement: new document containing the updated data; replaces ALL fields
    - options: an object that specifies options for the update
db.books.replaceOne(
{
_id: ObjectId("6282afeb441a74a98dbbec4e"),
},
{
title: "Data Science Fundamentals for Python and MongoDB",
isbn: "1484235967",
publishedDate: new Date("2018-5-10"),
thumbnailUrl:
"https://m.media-amazon.com/images/I/71opmUBc2wL._AC_UY218_.jpg",
authors: ["David Paper"],
categories: ["Data Science"],
}
)
# lab example
db.birds.findOne({ common_name: "Northern Cardinal" })
db.birds.replaceOne(
{ _id: ObjectId("6286809e2f3fa87b7d86dccd") },
{
common_name: "Morning Dove",
scientific_name: "Zenaida macroura",
wingspan_cm: 37.23,
habitat: ["urban areas", "farms", "grassland"],
diet: ["seeds"],
}
)
`updateOne()` method: updates a single document, IF it exists
- `$set` operator: adds new fields/values OR replaces existing ones
- `$push` operator: appends an element to an array OR adds the array w/ the value as an element
- `upsert: true` option: update OR insert a document - inserts a document w/ the provided info IF it doesn't exist
# set: replaces field values
db.podcasts.updateOne(
{ _id: ObjectId("5e8f8f8f8f8f8f8f8f8f8f8f") },
{ $set: { subscribers: 98562 } }
)
# upsert: creates a new doc if no documents match
db.podcasts.updateOne(
{ title: "The Developer Hub" },
{ $set: { topics: ["databases", "MongoDB"] } },
{ upsert: true }
)
# push adds elements to arrays
db.podcasts.updateOne(
{ _id: ObjectId("5e8f8f8f8f8f8f8f8f8f8f8f") },
{ $push: { hosts: "Nic Raboy" } }
)
# example: add new array with values
db.birds.updateOne(
{
common_name: "Canada Goose",
},
{
$set: {
tags: ["geese", "herbivore", "migration"],
},
}
)
# increment sightings +1, set last_updated to a date & create document if it doesn't exist
db.birds.updateOne(
{
common_name: "Robin Redbreast",
},
{
$inc: {
"sightings": 1,
},
$set: {
last_updated: new Date(),
},
},
{
upsert: true,
}
)
`findAndModify()`: returns the document that has just been updated
- alternative: `updateOne()` to increment the downloads field, then `findOne()` to return the document using `_id`
    - problem:
        - makes 2 round trips to the server
        - another user could update the same document between the operations
db.podcasts.findAndModify({
query: document to update,
update: changes to be made,
new: true # true: return updated doc, false: return original doc
});
# example
db.podcasts.findAndModify({
query: { _id: ObjectId("6261a92dfee1ff300dc80bf1") },
update: { $inc: { subscribers: 1 } },
new: true,
})
`updateMany()` method: updates all documents that match the filter criteria
- Not an all-or-nothing operation
    - If the update fails partway, the successful updates stick while the failed documents are not updated
    - Requires running `updateMany()` again to update the remaining documents
- Lacks isolation
- Updates will be visible as soon as they're performed
- Not appropriate for some use cases
- Business transactions
db.books.updateMany(
{ publishedDate: { $lt: new Date("2019-01-01") } },
{ $set: { status: "LEGACY" } }
)
# lab question
db.birds.updateMany(
{
common_name: {
$in: ["Blue Jay", "Grackle"],
},
},
{
$set: {
last_seen: ISODate("2022-01-01"),
},
}
)
`deleteOne()` & `deleteMany()`
- can be used to delete filtered items OR all docs in a collection
# delete a single item
db.podcasts.find( { uploaded: ISODate("2020-01-01")});
db.podcasts.deleteOne({ _id: ObjectId("6282c9862acb966e76bbf20a") })
db.birds.deleteOne({ _id: ObjectId("62cddf53c1d62bc45439bebf") })
# delete all docs that match a filter
db.podcasts.deleteMany({category: "crime"})
db.birds.deleteMany({ sightings_count: { $lte: 10 } })
# output
{ acknowledged: true, deletedCount: 1 }
Cursors: a pointer to the result set of a query
- `find()` returns a cursor that points to the documents that match
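Conceptually, a cursor behaves like a lazy iterator over the matching documents; a rough JavaScript sketch using a generator (illustrative in-memory data, not the real driver):

```javascript
// A toy "collection" and a find() that returns a lazy cursor.
function* find(docs, predicate) {
  for (const doc of docs) {
    if (predicate(doc)) yield doc; // documents are produced on demand
  }
}

const zips = [
  { city: 'PHOENIX', state: 'AZ' },
  { city: 'TUCSON', state: 'AZ' },
  { city: 'CHICAGO', state: 'IL' },
];

const cursor = find(zips, (d) => d.state === 'AZ');
// Pulling the next batch of results, like `it` in mongosh:
console.log(cursor.next().value); // { city: 'PHOENIX', state: 'AZ' }
```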
Cursor Methods:
- chained to queries
- perform actions on the result set
- e.g. `sort()` or `limit()`
`sort()`:
- `sort({ field: 1 })` = ascending order
- `sort({ field: -1 })` = descending order
- capital letters sort first, lowercase letters second (default byte order)
- modifiable with collation options
db.collection.find(<query>).sort(<sort>)
# 1 => ascending, -1 => descending | sorting for a property
db.companies.find({ category_code: "music" },{name: 1}).sort({ name: 1 });
# ^ PROJECTION: only show name fields
# to ensure consistent sort order, add _id field
db.companies.find({ category_code: "music" }).sort({ name: 1, _id: 1 });
# sort by sales date
db.sales.find({}).sort({ saleDate: 1 })
# online purchases w/ coupon used, sorted from newest to oldest
db.sales.find({ purchaseMethod: "Online", couponUsed: true}).sort({ saleDate: -1 })
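The capital-letters-first behavior comes from comparing string code points; JavaScript's default array sort uses the same ordering, so it can be demonstrated locally (illustrative data):

```javascript
// Default lexicographic comparison puts uppercase before lowercase,
// because 'B' (code 66) sorts before 'a' (code 97).
const names = ['banana', 'Apple', 'apple', 'Banana'];
names.sort();
console.log(names); // [ 'Apple', 'Banana', 'apple', 'banana' ]
```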
`limit()`:
- limiting the result count can enhance performance
db.companies.find(<query>).limit(<number>)
# get top 3 companies with most number of employees
db.companies
.find({ category_code: "music" })
.sort({ number_of_employees: -1, _id: 1 })
.limit(3);
# London stores, items containing laptop OR backpack OR printer paper, sorted new > old, last 3
db.sales.find({ "items.name": { $in: ["laptop", "backpack", "printer paper"] }, "storeLocation": "London", }).sort({ saleDate: -1, }).limit(3)
Projections: limit which fields you get back from a query
- improves performance and reduces bandwidth
- 2nd argument in a `find()` query:
    - `field: 1` to INCLUDE the field
    - `field: 0` to EXCLUDE the field
- can't combine include & exclude projections EXCEPT for the `_id` field
# syntax
db.collection.find( <query>, <projection> )
# include field
db.collection.find( <query>, { <field> : 1 })
# Return all restaurant inspections - business name, result, and _id fields only
db.inspections.find(
{ sector: "Restaurant - 818" },
{ business_name: 1, result: 1 }
)
# exclude field
db.collection.find(query, { <field> : 0, <field>: 0 })
# Return all inspections with result of "Pass" or "Warning" - exclude date and zip code
db.inspections.find(
{ result: { $in: ["Pass", "Warning"] } },
{ date: 0, "address.zip": 0 }
)
Get total result count for a query
# syntax
db.collection.countDocuments( <query>, <options> )
# examples
# Count number of docs in trip collection
db.trips.countDocuments({})
# Count number of trips over 120 minutes by subscribers
db.trips.countDocuments({ tripduration: { $gt: 120 }, usertype: "Subscriber" })
db.sales.countDocuments({ items: { $elemMatch: { name: "laptop", price: { $lt: 600 } } } } )
Aggregation: collection and summary of data
Stage: a built-in method performed on the data; doesn't permanently modify it
Aggregation Pipeline: series of stages completed one at a time, in order
Aggregations can:
- filter for relevant data
- group documents
- calculate total values from fields across many documents
Data can be:
- Filtered
- Sorted
- Grouped
- Transformed
`$match`: filters for data that matches criteria
- expects the same syntax as the `.find()` method
`$group`: groups documents based on criteria
`$sort`: puts documents in a specified order
`name: { $concat: ["$first_name", "$last_name"] }`: combines two fields
db.collection.aggregate([ <stage>, <stage>, <stage> ])
# aggregation pipeline:
db.collection.aggregate([
    {
        $stage1: { <expression> }
    },
    {
        $stage2: { <expression> }
    }
])
# example
db.collection.aggregate([ # method
    {
        $match: { # stage
            size: "small" # expression
        }
    }
])
Match:
- Filters for documents for matching conditions
- Passes those documents to the next stage in the pipeline
- Accepts one argument: a query document
- Place it as early as possible, reducing processing power used by the rest of the pipeline
{
$match: {
"field_name": "value"
}
}
# all zips from CA
db.zips.aggregate([
    {
        $match: {
            state: "CA"
        }
    }
])
Group:
- Groups documents by a group key
- Output: 1 document for each unique value of the group key
- 2 arguments: `_id` (the group key) & `<field>: { <accumulator>: <expression> }`
{
$group:
{
_id: <expression>, // Group key
<field>: { <accumulator> : <expression> }
}
}
{
    $group: {
        _id: "$city", # group key
        totalZips: { $count: { } }
    }
}
`$count` accumulator: counts the documents in each group
db.zips.aggregate([
{
$match: {
state: "CA"
}
},
{
$group: {
_id: "$city",
totalZips: { $count : { } }
}
}
])
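Conceptually, the `$match` + `$group` + `$count` pipeline above is a filter followed by a reduce; a minimal plain-JavaScript sketch of the same bookkeeping (hypothetical in-memory data, not the real aggregation engine):

```javascript
const zips = [
  { city: 'LOS ANGELES', state: 'CA' },
  { city: 'LOS ANGELES', state: 'CA' },
  { city: 'SAN DIEGO', state: 'CA' },
  { city: 'PHOENIX', state: 'AZ' },
];

// $match: keep CA docs, then $group by city with a running count
const totals = zips
  .filter((z) => z.state === 'CA')
  .reduce((acc, z) => {
    acc[z.city] = (acc[z.city] || 0) + 1; // one bucket per group key
    return acc;
  }, {});

console.log(totals); // { 'LOS ANGELES': 2, 'SAN DIEGO': 1 }
```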
# find location with the most Eastern Bluebird sightings
db.sightings.aggregate([
{
$match: {
species_common: 'Eastern Bluebird'
}
}, {
$group: {
_id: '$location.coordinates',
number_of_sightings: {
$count: {}
}
}
}
])
`$sort`: sorts all input documents & passes them through the pipeline in sorted order
- works on numbers, strings, dates & timestamps
- `1`: ascending
- `-1`: descending
`$limit`: limits the # of documents that are passed on to the next aggregation stage
- positive integer: how many documents
{
    $sort: {
        "field_name": 1
    }
},
{
    $limit: 5
}
# sort by largest to smallest population, limit to 3 total
db.zips.aggregate([
{
$sort: {
pop: -1
}
},
{
$limit: 3
}
])
# lab
{
...
location: { type: 'Point', coordinates: [ 40, -73 ] }
...
}
db.sightings.aggregate([
{
$sort: {
'location.coordinates.1': -1
}
}, {
$limit: 4
}
])
To access an array element in a query or sort path, use dot notation with the index, e.g. `location.coordinates.1` for the second element.
`$project` stage: determines the output shape
- existing or new fields
- projection works like in `find()` operations
- should be the last stage, to format the output
- `1`: included
- `0`: not included
{
$project: {
# include fields:
state:1,
zip:1,
# set population field to $pop's values
population:"$pop",
# omit id field
_id:0
}
}
db.sightings.aggregate([
{
$project: {
_id: 0,
species_common: 1,
date: 1
}
}
])
`$set`: adds or modifies fields in the pipeline
- useful when we want to change existing fields in the pipeline or add new ones to be used in upcoming pipeline stages
{
$set: {
pop_2022: {
$round: {
$multiply: [ 1.0031, "$pop" ]
}
}
}
}
{
$set: {
place: {
$concat:["$city",",","$state"]
},
pop:10000
}
}
`$count` stage: creates a new document & assigns a field name with the total # of documents at the current pipeline stage
- syntax: `$count: <fieldname>`
{
$count: "total_zips"
}
Lab example:
db.sightings.aggregate([
{
$match: {
date: {
$gt: ISODate('2022-01-01T00:00:00.000Z'),
$lt: ISODate('2023-01-01T00:00:00.000Z')
},
species_common: 'Eastern Bluebird'
}
}, {
$count: 'bluebird_sightings_2022'
}
])
# outputs:
[ { bluebird_sightings_2022: 5 } ]
`$out`: writes the documents that are returned by an aggregation pipeline into a collection
- must be the last stage
- creates a new collection if it doesn't exist
- overwrites & replaces existing collection with new data
$out: {
db: "<db>",
coll: "<newcollection>"
}
# uses the same db as the pipeline
$out: <newcollection>
# lab
db.sightings.aggregate([
{
$match: {
date: {
$gte: ISODate('2022-01-01T00:00:00.0Z'),
$lt: ISODate('2023-01-01T00:00:00.0Z')
}
}
},
{
$out: 'sightings_2022'
}
])
db.sightings_2022.findOne()
Indexes:
- Special data structures
- Store small portion of the data
- Ordered & easy to search efficiently
- Improve query performance
- Speed up query
- Reduce disk I/O
- Reduce resources required
- Support equality matches & range-based operations & return sorted results
Without Indexes:
- MongoDB reads all documents in a collection scan
- Sorts results in memory
With Indexes:
- MongoDB only fetches the documents identified by the index, based on the query
- Returns results faster
By default:
- 1 index per collection: the `_id` field
- Ideally, every query should use an index
Index Costs:
- If we insert/update documents, we need to update the index data structure
- Write performance can degrade if we have too many indexes in a collection
- No useless indexes: delete if not used
Common index types:
- Single field
- Compound indexes: more than 1 field
- Multi-key indexes operate on an array field
db.customers.find({active: true})
db.customers.find({active: true, accounts: 276528})
Indexes summary:
- Ordered data structures for fast data retrieval
- Store small amounts of data
- Reduce computation
- Support equality & range based queries
- Return sorted results
- Write performance cost
- When updated or inserted, the indexes must be updated
- Common types:
- Single Field
- Compound
- Multikey indexes operate on an array field
- Support queries & sort on a single field
- `createIndex({fieldname: 1})`: field name and sort order
- `getIndexes()`: retrieves the indexes on a collection
- To view indexes in Atlas, go to Collections > Indexes tab
- To find out which indexes are being used by a query, add `explain()` to the query
    - e.g.: `db.customers.explain().find();`
Create index in a collection: field & sort order:
db.customers.createIndex({
birthdate: 1
})
Create a unique single index field. This prevents future inserts & updates from using a duplicate value.
db.customers.createIndex({
email: 1
},
{
unique:true
})
Check if an index is being used on a query with `explain()`:
- `IXSCAN` = the query is using an index; shows which index is selected
- `COLLSCAN` = a collection scan is performed, without using any indexes
- `FETCH` = documents are being read from the collection
- `SORT` = documents are being sorted in memory
db.customers.explain().find({
birthdate: {
$gt:ISODate("1995-08-01")
}
})
db.customers.explain().find({
birthdate: {
$gt:ISODate("1995-08-01")
}
}).sort({
email:1
})
Lab:
db.collection.getIndexes();
# create a unique index
db.accounts.createIndex({ account_id: 1 }, { unique: true });
# determine if field is indexed
db.accounts.explain().find({ account_id: "MDB829000996" })
Multikey indexes work with array fields in an index
- Any index where one of the indexed fields is an array
- Array can hold nested objects or other field types
- In compound index, only one field can be an array per index
- Performs a FETCH of the document after the IXSCAN, because each array value is stored as a separate index key
Create a single field multikey index:
db.customers.createIndex({
accounts: 1
})
Compound indexes are:
- index on multiple fields
- can be a multikey index if it includes an array
- maximum one array field per index
- support queries that match on the prefix of the index fields
In compound indexes:
- Order of the fields matters
- Follow this order: Equality > Sort > Range
- Sort order of field values in the index matters
Equality:
- tests exact matches on a single field
- should be first in a compound index
- reduces query processing time
- retrieves fewer documents
Sort:
- determines order of results
- index sort eliminates need for in memory sorts
- sort order matters if query results are sorted by more than 1 field and the fields mix sort orders
You can create indexes that support a query or cover the query:
- Support: bare minimum fields to acquire the document
- Cover: all fields that are needed from the final query
Create a compound index:
db.customers.createIndex({
active:1,
birthdate:-1,
name:1
})
Example:
# original query:
db.customers.find({
birthdate: {
$gte:ISODate("1977-01-01")
},
active:true
}).sort({
birthdate:-1,
name:1
})
# group active customers first (equality), then sort
db.customers.createIndex({
active:1,
birthdate:-1,
name:1
})
Covering a query:
- An index covers a query when MongoDB doesn't need to fetch the documents themselves, since all requested data is returned by the index
- Most cases: use projections to return only required fields
- Make sure fields in projections are in the index
db.customers.explain().find({
birthdate: {
$gte:ISODate("1977-01-01")
},
active:true
},
# projection: fields covered by the index above (eliminating the need for fetch)
{ name:1,
birthdate:1,
_id:0
}).sort({
birthdate:-1,
name:1
})
# output
...
winningPlan: {
stage: 'PROJECTION_COVERED',
...
inputStage: {
stage: 'IXSCAN'
...
}
}
Indexes:
- Have a write cost
- Too many indexes in a collection affect system performance
- Delete unused or redundant indexes
Before deleting an index:
- Make sure it's not being used
- Deleting an index that's the only index supporting a query will affect performance of the query
- Recreating an index = extra time & resources
- Hide the index before deleting it
db.collection.hideIndex(<index_name or index_key>)
- Does not use hidden indexes in queries BUT keeps their keys up to date
- Good for testing the impact of deleting an index
- Delete an index with
db.collection.dropIndex(<index_name or index_key>)
find({username: "abc123"});
find({username: "abc123", active: true})
indexes:
username_1 # redundant, covered by index below
username_1_active_1
# delete index
db.customers.dropIndex(
'active_1_birthdate_-1_name_1'
)
# or
db.customers.dropIndex({
active:1,
birthdate:-1,
name:1
})
# delete all indexes:
db.customers.dropIndexes();
# drop list of indexes
db.collection.dropIndexes([
'index1name', 'index2name', 'index3name'
])
Lab:
db.accounts.dropIndex("account_holder_1")
Relevance-based search:
- surfaces records based on a search term
- not a db search for a particular record
- built on the open-source search library Apache Lucene
- not a db index
Search indexes:
- specify how records are referenced for relevance based search
Search Index with Dynamic Mapping:
- All fields are indexed (default)
- except: booleans, objectIds, timestamps
Database > Search > Create Search
- Select the correct db/collection
- Specify options
- Create search
Searches:
- Score: indicates how relevant each record is to the search term
- Option: assign weights to specific fields
Create a search index with the Atlas CLI:
atlas auth login
- Open `search_index.json`
- Edit `dynamic` to `true`
- Save the file
- In the CLI, run the following command to create the search index:
atlas clusters search indexes create --clusterName myAtlasClusterEDU -f /app/search_index.json
- To verify the index creation has started:
atlas clusters search indexes list --clusterName myAtlasClusterEDU --db sample_supplies --collection sales
Using Atlas Search Index w/ Dynamic Mapping:
mongosh -u myAtlasDBUser -p myatlas-001 $MY_ATLAS_CONNECTION_STRING/sample_supplies
db.sales.aggregate([
{
$search: {
index: 'sample_supplies-sales-dynamic',
text: {
query: 'notepad', path: { 'wildcard': '*' }
} } },
{
$set: {
score: { $meta: "searchScore" }
}
}
])
If the fields being queried are always the same:
- Turn off dynamic mapping
- Field mappings:
    - Add field
    - Select `common_name`
    - Add data type
    - Select string
- Save changes
- Create search index
To create a static search index in the Atlas CLI:
atlas auth login
- Open `/app/search_index.json`
- Edit the file (add `storeLocation` in fields as `string`)
{
"name": "sample_supplies-sales-static",
"searchAnalyzer": "lucene.standard",
"analyzer": "lucene.standard",
"collectionName": "sales",
"database": "sample_supplies",
"mappings": {
"dynamic": false,
"fields": {
"storeLocation": {
"type": "string"
}
}
}
}
atlas clusters search indexes create --clusterName myAtlasClusterEDU -f /app/search_index.json
- Verify:
atlas clusters search indexes list --clusterName myAtlasClusterEDU --db sample_supplies --collection sales
Using Atlas Search Index with Static Mapping:
- Connect with mongosh:
mongosh -u myAtlasDBUser -p myatlas-001 $MY_ATLAS_CONNECTION_STRING/sample_supplies
db.sales.aggregate([
  {
    $search: {
      index: 'sample_supplies-sales-static',
      text: {
        query: 'London', path: { 'wildcard': '*' }
      }
    }
  },
  {
    $set: {
      score: { $meta: "searchScore" }
    }
  }
]);
compound: combines multiple operator clauses in one search and specifies the weight of each
- must: required to match
- mustNot: inverse of must (required NOT to match)
- should: gives weight to results that meet the criteria so they appear first
- filter: removes results that do not meet the criteria
Weighted results are affected by: must, mustNot & should (filter does not change the score)
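A toy sketch in plain JavaScript of how these clauses interact (an illustration of the idea only, not Lucene's actual scoring; all names here are made up):

```javascript
// Toy model: must / mustNot / filter act as gates that drop documents;
// "should" clauses add their weight to the score of surviving documents.
function compoundSearch(docs, { must = [], mustNot = [], filter = [], should = [] }) {
  return docs
    .filter(d => must.every(p => p(d)))        // must: required to match
    .filter(d => mustNot.every(p => !p(d)))    // mustNot: required NOT to match
    .filter(d => filter.every(p => p(d)))      // filter: gate only, no score impact
    .map(d => ({
      doc: d,
      // each matching "should" clause adds its weight to the score
      score: should.reduce((s, c) => s + (c.pred(d) ? c.weight : 0), 0),
    }))
    .sort((a, b) => b.score - a.score);        // highest score first
}

const sales = [
  { purchaseMethod: 'Online', item: 'notepad' },
  { purchaseMethod: 'Online', item: 'binder' },
  { purchaseMethod: 'In store', item: 'notepad' },
];

// keep only online sales; boost the ones containing a notepad
const results = compoundSearch(sales, {
  filter: [d => d.purchaseMethod === 'Online'],
  should: [{ pred: d => d.item === 'notepad', weight: 5 }],
});
console.log(results.map(r => `${r.doc.item}: ${r.score}`)); // [ 'notepad: 5', 'binder: 0' ]
```

Note how the filter clause shrank the result set without contributing to any score, while the should clause only reordered what survived.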
Atlas Web:
- Navigate to a collection > Aggregation Tab
- Dropdown > Search
- Delete the boilerplate
- Enter:
$search: {
  "compound": {
    "must": [{
      "text": {
        "query": "field",
        "path": "habitat"
      }
    }],
    "should": [{
      "range": {
        "gte": 45,
        "path": "wingspan_cm",
        "score": {"constant": {"value": 5}}
      }
    }]
  }
}
Lab:
# connect to Atlas Cluster
mongosh -u myAtlasDBUser -p myatlas-001 $MY_ATLAS_CONNECTION_STRING/sample_supplies
// aggregation pipeline w/ 2 stages
db.sales.aggregate([
  // search using a specified index
  {
    $search: {
      index: 'sample_supplies-sales-dynamic',
      "compound": {
        // filter out sales that were NOT online
        "filter": [
          {
            "text": {
              "query": "Online",
              "path": "purchaseMethod"
            }
          }
        ],
        // search online sales that contain a particular item in the order: notepad
        // should indicates that the search term appears in the items.name field
        "should": [
          {
            "text": {
              "query": "notepad",
              // item name has the highest weight of all fields in the record
              "path": "items.name",
              // constant score overwrites the calculated score
              // ^ desirable when you only care about specific matches
              "score": { "constant": { "value": 5 } }
            }
          }
        ]
      }
    }
  },
  // finally, project the fields we're interested in
  {
    $project: {
      "items.name": 1,
      "purchaseMethod": 1,
      "score": { $meta: "searchScore" }
    }
  }
])
Facets are buckets that we group search results into.
- Make it easier for users to find search results by grouping them into categories
- Example: Facebook search groups results into People, Groups, Events, etc.
Steps:
- Update field mappings to include facet fields
- Use $searchMeta to categorize results
Facet data types:
- Numbers
- Dates
- Strings
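The idea of a string facet can be sketched in plain JavaScript (a toy model of the bucketing, not what Atlas actually runs):

```javascript
// Toy model of a string facet: group documents into buckets by a field's
// value and count how many land in each bucket. Atlas returns these counts
// as search metadata, not as search results.
function stringFacet(docs, path) {
  const buckets = {};
  for (const doc of docs) {
    const key = doc[path];
    buckets[key] = (buckets[key] ?? 0) + 1;  // increment this bucket's count
  }
  return buckets;
}

const sales = [
  { storeLocation: 'London' },
  { storeLocation: 'Denver' },
  { storeLocation: 'London' },
];
console.log(stringFacet(sales, 'storeLocation')); // { London: 2, Denver: 1 }
```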
Create a date-based facet in Atlas Web:
- Turn off dynamic mapping
- Add field mappings
- Select the date field from the drop-down menu
- Add data type
- Select DateFacet from the drop-down
$searchMeta:
- Allows us to see facets & how many results are in each bucket
- Buckets are part of the search metadata (not part of the search results themselves)
- Information on how the search was carried out
- Metadata types
- Facets
- Count (number of results in each facet)
To add searchMeta in Atlas:
- Aggregation tab
- Select $searchMeta in the drop down
- Specify facets in the text field
$searchMeta is an aggregation stage for Atlas Search that returns metadata about the search instead of the search results themselves. This means that if our search results are broken into buckets, using the facet collector, we can see those buckets in the $searchMeta stage, because the buckets are information about how the search results are grouped.
$searchMeta: {
"facet": {
"operator": {
"text": {
"query": ["Northern Cardinal"],
"path": "common_name"
}
},
"facets": {
"sightingWeekFacet": {
"type": "date",
"path": "sighting",
"boundaries": [ISODate("2022-01-01"),
ISODate("2022-01-08"),
ISODate("2022-01-15"),
ISODate("2022-01-22")],
"default" : "other"
}
}
}
}
"facet" is an operator within $searchMeta. "operator" refers to the search operator - the query itself. "facets" operator is where we put the definition of the buckets for the facets.
Lab #1: Creating a new field mapping using facets
# app/search_index.json
{
  "name": "sample_supplies-sales-facets",
  "searchAnalyzer": "lucene.standard",
  "analyzer": "lucene.standard",
  "collectionName": "sales",
  "database": "sample_supplies",
  "mappings": {
    "dynamic": true,
    "fields": {
      "purchaseMethod": [
        {
          "dynamic": true,
          "type": "document"
        },
        {
          "type": "string"
        }
      ],
      "storeLocation": [
        {
          "dynamic": true,
          "type": "document"
        },
        {
          "type": "stringFacet"
        }
      ]
    }
  }
}
# create indexes based off of json
atlas clusters search indexes create --clusterName myAtlasClusterEDU -f /app/search_index.json
# verify index creation
atlas clusters search indexes list --clusterName myAtlasClusterEDU --db sample_supplies --collection sales
Lab #2: Search using $searchMeta, facets & operators
db.sales.aggregate([
  {
    $searchMeta: {
      index: 'sample_supplies-sales-facets',
      "facet": {
        "operator": {
          "text": {
            "query": "In store",
            "path": "purchaseMethod"
          }
        },
        "facets": {
          "locationFacet": {
            "type": "string",
            "path": "storeLocation"
          }
        }
      }
    }
  }
])
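The result of this stage is metadata rather than documents. The shape is roughly the following (the bucket names come from the data; the cities and counts below are made-up placeholders):

```json
{
  "count": { "lowerBound": 2819 },
  "facet": {
    "locationFacet": {
      "buckets": [
        { "_id": "Denver", "count": 915 },
        { "_id": "Seattle", "count": 648 }
      ]
    }
  }
}
```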
Transactions = group of db operations that must happen together or not at all, ensuring db safety & consistency
Examples:
- Move funds: mobile payment app
- Track inventory: online shopping
- Track payments: billing software
ACID Transactions:
- tool used to solve the above problems
- group of db operations that are completed as a unit or not at all
A.C.I.D.:
- Atomicity: succeed or fail together
- Consistency: changes made by operations are consistent with db constraints
- e.g.: an account balance can't drop below $0
- Isolation: multiple transactions can happen at the same time, without affecting the outcome of the other transaction
- Ensures no side effects if other operations are running at the same time
- Durability: changes made by operations in a transaction will persist, no matter what
All single document operations are atomic.
- No extra steps are needed to provide ACID properties
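The all-or-nothing idea can be sketched in plain JavaScript (a toy model of commit/abort, not MongoDB's implementation; the balance constraint mirrors the consistency example above):

```javascript
// Toy model of atomicity: apply a group of operations to a copy of the
// state; adopt the copy only if every operation succeeds, otherwise keep
// the original state untouched (all-or-nothing).
function runTransaction(state, operations) {
  const draft = structuredClone(state);      // work on a copy
  try {
    for (const op of operations) op(draft);  // any throw aborts everything
    return { ok: true, state: draft };       // commit: adopt the copy
  } catch (e) {
    return { ok: false, state };             // abort: original unchanged
  }
}

const accounts = { '123': 50, '321': 10 };
const debit = (id, amt) => s => {
  if (s[id] - amt < 0) throw new Error('balance cannot drop below 0');
  s[id] -= amt;
};
const credit = (id, amt) => s => { s[id] += amt; };

// Transferring 30 succeeds; transferring 100 violates the balance
// constraint mid-transaction, so neither of its operations takes effect.
console.log(runTransaction(accounts, [debit('123', 30), credit('321', 30)]));
console.log(runTransaction(accounts, [debit('123', 100), credit('321', 100)]));
```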
- updateOne() is an ACID transaction: all fields will update, or none at all
- updateMany() is NOT an ACID transaction as a whole: each document's update is atomic individually, but the set of updates is not
Multi-document ACID Transaction:
- Requires careful planning and execution
- MongoDB "locks" all resources involved in a transaction
- Incur performance cost & affects latency
- Precise tool for special circumstances
Using Mongo Shell:
- Session: group of db operations that are related to each other and should be run together
- Transaction has a max runtime of one minute after first write
- After 1 min: "MongoServerError: Transaction 1 has been aborted"
Creating a session to transfer 30 from one account to another:
const session = db.getMongo().startSession();
session.startTransaction();
const account = session.getDatabase('bank').getCollection('accounts');
account.updateOne({ account_id: "123" }, { $inc: { balance: -30 } });
account.updateOne({ account_id: "321" }, { $inc: { balance: 30 }});
// cancel the transaction
session.abortTransaction();
// complete the transaction
session.commitTransaction();
Generic instructions:
const session = db.getMongo().startSession()
session.startTransaction()
const account = session.getDatabase('<add database name here>').getCollection('<add collection name here>')
// Add database operations like .updateOne() here
session.commitTransaction()
Lab #1:
// Open a new session by creating a session variable that will store the session object with the following command:
const session = db.getMongo().startSession();
// Start a transaction on the newly created session variable with the following command:
session.startTransaction();
// Create an account variable to reference the accounts collection with the following command:
const account = session.getDatabase('bank').getCollection('accounts');
// Insert a new account for account_holder "Florence Taylor" with an account_id of "MDB454252264",
// an account_type of "savings", a balance of 100.00, and an empty transfers_complete array.
// You can do this with the following command:
account.insertOne({
  account_id: "MDB454252264",
  account_holder: "Florence Taylor",
  account_type: "savings",
  balance: 100.0,
  transfers_complete: [],
  last_updated: new Date()
})
// Fund the account with 100.00 from a checking account with an account_id of "MDB963134500" that belongs
// to the same account holder. You can do this with the following command:
account.updateOne( { account_id: "MDB963134500" }, { $inc: { balance: -100.00 } })
// Complete the transaction on the session using the following command:
session.commitTransaction()
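A quick sanity check on why these two operations belong in one transaction: together they move funds rather than create them. A toy sketch in plain JavaScript (the checking account's starting balance is a made-up placeholder):

```javascript
// Toy model of the lab's invariant: opening the savings account with 100
// and debiting the checking account by 100 must conserve total funds.
const accounts = [{ account_id: 'MDB963134500', balance: 250.0 }]; // hypothetical balance
const total = accts => accts.reduce((sum, a) => sum + a.balance, 0);
const before = total(accounts);

// operation 1: insert the funded savings account
accounts.push({ account_id: 'MDB454252264', balance: 100.0 });
// operation 2: debit the checking account that funded it
accounts.find(a => a.account_id === 'MDB963134500').balance -= 100.0;

console.log(total(accounts) === before); // true: funds were moved, not created
```

If only one of the two operations ran, the invariant would break, which is exactly the failure mode the transaction guards against.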