The goal of this POC was to test the implementation and performance of storing JSON data in Postgrsql 9.4+ vs using native JSON structures in MongoDB.
We generate 10k documents in both MongoDB and Postgresql using Spring Boot + JPA + Hibernate and add appropriate indexes.
- Install local Postgres (with database "postgres")
- Install local MongoDB (with database "PgPerf")
- Install DynamoDB Local
- Run
mvn compile
to build QueryDSL sources - Run
mvn test
to run performance tests
wget https://d1.almworks.com/.files/sqlite4java/sqlite4java-392.zip
unzip sqlite4java-392.zip
sudo cp sqlite4java-392/*.dylib /Library/Java/Extensions/
CREATE INDEX ON example USING BTREE ((data->>'stock'));
CREATE INDEX ON example USING HASH ((data->>'stock'));
CREATE INDEX ON example USING GIN ((data));
CREATE INDEX ON example USING BTREE (cast (data->>'stock' as int));
Test | Time taken |
---|---|
testPerfMongo | 23.519s |
testPerfPg | 54.798s |
In MongoDB, documents look like:
In Postgresql, rows look like:
While PostgreSQL JSONB type provides flexibility, it should be used just when appropriate. The only check being performed is that stored data is actually in a valid JSON format. You cannot impose any other constraints as with regular columns - such as not null or enforce a particular Data Type (Integer, VarChar, Date). Therefore it is best suited for providing an additional optional set of data to an entity, where you cannot be sure before which data is would contain. And such data would differ a lot among each of the rows. Such example can be a user-provided set of additional data. You should always carefully consider which data is better suited as regular columns and which should be stored as JSON.
jsonb
columns have a flat 1% statistics rate causing poor lookup strategies (unlike MongoDB)
Consider
EXPLAIN ANALYZE SELECT *
FROM example
WHERE to_date(data->>'date', 'YYYY-MM-DD')
BETWEEN '2018-02-01'
AND '2020-03-01'
AND data->>'name' = 'Name 7';
or even
EXPLAIN analyze SELECT * FROM example
WHERE ((data->>'stock')::integer) > 15000;
The results should that neither approach can use indexes for range queries - a filter need to be run.
Consider
EXPLAIN ANALYZE SELECT * FROM example
WHERE ((data->>'stock')::integer) = 15000;
The casting as integer
means that a table scan and filter is performed - no index.
However, you can do:
CREATE INDEX ON example USING BTREE (cast (data->>'stock' as int));
which does seem to create a usable index for the above query.
WHERE DATE > ? AND FEATURES.Ref = ?
Some code forked from https://www.vojtechruzicka.com/postgresqls-jsonb-type-mapping-using-hibernate/
Official docs:
- Spring Data MongoDB
- Details on Postgres's GIN Indexes
- JSON Types
- JSON Functions and Operators
- GiST and GIN Index Types
Postgresql articles:
- About GIN internals
- About index sizes
- About index statistics
- About indexes as text
- Checking existence of a key
- Compares Posgresql JSON to JSONB data types
- When To Avoid JSONB In A PostgreSQL Schema
DynamoDB articles: