jq is unbeatable for JSON processing, but its syntax takes serious learning. jonq wraps jq in a SQL-ish, Pythonic layer you can read and remember.
Who It's For: jonq is designed for anyone who works with JSON data. It shines at quick JSON exploration, lightweight ETL tasks, and validating config files in CI pipelines.
- Robust nested‑array handling – automatic base‑array detection (`products[].versions[]`)
- Expression parser fixes – arithmetic around aggregations now works (`sum(x) * 2`)
- Null‑safe aggregation – avoids `Cannot iterate over null` errors
- `jonq_fast` – optional Rust extension, roughly 2x faster than plain jonq
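Null-safe aggregation means an aggregate over a field that is missing or null in some records returns a sensible result instead of erroring out. A rough plain-Python illustration of that behaviour (my sketch of the semantics, not jonq's implementation):

```python
def null_safe_sum(rows, field):
    # Skip records where the field is absent or null instead of failing.
    values = [r[field] for r in rows if r.get(field) is not None]
    return sum(values)

rows = [{"x": 1}, {"x": None}, {}, {"x": 4}]
print(null_safe_sum(rows, "x"))  # 5
```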
| Category | What you can do | Example |
|---|---|---|
| Selection | Pick fields | `select name, age` |
| Wildcard | All fields | `select *` |
| Filtering | Python‑style ops: `and` / `or` / `between` / `contains` | `if age > 30 and city = 'NY'` |
| Aggregations | `sum` `avg` `min` `max` `count` | `select avg(price) as avg_price` |
| Grouping | `group by` + `having` | `… group by city having count > 2` |
| Ordering | `sort <field> [asc\|desc] <limit>` | `sort age desc 5` |
| Nested arrays | `from [].orders` or inline paths | `select products[].name …` |
| Inline maths | Real expressions | `sum(items.price) * 2 as double_total` |
| CSV / stream | `--format csv`, `--stream` | |
| Task | Raw jq filter | jonq one‑liner |
|---|---|---|
| Select specific fields | `jq '.[]\|{name:.name,age:.age}'` | `jonq data.json "select name, age"` |
| Filter rows | `jq '.[]\|select(.age > 30)\|{name,age}'` | `… "select name, age if age > 30"` |
| Sort + limit | `jq 'sort_by(.age) \| reverse \| .[0:5]'` | `… "select name, age sort age desc 5"` |
| Deep filter | `jq '.[]\|select(.profile.address.city=="NY")\|{name,city:.profile.address.city}'` | `… "select name, profile.address.city if profile.address.city = 'NY'"` |
| Count items | `jq 'map(select(.age>25)) \| length'` | `… "select count(age) as count if age > 25"` |
| Group & count | `jq 'group_by(.city) \| map({city:.[0].city,count:length})'` | `… "select city, count(city) as count group by city"` |
| Complex boolean | `jq '.[] \| select(.age>25 and (.city=="NY" or .city=="Chicago"))'` | `… "select * if age > 25 and (city = 'NY' or city = 'Chicago')"` |
| Group & HAVING | `jq 'group_by(.city) \| map(select(length>2)) \| map({city:.[0].city,count:length})'` | `… "select city, count(city) as count group by city having count > 2"` |
| Aggregation expression | - | `… "select sum(price) * 1.07 as total_gst"` |
| Nested‑array aggregation | - | `… "select avg(products[].versions[].pricing.monthly) as avg_price"` |
Take‑away: a single jonq string replaces many pipes and brackets while still producing pure jq under the hood.
| Aspect | jonq | DuckDB | Pandas |
|---|---|---|---|
| Primary use case | Fast, lightweight JSON querying directly from the command line | Analytical SQL queries on large datasets, including JSON | General-purpose data manipulation and analysis in Python |
| Setup | No DB, streams any JSON | Requires a DB file / extension | Requires a Python environment with pandas and its dependencies installed |
| Query language | Familiar SQL‑ish, no funky `json_extract` | SQL + JSON functions | Python code for data manipulation and analysis |
| Footprint | Minimal: requires only jq (a ~500 KB binary); no environment setup | ~140 MB binary | ~20 MB for pandas and its dependencies |
| Streaming | `--stream` processes line‑delimited JSON lazily | Must load into a table | Can process large files using chunking, but not as memory-efficient as streaming |
| Memory usage | Low; streams data to avoid loading full JSON into memory | In-memory database, but optimized for large data with columnar storage | Loads data into memory; can strain RAM with large datasets |
| jq ecosystem | Leverages all jq filters for post‑processing | No | Part of the Python data science ecosystem; integrates with NumPy, Matplotlib, scikit-learn, etc. |
- Instant JSON Querying, No Setup Hassle. You have a JSON file (`data.json`) and need every record where `age > 30`, in seconds.
  - With jonq: run `jonq data.json "select * if age > 30"`. Done. No environment setup, no imports; just install jq and go.
  - Pandas: fire up Python, write a script (`import pandas as pd; df = pd.read_json('data.json'); df[df['age'] > 30]`), and run it. More steps.
  - DuckDB: set up a database, load the JSON (`SELECT * FROM read_json('data.json') WHERE age > 30`), and execute. Powerful, but overkill for a quick task.
- Command-Line Power. jonq thrives in shell scripts and CI/CD pipelines, e.g. `jonq data.json "select name, age" | grep "John"`. Pandas and DuckDB require scripting or a heavier integration layer.
- Lightweight and Efficient. jonq uses jq's streaming mode (`--stream`) for large JSON files, processing data piece by piece instead of loading it all into memory. Pandas loads everything into a DataFrame (RAM-intensive), and while DuckDB is memory-efficient for analytics, it is still a full database engine with correspondingly more overhead.
- SQL Simplicity for JSON. Example: `jonq users.json "select name, email if status = 'active' sort name asc"`. If you know SQL, jonq feels natural for JSON; there is no steep jq learning curve to climb first.
- Speed for Ad-Hoc Tasks. Test case: querying a 1 GB JSON file for specific fields.
  - jonq: streams it in seconds with minimal memory use.
  - Pandas: might choke or require chunking hacks.
  - DuckDB: fast, but setup and SQL complexity add time.
Supported Platforms: jonq works on Linux, macOS, and Windows (via WSL).

Requirements:

- Python 3.9+
- The `jq` command-line tool (https://stedolan.github.io/jq/download/)
From PyPI:

```sh
pip install jonq  # latest stable
```

From source:

```sh
git clone https://github.com/duriantaco/jonq.git
cd jonq && pip install -e .
```
Verify Installation: run `jonq --version` to ensure it's working correctly.
For users dealing with large or complex nested JSON structures, we recommend installing the optional `jonq_fast` Rust extension:

```sh
pip install jonq-fast
```

We explain more about this further down.
```sh
echo '[{"name":"Alice","age":30},{"name":"Bob","age":25}]' > data.json
jonq data.json "select name, age if age > 25"
# Output: [{"name":"Alice","age":30}]
```
The query syntax follows a simplified format:

```
select <fields> [if <condition>] [group by <fields>] [sort <field> [asc|desc] [limit]]
```

where:

- `<fields>` – comma-separated list of fields or aggregations to select
- `if <condition>` – optional filtering condition
- `group by <fields>` – optional grouping by one or more fields
- `sort <field>` – optional field to sort by
- `asc|desc` – optional sort direction (default: `asc`)
- `limit` – optional integer to limit the number of results
You can also refer to the json_test_files directory for the test JSONs, and look up the USAGE.md guide. Anyway, let's start with simple.json. Imagine a JSON like the following:
```json
[
  { "id": 1, "name": "Alice", "age": 30, "city": "New York" },
  { "id": 2, "name": "Bob", "age": 25, "city": "Los Angeles" },
  { "id": 3, "name": "Charlie", "age": 35, "city": "Chicago" }
]
```
```sh
jonq path/to/simple.json "select *"
jonq path/to/simple.json "select name, age"
jonq path/to/simple.json "select name, age if age > 30"
jonq path/to/simple.json "select name, age sort age desc 2"
jonq path/to/simple.json "select sum(age) as total_age"
jonq path/to/simple.json "select avg(age) as average_age"
jonq path/to/simple.json "select count(age) as count"
```
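For reference, here is what those three aggregation queries compute, mirrored in plain Python over the same records as simple.json (an illustration of the results, not of how jonq evaluates them):

```python
# Records matching simple.json above.
rows = [
    {"id": 1, "name": "Alice", "age": 30, "city": "New York"},
    {"id": 2, "name": "Bob", "age": 25, "city": "Los Angeles"},
    {"id": 3, "name": "Charlie", "age": 35, "city": "Chicago"},
]

ages = [r["age"] for r in rows]
total_age = sum(ages)                # sum(age)   -> 90
average_age = sum(ages) / len(ages)  # avg(age)   -> 30.0
count = len(ages)                    # count(age) -> 3
```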
Simple enough, I hope? Now let's move on to nested JSON. Imagine a nested JSON like the one below:
```json
[
  {
    "id": 1,
    "name": "Alice",
    "profile": {
      "age": 30,
      "address": { "city": "New York", "zip": "10001" }
    },
    "orders": [
      { "order_id": 101, "item": "Laptop", "price": 1200 },
      { "order_id": 102, "item": "Phone", "price": 800 }
    ]
  },
  {
    "id": 2,
    "name": "Bob",
    "profile": {
      "age": 25,
      "address": { "city": "Los Angeles", "zip": "90001" }
    },
    "orders": [
      { "order_id": 103, "item": "Tablet", "price": 500 }
    ]
  }
]
```
```sh
# nested field access
jonq nested.json "select name, profile.age"
jonq nested.json "select name, profile.address.city"

# count array elements
jonq nested.json "select name, count(orders) as order_count"

# boolean logic (AND / OR)
jonq nested.json "select name if profile.address.city = 'New York' or orders[0].price > 1000"
jonq nested.json "select name, profile.age if profile.address.city = 'New York' or orders[0].price > 1000"
```

### Find users who are both under 30 **and** from Los Angeles

```sh
jonq nested.json "select name, profile.age if profile.age < 30 and profile.address.city = 'Los Angeles'"
```

### Using parentheses for complex logic

```sh
jonq nested.json "select name, profile.age if (profile.age > 25 and profile.address.city = 'New York') or (profile.age < 26 and profile.address.city = 'Los Angeles')"
```
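To make the parenthesized condition concrete, here is the same filter expressed in plain Python over records shaped like nested.json (illustrative only; jonq compiles such conditions to jq rather than running Python):

```python
# Trimmed-down records from nested.json above.
rows = [
    {"name": "Alice", "profile": {"age": 30, "address": {"city": "New York"}}},
    {"name": "Bob", "profile": {"age": 25, "address": {"city": "Los Angeles"}}},
]

# (age > 25 and city = 'New York') or (age < 26 and city = 'Los Angeles')
matches = [
    r["name"] for r in rows
    if (r["profile"]["age"] > 25 and r["profile"]["address"]["city"] == "New York")
    or (r["profile"]["age"] < 26 and r["profile"]["address"]["city"] == "Los Angeles")
]
print(matches)  # ['Alice', 'Bob']
```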
jonq can output results in CSV format using the `--format csv` or `-f csv` option:

```sh
jonq path/to/simple.json "select name, age" --format csv > output.csv
```
Using `flatten_json` in your code:

```python
from jonq.csv_utils import flatten_json

data = {
    "user": {
        "name": "Alice",
        "address": {"city": "New York"},
        "orders": [
            {"id": 1, "item": "Laptop", "price": 1200},
            {"id": 2, "item": "Phone", "price": 800}
        ]
    }
}

flattened = flatten_json(data, sep=".")
print(flattened)
```
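If you are curious what dot-separator flattening produces, here is a minimal pure-Python sketch of the idea. This is an illustration only, not jonq's actual implementation, and `flatten_json` may handle edge cases differently:

```python
def flatten(obj, sep=".", prefix=""):
    """Collapse nested dicts/lists into one dict keyed by dot-joined paths."""
    if isinstance(obj, dict):
        items = list(obj.items())
    elif isinstance(obj, list):
        items = list(enumerate(obj))  # list positions become path segments
    else:
        return {prefix: obj}  # leaf value
    out = {}
    for k, v in items:
        key = f"{prefix}{sep}{k}" if prefix else str(k)
        out.update(flatten(v, sep, key))
    return out

print(flatten({"user": {"name": "Alice", "orders": [{"id": 1}]}}))
# {'user.name': 'Alice', 'user.orders.0.id': 1}
```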
For large or complex nested JSON structures, the optional `jonq_fast` Rust extension gives significantly improved performance.

Once installed, use it from the command line with the `--fast` or `-F` flag:

```sh
jonq data.json "select name, age" --format csv --fast > output.csv
```

This flag speeds up CSV conversion by using a faster JSON-flattening implementation. The benefit is most noticeable with large or deeply nested JSON structures.
You can also use jonq_fast directly in your Python code:

```python
import jonq_fast
import csv

# fake data
data = {
    "user": {
        "name": "Alice",
        "address": {"city": "New York"},
        "orders": [
            {"id": 1, "item": "Laptop", "price": 1200},
            {"id": 2, "item": "Phone", "price": 800}
        ]
    }
}

flattened = jonq_fast.flatten(data, ".")
print(flattened)
# Output: {
#   "user.name": "Alice",
#   "user.address.city": "New York",
#   "user.orders.0.id": 1,
#   "user.orders.0.item": "Laptop",
#   "user.orders.0.price": 1200,
#   "user.orders.1.id": 2,
#   "user.orders.1.item": "Phone",
#   "user.orders.1.price": 800
# }

# write to your csv here
with open('output.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(flattened.keys())
    writer.writerow(flattened.values())
```
For processing large JSON files efficiently, jonq supports streaming mode with the `--stream` or `-s` option:

```sh
jonq path/to/large.json "select name, age" --stream
```
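Per the comparison table above, streaming mode processes line-delimited JSON lazily. If your data starts life as a single JSON array, you can emit it as one object per line first; a small sketch (the file name is illustrative):

```python
import json

# Write one JSON object per line (NDJSON) so each record can be
# processed independently without loading the whole file.
records = [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]
with open("large.json", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")
```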
**jq not found:**

- Make sure jq is installed on your system
- Verify jq is in your PATH by running `jq --version`
- Install jq: https://stedolan.github.io/jq/download/

**Invalid or unreadable JSON:**

- Check your JSON file for syntax errors
- Verify the file exists and is readable
- Use a JSON validator to check your file structure

**Query errors:**

- Verify your query follows the correct syntax format
- Ensure field names match exactly what's in your JSON
- Check for missing quotes around string values in conditions

**Empty results:**

- Verify your condition isn't filtering out all records
- Check if your field names match the casing in the JSON
- For nested fields, ensure the dot-notation path is correct
- Performance: For very large JSON files (100MB+), processing may be slow.
- Advanced jq Features: Some advanced jq features aren't exposed in the jonq syntax.
- Multiple File Joins: No support for joining data from multiple JSON files.
- Custom Functions: User-defined functions aren't supported in the current version.
- Date/Time Operations: Limited support for date/time parsing or manipulation.
- Pandas: go here for complex analysis (e.g., merging datasets, statistical ops, plotting). jonq won't crunch numbers or integrate with machine-learning libraries.
- DuckDB: pick this for big-data analytics with joins, aggregates, or window functions across multiple files. jonq is simpler; it is not a database.
Docs here: https://jonq.readthedocs.io/en/latest/
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
- jq: This tool depends on the jq command-line JSON processor, which is licensed under the MIT License. jq is copyright (C) 2012 Stephen Dolan.
The jq tool itself is not included in this package - users need to install it separately.