jq is unbeatable for JSON processing, but its syntax takes serious learning. jonq wraps jq in a SQL-ish, Pythonic layer you can read and remember.
Who It's For: jonq is designed for anyone who works with JSON data. It shines at quick JSON exploration, lightweight ETL tasks, and validating config files in CI pipelines.
- Robust nested‑array handling – automatic base‑array detection (`products[].versions[]`)
- Expression parser fixes – arithmetic around aggregations now works (`sum(x) * 2`)
- Null‑safe aggregation – avoids `Cannot iterate over null` errors
- `jonq_fast` – optional Rust extension, roughly 2x faster than plain jonq
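Null-safe aggregation means an aggregate over a field that is missing or null in some records returns a sensible result instead of erroring out. A rough plain-Python illustration of that behaviour (my sketch of the semantics, not jonq's implementation):

```python
def null_safe_sum(rows, field):
    # Skip records where the field is absent or null instead of failing.
    values = [r[field] for r in rows if r.get(field) is not None]
    return sum(values)

rows = [{"x": 1}, {"x": None}, {}, {"x": 4}]
print(null_safe_sum(rows, "x"))  # 5
```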
| Category | What you can do | Example |
|---|---|---|
| Selection | Pick fields | `select name, age` |
| Wildcard | All fields | `select *` |
| Filtering | Python‑style ops: `and` / `or` / `between` / `contains` | `if age > 30 and city = 'NY'` |
| Aggregations | `sum` `avg` `min` `max` `count` | `select avg(price) as avg_price` |
| Grouping | `group by` + `having` | `… group by city having count > 2` |
| Ordering | `sort <field> [asc\|desc] <limit>` | `sort age desc 5` |
| Nested arrays | `from [].orders` or inline paths | `select products[].name …` |
| Inline maths | Real expressions | `sum(items.price) * 2 as double_total` |
| CSV / stream | `--format csv`, `--stream` | |
| Task | Raw jq filter | jonq one‑liner |
|---|---|---|
| Select specific fields | `jq '.[]\|{name:.name,age:.age}'` | `jonq data.json "select name, age"` |
| Filter rows | `jq '.[]\|select(.age > 30)\|{name,age}'` | `… "select name, age if age > 30"` |
| Sort + limit | `jq 'sort_by(.age) \| reverse \| .[0:5]'` | `… "select name, age sort age desc 5"` |
| Deep filter | `jq '.[]\|select(.profile.address.city=="NY")\|{name,city:.profile.address.city}'` | `… "select name, profile.address.city if profile.address.city = 'NY'"` |
| Count items | `jq 'map(select(.age>25)) \| length'` | `… "select count(age) as count if age > 25"` |
| Group & count | `jq 'group_by(.city) \| map({city:.[0].city,count:length})'` | `… "select city, count(city) as count group by city"` |
| Complex boolean | `jq '.[] \| select(.age>25 and (.city=="NY" or .city=="Chicago"))'` | `… "select * if age > 25 and (city = 'NY' or city = 'Chicago')"` |
| Group & HAVING | `jq 'group_by(.city) \| map(select(length>2)) \| map({city:.[0].city,count:length})'` | `… "select city, count(city) as count group by city having count > 2"` |
| Aggregation expression | - | `… "select sum(price) * 1.07 as total_gst"` |
| Nested‑array aggregation | - | `… "select avg(products[].versions[].pricing.monthly) as avg_price"` |
Take‑away: a single jonq string replaces many pipes and brackets while still producing pure jq under the hood.
| Aspect | jonq | DuckDB | Pandas |
|---|---|---|---|
| Primary use case | Fast, lightweight JSON querying directly from the command line | Analytical SQL queries on large datasets, including JSON | General-purpose data manipulation and analysis in Python |
| Setup | No DB, streams any JSON | Requires a DB file / extension | Requires a Python environment with pandas and its dependencies installed |
| Query language | Familiar SQL‑ish, no funky `json_extract` | SQL + JSON functions | Python code for data manipulation and analysis |
| Footprint | Minimal: requires only jq (a ~500 KB binary); no environment setup | ~140 MB binary | ~20 MB for pandas and its dependencies |
| Streaming | `--stream` processes line‑delimited JSON lazily | Must load into a table | Can process large files using chunking, but not as memory-efficient as streaming |
| Memory usage | Low; streams data to avoid loading full JSON into memory | In-memory database, but optimized for large data with columnar storage | Loads data into memory; can strain RAM with large datasets |
| jq ecosystem | Leverages all jq filters for post‑processing | No | Part of the Python data science ecosystem; integrates with NumPy, Matplotlib, scikit-learn, etc. |
- Instant JSON Querying, No Setup Hassle. You have a JSON file (`data.json`) and need every record where `age > 30`, in seconds.
  - With jonq: run `jonq data.json "select * if age > 30"`. Done. No environment setup, no imports; just install jq and go.
  - Pandas: fire up Python, write a script (`import pandas as pd; df = pd.read_json('data.json'); df[df['age'] > 30]`), and run it. More steps.
  - DuckDB: set up a database, load the JSON (`SELECT * FROM read_json('data.json') WHERE age > 30`), and execute. Powerful, but overkill for a quick task.
- Command-Line Power. jonq thrives in shell scripts and CI/CD pipelines, e.g. `jonq data.json "select name, age" | grep "John"`. Pandas and DuckDB require scripting or a heavier integration layer.
- Lightweight and Efficient. jonq uses jq's streaming mode (`--stream`) for large JSON files, processing data piece by piece instead of loading it all into memory. Pandas loads everything into a DataFrame (RAM-intensive), and while DuckDB is memory-efficient for analytics, it is still a full database engine with correspondingly more overhead.
- SQL Simplicity for JSON. Example: `jonq users.json "select name, email if status = 'active' sort name asc"`. If you know SQL, jonq feels natural for JSON; there is no steep jq learning curve to climb first.
- Speed for Ad-Hoc Tasks. Test case: querying a 1 GB JSON file for specific fields.
  - jonq: streams it in seconds with minimal memory use.
  - Pandas: might choke or require chunking hacks.
  - DuckDB: fast, but setup and SQL complexity add time.
Supported Platforms: jonq works on Linux, macOS, and Windows (via WSL).

Requirements:

- Python 3.9+
- The `jq` command-line tool (https://stedolan.github.io/jq/download/)
From PyPI:

```sh
pip install jonq  # latest stable
```

From source:

```sh
git clone https://github.com/duriantaco/jonq.git
cd jonq && pip install -e .
```
Verify Installation: run `jonq --version` to ensure it's working correctly.
For users dealing with large or complex nested JSON structures, we recommend installing the optional `jonq_fast` Rust extension:

```sh
pip install jonq-fast
```

We explain more about this further down.
```sh
echo '[{"name":"Alice","age":30},{"name":"Bob","age":25}]' > data.json
jonq data.json "select name, age if age > 25"
# Output: [{"name":"Alice","age":30}]
```
The query syntax follows a simplified format:

```
select <fields> [if <condition>] [group by <fields>] [sort <field> [asc|desc] [limit]]
```

where:

- `<fields>` – comma-separated list of fields or aggregations to select
- `if <condition>` – optional filtering condition
- `group by <fields>` – optional grouping by one or more fields
- `sort <field>` – optional field to sort by
- `asc|desc` – optional sort direction (default: `asc`)
- `limit` – optional integer to limit the number of results
You can also refer to the json_test_files directory for the test JSONs, and look up the USAGE.md guide. Anyway, let's start with simple.json. Imagine a JSON like the following:
```json
[
  { "id": 1, "name": "Alice", "age": 30, "city": "New York" },
  { "id": 2, "name": "Bob", "age": 25, "city": "Los Angeles" },
  { "id": 3, "name": "Charlie", "age": 35, "city": "Chicago" }
]
```
```sh
jonq path/to/simple.json "select *"
jonq path/to/simple.json "select name, age"
jonq path/to/simple.json "select name, age if age > 30"
jonq path/to/simple.json "select name, age sort age desc 2"
jonq path/to/simple.json "select sum(age) as total_age"
jonq path/to/simple.json "select avg(age) as average_age"
jonq path/to/simple.json "select count(age) as count"
```
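For reference, here is what those three aggregation queries compute, mirrored in plain Python over the same records as simple.json (an illustration of the results, not of how jonq evaluates them):

```python
# Records matching simple.json above.
rows = [
    {"id": 1, "name": "Alice", "age": 30, "city": "New York"},
    {"id": 2, "name": "Bob", "age": 25, "city": "Los Angeles"},
    {"id": 3, "name": "Charlie", "age": 35, "city": "Chicago"},
]

ages = [r["age"] for r in rows]
total_age = sum(ages)                # sum(age)   -> 90
average_age = sum(ages) / len(ages)  # avg(age)   -> 30.0
count = len(ages)                    # count(age) -> 3
```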
Simple enough, I hope? Now let's move on to nested JSON. Imagine a nested JSON like the one below:
```json
[
  {
    "id": 1,
    "name": "Alice",
    "profile": {
      "age": 30,
      "address": { "city": "New York", "zip": "10001" }
    },
    "orders": [
      { "order_id": 101, "item": "Laptop", "price": 1200 },
      { "order_id": 102, "item": "Phone", "price": 800 }
    ]
  },
  {
    "id": 2,
    "name": "Bob",
    "profile": {
      "age": 25,
      "address": { "city": "Los Angeles", "zip": "90001" }
    },
    "orders": [
      { "order_id": 103, "item": "Tablet", "price": 500 }
    ]
  }
]
```
```sh
# nested field access
jonq nested.json "select name, profile.age"
jonq nested.json "select name, profile.address.city"

# count array elements
jonq nested.json "select name, count(orders) as order_count"

# boolean logic (AND / OR)
jonq nested.json "select name if profile.address.city = 'New York' or orders[0].price > 1000"
jonq nested.json "select name, profile.age if profile.address.city = 'New York' or orders[0].price > 1000"
```

### Find users who are both under 30 **and** from Los Angeles

```sh
jonq nested.json "select name, profile.age if profile.age < 30 and profile.address.city = 'Los Angeles'"
```

### Using parentheses for complex logic

```sh
jonq nested.json "select name, profile.age if (profile.age > 25 and profile.address.city = 'New York') or (profile.age < 26 and profile.address.city = 'Los Angeles')"
```
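To make the parenthesized condition concrete, here is the same filter expressed in plain Python over records shaped like nested.json (illustrative only; jonq compiles such conditions to jq rather than running Python):

```python
# Trimmed-down records from nested.json above.
rows = [
    {"name": "Alice", "profile": {"age": 30, "address": {"city": "New York"}}},
    {"name": "Bob", "profile": {"age": 25, "address": {"city": "Los Angeles"}}},
]

# (age > 25 and city = 'New York') or (age < 26 and city = 'Los Angeles')
matches = [
    r["name"] for r in rows
    if (r["profile"]["age"] > 25 and r["profile"]["address"]["city"] == "New York")
    or (r["profile"]["age"] < 26 and r["profile"]["address"]["city"] == "Los Angeles")
]
print(matches)  # ['Alice', 'Bob']
```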
jonq can output results in CSV format using the `--format csv` or `-f csv` option:

```sh
jonq path/to/simple.json "select name, age" --format csv > output.csv
```
Using `flatten_json` in your code:

```python
from jonq.csv_utils import flatten_json

data = {
    "user": {
        "name": "Alice",
        "address": {"city": "New York"},
        "orders": [
            {"id": 1, "item": "Laptop", "price": 1200},
            {"id": 2, "item": "Phone", "price": 800}
        ]
    }
}

flattened = flatten_json(data, sep=".")
print(flattened)
```
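If you are curious what dot-separator flattening produces, here is a minimal pure-Python sketch of the idea. This is an illustration only, not jonq's actual implementation, and `flatten_json` may handle edge cases differently:

```python
def flatten(obj, sep=".", prefix=""):
    """Collapse nested dicts/lists into one dict keyed by dot-joined paths."""
    if isinstance(obj, dict):
        items = list(obj.items())
    elif isinstance(obj, list):
        items = list(enumerate(obj))  # list positions become path segments
    else:
        return {prefix: obj}  # leaf value
    out = {}
    for k, v in items:
        key = f"{prefix}{sep}{k}" if prefix else str(k)
        out.update(flatten(v, sep, key))
    return out

print(flatten({"user": {"name": "Alice", "orders": [{"id": 1}]}}))
# {'user.name': 'Alice', 'user.orders.0.id': 1}
```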
For large or complex nested JSON structures, the optional `jonq_fast` Rust extension gives significantly improved performance.

Once installed, use it from the command line with the `--fast` or `-F` flag:

```sh
jonq data.json "select name, age" --format csv --fast > output.csv
```

This flag speeds up CSV conversion by using a faster JSON-flattening implementation. The benefit is most noticeable with large or deeply nested JSON structures.
You can also use jonq_fast directly in your Python code:

```python
import jonq_fast
import csv

# fake data
data = {
    "user": {
        "name": "Alice",
        "address": {"city": "New York"},
        "orders": [
            {"id": 1, "item": "Laptop", "price": 1200},
            {"id": 2, "item": "Phone", "price": 800}
        ]
    }
}

flattened = jonq_fast.flatten(data, ".")
print(flattened)
# Output: {
#   "user.name": "Alice",
#   "user.address.city": "New York",
#   "user.orders.0.id": 1,
#   "user.orders.0.item": "Laptop",
#   "user.orders.0.price": 1200,
#   "user.orders.1.id": 2,
#   "user.orders.1.item": "Phone",
#   "user.orders.1.price": 800
# }

# write to your csv here
with open('output.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(flattened.keys())
    writer.writerow(flattened.values())
```
For processing large JSON files efficiently, jonq supports streaming mode with the `--stream` or `-s` option:

```sh
jonq path/to/large.json "select name, age" --stream
```
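Per the comparison table above, streaming mode processes line-delimited JSON lazily. If your data starts life as a single JSON array, you can emit it as one object per line first; a small sketch (the file name is illustrative):

```python
import json

# Write one JSON object per line (NDJSON) so each record can be
# processed independently without loading the whole file.
records = [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]
with open("large.json", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")
```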
**jq not found:**

- Make sure jq is installed on your system
- Verify jq is in your PATH by running `jq --version`
- Install jq: https://stedolan.github.io/jq/download/

**Invalid or unreadable JSON:**

- Check your JSON file for syntax errors
- Verify the file exists and is readable
- Use a JSON validator to check your file structure

**Query errors:**

- Verify your query follows the correct syntax format
- Ensure field names match exactly what's in your JSON
- Check for missing quotes around string values in conditions

**Empty results:**

- Verify your condition isn't filtering out all records
- Check if your field names match the casing in the JSON
- For nested fields, ensure the dot-notation path is correct
- Performance: For very large JSON files (100MB+), processing may be slow.
- Advanced jq Features: Some advanced jq features aren't exposed in the jonq syntax.
- Multiple File Joins: No support for joining data from multiple JSON files.
- Custom Functions: User-defined functions aren't supported in the current version.
- Date/Time Operations: Limited support for date/time parsing or manipulation.
- Pandas: go here for complex analysis (e.g., merging datasets, statistical ops, plotting). jonq won't crunch numbers or integrate with machine-learning libraries.
- DuckDB: pick this for big-data analytics with joins, aggregates, or window functions across multiple files. jonq is simpler; it is not a database.
Docs here: https://jonq.readthedocs.io/en/latest/
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
- jq: This tool depends on the jq command-line JSON processor, which is licensed under the MIT License. jq is copyright (C) 2012 Stephen Dolan.
The jq tool itself is not included in this package - users need to install it separately.