In this recipe we'll learn how to update a JSON index and have the new config applied to existing data.
Pinot Version | 1.0.0 |
Schema | config/schema.json |
Initial Table Config | config/table.json |
Table Config with no index | config/table-no-index.json |
Updated Table Config | config/table-updated-index.json |
This is the code for the following recipe: https://dev.startree.ai/docs/pinot/recipes/json-update-index
git clone [email protected]:startreedata/pinot-recipes.git
cd pinot-recipes/recipes/update-json-index
Spin up a Pinot cluster using Docker Compose:
docker-compose up
Add tables and schema:
docker run \
--network jsonindex \
-v $PWD/config:/config \
apachepinot/pinot:1.0.0 AddTable \
-schemaFile /config/schema.json \
-tableConfigFile /config/table.json \
-controllerHost "pinot-controller-jsonindex" \
-exec
Import messages into Kafka:
pip install faker
python datagen.py --sleep 0.0001 2>/dev/null |
jq -cr --arg sep ø '[.uuid, tostring] | join($sep)' |
kcat -P -b localhost:9092 -t people -Kø
Query Pinot:
select *
from people
WHERE JSON_MATCH(person, '"$.address.state"=''Kentucky''')
limit 10
select count(*)
from people
WHERE JSON_MATCH(person, '"$.address.state" <> ''Kentucky''')
select json_extract_scalar(person, '$.address.state', 'STRING') AS state, count(*)
from people
WHERE JSON_MATCH(person, '"$.address.state" IN (''Kentucky'', ''Alabama'')')
GROUP BY state
ORDER BY count(*) DESC
select json_extract_scalar(person, '$.address.state', 'STRING') AS state, count(*)
from people
WHERE JSON_MATCH(person, '"$.address.state" NOT IN (''Kentucky'', ''Alabama'')')
GROUP BY state
ORDER BY count(*) DESC
select count(*)
from people
WHERE JSON_MATCH(person, '"$.address.state" IN (''Kentucky'')')
select count(*)
from people
WHERE JSON_MATCH(person, '"$.interests[0]" = ''Swimming''')
Clear JSON index and refresh all segments:
docker run \
--network jsonindex \
-v $PWD/config:/config \
apachepinot/pinot:1.0.0 AddTable \
-schemaFile /config/schema.json \
-tableConfigFile /config/table-no-index.json \
-controllerHost "pinot-controller-jsonindex" \
-exec -update
curl -X 'POST' 'http://localhost:9000/segments/people_REALTIME/reload?type=REALTIME'
Update with new JSON index and refresh all segments:
docker run \
--network jsonindex \
-v $PWD/config:/config \
apachepinot/pinot:1.0.0 AddTable \
-schemaFile /config/schema.json \
-tableConfigFile /config/table-updated-index.json \
-controllerHost "pinot-controller-jsonindex" \
-exec -update
curl -X 'POST' 'http://localhost:9000/segments/people_REALTIME/reload?type=REALTIME'
Query Pinot again:
This query will return results:
select count(*)
from people
WHERE JSON_MATCH(person, '"$.address.country" = ''Croatia''')
But this one won't because that field isn't indexed anymore:
select count(*)
from people
WHERE JSON_MATCH(person, '"$.interests[0]" = ''Swimming''')