Skip to content

Latest commit

 

History

History

update-json-index

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 
 
 
 
 

Updating a JSON Index

In this recipe we'll learn how to update a JSON index and have the new config applied to existing data.

Pinot Version 1.0.0
Schema config/schema.json
Initial Table Config config/table.json
Table Config with no index config/table-no-index.json
Updated Table Config config/table-updated-index.json

This is the code for the following recipe: https://dev.startree.ai/docs/pinot/recipes/json-update-index


git clone [email protected]:startreedata/pinot-recipes.git
cd pinot-recipes/recipes/update-json-index

Spin up a Pinot cluster using Docker Compose:

docker-compose up

Add tables and schema:

docker run \
   --network jsonindex \
   -v $PWD/config:/config \
   apachepinot/pinot:1.0.0 AddTable \
     -schemaFile /config/schema.json \
     -tableConfigFile /config/table.json \
     -controllerHost "pinot-controller-jsonindex" \
    -exec

Import messages into Kafka:

pip install faker
python datagen.py --sleep 0.0001 2>/dev/null |
jq -cr --arg sep ø '[.uuid, tostring] | join($sep)' |
kcat -P -b localhost:9092 -t people -Kø

Query Pinot:

select *
from people 
WHERE JSON_MATCH(person, '"$.address.state"=''Kentucky''')
limit 10
select count(*)
from people 
WHERE JSON_MATCH(person, '"$.address.state" <> ''Kentucky''')
select json_extract_scalar(person, '$.address.state', 'STRING') AS state, count(*)
from people 
WHERE JSON_MATCH(person, '"$.address.state" IN (''Kentucky'', ''Alabama'')')
GROUP BY state
ORDER BY count(*) DESC
select json_extract_scalar(person, '$.address.state', 'STRING') AS state, count(*)
from people 
WHERE JSON_MATCH(person, '"$.address.state" NOT IN (''Kentucky'', ''Alabama'')')
GROUP BY state
ORDER BY count(*) DESC
select count(*)
from people 
WHERE JSON_MATCH(person, '"$.address.state" IN (''Kentucky'')')
select count(*)
from people 
WHERE JSON_MATCH(person, '"$.interests[0]" = ''Swimming''')

Clear JSON index and refresh all segments:

docker run \
   --network jsonindex \
   -v $PWD/config:/config \
   apachepinot/pinot:1.0.0 AddTable \
     -schemaFile /config/schema.json \
     -tableConfigFile /config/table-no-index.json \
     -controllerHost "pinot-controller-jsonindex" \
    -exec -update
curl -X 'POST' 'http://localhost:9000/segments/people_REALTIME/reload?type=REALTIME' 

Update with new JSON index and refresh all segments:

docker run \
   --network jsonindex \
   -v $PWD/config:/config \
   apachepinot/pinot:1.0.0 AddTable \
     -schemaFile /config/schema.json \
     -tableConfigFile /config/table-updated-index.json \
     -controllerHost "pinot-controller-jsonindex" \
    -exec -update
curl -X 'POST' 'http://localhost:9000/segments/people_REALTIME/reload?type=REALTIME' 

Query Pinot again:

This query will return results:

select count(*)
from people 
WHERE JSON_MATCH(person, '"$.address.country" = ''Croatia''')

But this one won't because that field isn't indexed anymore:

select count(*)
from people 
WHERE JSON_MATCH(person, '"$.interests[0]" = ''Swimming''')