Query: How to get the size occupied by indexes #14779

Open
Vasu7052 opened this issue Jan 8, 2025 · 3 comments


@Vasu7052

Vasu7052 commented Jan 8, 2025

Hi Team,

We have an events table in our Apache Pinot cluster with around 1M rows in it.

We have applied several indexes to this table and want to know how much disk space and memory each of them uses, so that we can tune them accordingly.

# Pinot Server Data Directory
pinot.server.instance.dataDir=/mnt/data/apache-pinot-data/

# Pinot Server Temporary Segment Tar Directory
pinot.server.instance.segmentTarDir=/mnt/data/apache-pinot-data/segmentTar

Apache Pinot version: 1.2.0

@Jackie-Jiang
Contributor

There are 2 ways to get that info:

  1. Use the segment metadata REST API on the controller, which should be able to return the size info per index type
  2. Open the local segment on the server; the size info is stored in the index_map file

@Vasu7052
Author

Vasu7052 commented Jan 9, 2025

Here are the APIs that I tried:

1. Segment metadata for all segments of the table:

curl -X 'GET' \
  'http://localhost:9000/segments/events_v1/metadata?type=OFFLINE' \
  -H 'accept: application/json'
Output:
{
  "events_v1_OFFLINE_1733600231086_1737667483443_0": {
    "segmentName": "events_v1_OFFLINE_1733600231086_1737667483443_0",
    "schemaName": null,
    "crc": 992603588,
    "creationTimeMillis": 1736333325493,
    "creationTimeReadable": "2025-01-08T10:48:45:493 UTC",
    "timeColumn": "time",
    "timeUnit": "MILLISECONDS",
    "timeGranularitySec": 0,
    "startTimeMillis": 1733600231086,
    "startTimeReadable": "2024-12-07T19:37:11.086Z",
    "endTimeMillis": 1737667483443,
    "endTimeReadable": "2025-01-23T21:24:43.443Z",
    "segmentVersion": "v3",
    "creatorName": null,
    "totalDocs": 500109,
    "custom": {
      "input.data.file.uri": "file:/mnt/data/...output.csv"
    },
    "startOffset": null,
    "endOffset": null,
    "columns": [],
    "indexes": {},
    "star-tree-index": null
  }
}

2. Metadata for a single segment:

curl -X 'GET' \
  'http://localhost:9000/segments/events_v1/events_v1_OFFLINE_1733600231086_1737667483443_0/metadata' \
  -H 'accept: application/json'
Output:
{
  "segment.start.time": "1733600231086",
  "segment.time.unit": "MILLISECONDS",
  "segment.size.in.bytes": "20886469",
  "segment.end.time": "1737667483443",
  "segment.total.docs": "500109",
  "segment.creation.time": "1736333325493",
  "segment.push.time": "1736333329509",
  "segment.end.time.raw": "1737667483443",
  "segment.start.time.raw": "1733600231086",
  "segment.index.version": "v3",
  "custom.map": "{\"input.data.file.uri\":\"file:/mnt/data/.../output.csv\"}",
  "segment.crc": "992603588",
  "segment.download.url": "http://controller:9000/segments/events_v1/events_v1_OFFLINE_1733600231086_1737667483443_0"
}

Neither of these APIs returns any index information.
Here are the first few lines of the index_map file:

Platform.dictionary.startOffset = 0
Platform.dictionary.size = 29
Platform.forward_index.startOffset = 29
Platform.forward_index.size = 125036
account_id.dictionary.startOffset = 125065
account_id.dictionary.size = 12
account_id.forward_index.startOffset = 125077
account_id.forward_index.size = 16
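
For reference, a minimal sketch of how the sizes in an index_map excerpt like the one above could be totalled per index type and per column. It assumes every key has the form <column>.<index type>.<field>, as in the lines shown, and that sizes are in bytes; the function name index_sizes and the SAMPLE string are hypothetical, not part of Pinot.

```python
from collections import defaultdict


def index_sizes(index_map_text):
    """Sum the '.size' entries of an index_map file per index type and per column.

    Assumes keys look like '<column>.<index_type>.<field>' (as in the excerpt
    above). Splitting from the right keeps column names that contain dots intact.
    """
    by_type = defaultdict(int)
    by_column = defaultdict(int)
    for line in index_map_text.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue  # skip blank lines or anything that is not 'key = value'
        parts = key.strip().rsplit(".", 2)
        if len(parts) != 3 or parts[2] != "size":
            continue  # ignore startOffset entries and unexpected keys
        column, index_type, _ = parts
        size = int(value.strip())
        by_type[index_type] += size
        by_column[column] += size
    return dict(by_type), dict(by_column)


# The lines quoted in this comment, used as sample input.
SAMPLE = """\
Platform.dictionary.startOffset = 0
Platform.dictionary.size = 29
Platform.forward_index.startOffset = 29
Platform.forward_index.size = 125036
account_id.dictionary.startOffset = 125065
account_id.dictionary.size = 12
account_id.forward_index.startOffset = 125077
account_id.forward_index.size = 16
"""

by_type, by_column = index_sizes(SAMPLE)
print(by_type)
print(by_column)
```

To run it against a real segment, read the index_map file from the segment directory on the server and pass its contents to index_sizes.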

@Jackie-Jiang
Contributor

@Vasu7052 You need to add the columns query parameter to the request. Pass * to read the index info for all columns.
You can also read the indexes and their sizes from index_map.
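
A hedged sketch of what that request could look like, reusing the controller address and table name from the first curl call in this thread. The helper names segment_metadata_url and fetch_indexes are hypothetical, and the response is assumed to keep the shape shown earlier (one entry per segment, each with an "indexes" field).

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def segment_metadata_url(controller, table, table_type="OFFLINE", columns="*"):
    """Build the table-level segment metadata URL with the columns parameter set."""
    # urlencode percent-encodes '*' as %2A, which the server decodes back to '*'.
    query = urlencode({"type": table_type, "columns": columns})
    return f"{controller}/segments/{table}/metadata?{query}"


def fetch_indexes(url):
    """Call the controller and return the 'indexes' field of each segment.

    Assumes the response is a JSON object keyed by segment name, as in the
    output posted above. Not called here because it needs a live controller.
    """
    req = Request(url, headers={"accept": "application/json"})
    with urlopen(req) as resp:
        body = json.load(resp)
    return {name: meta.get("indexes") for name, meta in body.items()}


url = segment_metadata_url("http://localhost:9000", "events_v1")
print(url)
```

With curl, the same thing amounts to appending &columns=* to the query string of the first request.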
