Retention fix distributed #770

Conversation

nikhilsinhaparseable
Contributor

fix for retention for distributed deployment
fix for stream info api

Fixes #768

Description


This PR has:

  • been tested to ensure log ingestion and log query works.
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added documentation for new or modified features or behaviors.

… mismatch

    data directory creation should not happen in case of deployment mismatch
    staging should be overwritten in case of new staging
    default staging and data directory should not be created if env var has different path
Bumps [h2](https://github.com/hyperium/h2) from 0.3.17 to 0.3.24.
- [Release notes](https://github.com/hyperium/h2/releases)
- [Changelog](https://github.com/hyperium/h2/blob/v0.3.24/CHANGELOG.md)
- [Commits](hyperium/h2@v0.3.17...v0.3.24)

---
updated-dependencies:
- dependency-name: h2
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Standardise the error message and also use the new logg.ing
domain for short URLs.
…eablehq#634)

Add P_MODE with options `ingest`, `query` and `all`. The default mode is `all`.
More changes are still required for both modes to work well. These will be
added in subsequent PRs.

Fixes parseablehq#617
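
As an illustration of the option described above, here is a minimal sketch of how a `P_MODE` value could be mapped to a mode with `all` as the fallback. The `Mode` enum, the `resolve_mode` helper, and the case-insensitive matching are assumptions made for this example, not the code from the PR.

```rust
use std::str::FromStr;

/// Hypothetical representation of the server mode selected through P_MODE.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum Mode {
    Ingest,
    Query,
    /// Default when P_MODE is not set.
    #[default]
    All,
}

impl FromStr for Mode {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Case-insensitive matching is an assumption of this sketch.
        match s.trim().to_ascii_lowercase().as_str() {
            "ingest" => Ok(Mode::Ingest),
            "query" => Ok(Mode::Query),
            "all" => Ok(Mode::All),
            other => Err(format!("invalid P_MODE value: {other}")),
        }
    }
}

/// Resolve the mode from an optional raw value, falling back to `all`
/// when the variable is unset.
pub fn resolve_mode(raw: Option<&str>) -> Result<Mode, String> {
    match raw {
        Some(value) => value.parse(),
        None => Ok(Mode::default()),
    }
}
```

A caller would typically pass `std::env::var("P_MODE").ok().as_deref()` to `resolve_mode` at startup.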
Create a parquet file by grouping all arrow files (in staging) for the duration
provided in the env variable P_STORAGE_UPLOAD_INTERVAL. Also check that the
arrow files vector is not empty; if so, sort the arrow files and create the key
for the parquet file from the last file in the sorted arrow files vector.

Fixes parseablehq#616
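
The key derivation described above can be sketched as follows. This is only an illustration: the function name, the `.arrows` extension used in the usage note, and the assumption that file names sort in time order are all hypothetical and not taken from the PR.

```rust
use std::path::PathBuf;

/// Derive a parquet file name for one group of staged arrow files.
/// Returns `None` for an empty group, so no parquet file is created.
pub fn parquet_key_for_group(mut arrow_files: Vec<PathBuf>) -> Option<PathBuf> {
    if arrow_files.is_empty() {
        return None;
    }
    // Sort the paths; this sketch assumes names sort in time order.
    arrow_files.sort();
    // The key comes from the last file of the sorted vector, with the
    // extension swapped to `parquet`.
    let last = arrow_files.last()?;
    Some(last.with_extension("parquet"))
}
```

For example, a group holding `staging/minute=01.arrows` and `staging/minute=02.arrows` would yield `staging/minute=02.parquet`.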
Previously, in Query Mode, all log stream endpoints were allowed.
It is better that only the ingester is allowed to create streams.

Fixes parseablehq#641
Signed-off-by: Nitish Tiwari <[email protected]>
Earlier, a separate scheduler was initialized for each
stream at load time or whenever a retention period was set.
Now, a single scheduler is initialized, which checks the retention
config of all the streams and performs the retention cleanup.

Fixes parseablehq#636
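
To make the single-scheduler idea concrete, here is a rough sketch of one pass of such a job. The `Retention` struct, the `retention_pass` function, and the use of plain day numbers instead of real dates are assumptions for illustration; the actual scheduling and cleanup code in the PR may look quite different.

```rust
use std::collections::HashMap;

/// Hypothetical per-stream retention setting: keep data for this many days.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Retention {
    pub days: u32,
}

/// One pass of a single shared retention job. It walks every stream, skips
/// the ones without a retention config, and returns `(stream, cutoff_day)`
/// pairs: data older than `cutoff_day` is due for cleanup.
/// Days are plain day numbers to keep the sketch free of date libraries.
pub fn retention_pass(
    streams: &HashMap<String, Option<Retention>>,
    today: u32,
) -> Vec<(String, u32)> {
    let mut due: Vec<(String, u32)> = streams
        .iter()
        .filter_map(|(name, retention)| {
            // Streams with no retention configured are skipped.
            let retention = (*retention)?;
            Some((name.clone(), today.saturating_sub(retention.days)))
        })
        .collect();
    // Sort for a deterministic order, since HashMap iteration order is not.
    due.sort();
    due
}
```

A single timer would call `retention_pass` on each tick and hand every returned pair to the cleanup routine, instead of keeping one timer per stream.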
…blehq#653)

added const of 60 secs to be used for local to storage sync

fixes parseablehq#651
includes console release v0.4.0
It is better that users generate themselves as needed.

Signed-off-by: Nitish Tiwari <[email protected]>
…snapshot" (parseablehq#666)

Reverts parseablehq#661 because with this change we're backward incompatible with older 
versions where .stream.json doesn't contain retention field.

This reverts commit 121bf01.
Changes done in this PR:
1. Adds the first_event_at property to the stats API. The value is the minimum p_timestamp of the first parquet file listed in the first manifest file from the snapshot in stream.json, and it is written to the stream.json file when get stats is requested.
2. Updates first_event_at when retention cleanup runs.

Fixes : parseablehq#587
Eshanatnight and others added 29 commits April 20, 2024 00:20
* fix: stats response

* fix: s3 get objects

Refactor object storage to filter objects by starts_with_pattern
…#734)

* Add node_url field to Cli struct and update related code

* Update required flag for Node URL in CLI

* updated logic to have server address (ip:port) in parquet file name similar to other json files

---------

Co-authored-by: Nikhil Sinha <[email protected]>
* remove staging query from the query result (for distributed)

* Refactor get_schema method to handle missing schema in object storage
Bumps [h2](https://github.com/hyperium/h2) from 0.3.24 to 0.3.26.
- [Release notes](https://github.com/hyperium/h2/releases)
- [Changelog](https://github.com/hyperium/h2/blob/v0.3.26/CHANGELOG.md)
- [Commits](hyperium/h2@v0.3.24...v0.3.26)

---
updated-dependencies:
- dependency-name: h2
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* fix analytics for the cluster
* Add active_ingesters and inactive_ingesters metrics
* updated ingesters' count and event related metrics
This PR adds fixes for:

1. Default role not assigned to the OAuth user if the group does not exist
2. User name is now used instead of id

fixes parseablehq#638
fixes parseablehq#868
Signed-off-by: Nitish Tiwari <[email protected]>
fix for staging size metrics not resetting to 0 even when 
local to storage sync is completed and no arrow/parquet 
file is left in staging folder
* Refactor object storage to use filter_func instead of 
starts_with_pattern in get_objects method
* Refactor fetch_schema method to use object storage instead of HTTP requests
* Refactor metadata.rs and storage.rs
* refactor ingest logic
* fetch stream info from store if stream info is not present in memory;
return an error if stream info exists in neither S3 nor memory
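
The lookup order in the last bullet can be sketched roughly as below. The types and the `get_stream_info` function are hypothetical, the object store is modelled as a plain map, and caching the fetched entry in memory is an assumption of this sketch.

```rust
use std::collections::HashMap;

/// Hypothetical, minimal stream info record.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct StreamInfo {
    pub cache_enabled: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub enum StreamError {
    /// The stream exists neither in memory nor in the object store.
    NotFound(String),
}

/// Look up stream info in memory first, then fall back to the store.
pub fn get_stream_info(
    memory: &mut HashMap<String, StreamInfo>,
    store: &HashMap<String, StreamInfo>,
    name: &str,
) -> Result<StreamInfo, StreamError> {
    if let Some(info) = memory.get(name) {
        return Ok(info.clone());
    }
    match store.get(name) {
        Some(info) => {
            // Assumption: remember the fetched info for later lookups.
            memory.insert(name.to_string(), info.clone());
            Ok(info.clone())
        }
        None => Err(StreamError::NotFound(name.to_string())),
    }
}
```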
1. fixed banner spacing
2. modified server mode: All to Standalone, Ingest to Distributed (Ingest), Query to Distributed (Query)
3. updated server mode in about API response
4. updated logic for env var P_INGESTOR_URL to use HOSTNAME and PORT from env
5. removed put cache api from querier
6. added put cache api to ingestor
7. renamed ingester to ingestor
8. corrected cache flow for ingestors and standalone
9. removed query, other logstream apis for ingestors
also fixed P_INGESTOR_URL fetch from env variables
With this PR, when delete stream is called, the querier
deletes the stream folder from storage and then calls the
delete stream API on each ingestor. Finally, each ingestor
deletes the stream from its local map.
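
The delete flow described above could be modelled like this. It is a simplified in-memory sketch: `Storage`, `Ingestor`, and `querier_delete_stream` are hypothetical names, and the HTTP call to each ingestor is replaced by a direct method call.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for object storage: one folder per stream.
#[derive(Default)]
pub struct Storage {
    pub stream_folders: HashSet<String>,
}

/// Hypothetical stand-in for an ingestor and its local stream map.
#[derive(Default)]
pub struct Ingestor {
    pub local_streams: HashSet<String>,
}

impl Ingestor {
    /// Stands in for the ingestor's delete stream API.
    /// Returns true if the stream was present locally.
    pub fn delete_stream(&mut self, name: &str) -> bool {
        self.local_streams.remove(name)
    }
}

/// Querier-side flow: delete the folder from storage first, then ask
/// every ingestor to drop the stream. Returns how many ingestors had it.
pub fn querier_delete_stream(
    storage: &mut Storage,
    ingestors: &mut [Ingestor],
    name: &str,
) -> Result<usize, String> {
    if !storage.stream_folders.remove(name) {
        return Err(format!("stream {name} not found in storage"));
    }
    let mut removed = 0;
    for ingestor in ingestors.iter_mut() {
        if ingestor.delete_stream(name) {
            removed += 1;
        }
    }
    Ok(removed)
}
```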
This PR ensures all metadata and data files (json and parquet) use
a simple sha256 based hash name mechanism. Each ingestor
allocates itself a unique hash, which is used in all file names
relevant to that ingestor. This hash is also persisted in the metadata
file content and is expected to stay the same for the lifetime
of the ingestor.
---------

Co-authored-by: Nikhil Sinha <[email protected]>
Caching for distributed mode can be
enabled from the querier UI. The querier calls the PUT
/cache API on all ingestors. Each ingestor first checks
whether the stream exists; if it is not found in the local map,
the ingestor checks S3 and creates the stream. It then checks
whether the caching env vars are set. If they are, it adds the
cache_enabled flag to STREAM_INFO and updates its stream.json in S3.

Fixes: parseablehq#764
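
The ingestor-side steps above can be sketched as one function. Everything here is hypothetical: the local map and S3 are modelled as plain maps, `cache_env_set` stands in for the check of the caching env vars, and returning an error when they are not set is an assumption, since the description does not say what happens in that case.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the per-stream metadata kept in stream.json.
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct StreamMeta {
    pub cache_enabled: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub enum CacheError {
    StreamNotFound,
    /// Assumption: an unset cache configuration is reported as an error.
    CacheNotConfigured,
}

/// Ingestor-side handling of a PUT /cache request for `stream`.
pub fn put_cache(
    local: &mut HashMap<String, StreamMeta>,
    remote: &mut HashMap<String, StreamMeta>,
    cache_env_set: bool,
    stream: &str,
) -> Result<(), CacheError> {
    // 1. Make sure the stream exists locally, creating it from the
    //    remote copy when it is missing from the local map.
    if !local.contains_key(stream) {
        let meta = remote
            .get(stream)
            .cloned()
            .ok_or(CacheError::StreamNotFound)?;
        local.insert(stream.to_string(), meta);
    }
    // 2. Caching only applies when the env vars are set.
    if !cache_env_set {
        return Err(CacheError::CacheNotConfigured);
    }
    // 3. Set the flag locally and persist the updated metadata.
    let meta = local.get_mut(stream).expect("stream ensured above");
    meta.cache_enabled = true;
    remote.insert(stream.to_string(), meta.clone());
    Ok(())
}
```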
Development

Successfully merging this pull request may close these issues.

Fix: Retention for Distributed/Standalone
5 participants