feat: batch audit logs and ensure they are not lost #1080
base: main
Conversation
Pull Request Test Coverage Report for Build 12675531641
💛 - Coveralls
…ork is not accepting them, else send to network
LGTM! The implementation looks solid, but I have some questions about the architectural approach:
Audit Log Architecture:
- What drove the decision to use another Parseable instance for audit logs rather than local storage?
- Do we have plans for a dedicated audit logging solution in the future?
Deployment Considerations:
For multi-node setups (e.g., 3 ingest + 1 query nodes):
- What's the recommended log_endpoint configuration?
- Which node should own audit log collection?
- How do we prevent circular logging dependencies?
Performance & Scalability:
- In high-traffic scenarios, how do we handle the additional network load from audit logging?
- Have we considered the impact of inter-service dependencies on system reliability?
Would appreciate your insights on these points to better understand the architectural vision.
.open(log_file_path)
.await
.expect("Failed to open audit log file");
let buf = serde_json::to_vec(&logs_to_send).expect("Failed to serialize audit logs");
It may look like this can never happen, but unwrap() can cause a panic in production. It would be nice to have proper error handling with context.
This should not be an issue in most places, as serialization is not a problem for these types. But we still use expects to document why it is safe to assume it won't fail.
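For illustration, here is a minimal sketch (not the PR's actual code; the function names and the `anyhow` usage are assumptions) contrasting the two approaches discussed above: an `expect` whose message documents why failure is considered impossible, versus propagating the error with context.

```rust
use anyhow::Context;
use std::path::Path;
use tokio::io::AsyncWriteExt;

// Hypothetical helper: documents via `expect` why panicking is considered safe here.
async fn append_logs_expect(log_file_path: &Path, logs_to_send: &[serde_json::Value]) {
    let mut file = tokio::fs::OpenOptions::new()
        .create(true)
        .append(true)
        .open(log_file_path)
        .await
        .expect("audit log directory is created at startup, so open cannot fail");
    // Serializing plain log values cannot fail for these types.
    let buf = serde_json::to_vec(logs_to_send).expect("audit logs are always serializable");
    file.write_all(&buf)
        .await
        .expect("Failed to write to audit log file");
}

// Hypothetical alternative: bubble the error up with context instead of panicking.
async fn append_logs_checked(
    log_file_path: &Path,
    logs_to_send: &[serde_json::Value],
) -> anyhow::Result<()> {
    let mut file = tokio::fs::OpenOptions::new()
        .create(true)
        .append(true)
        .open(log_file_path)
        .await
        .with_context(|| format!("opening audit log file {}", log_file_path.display()))?;
    let buf = serde_json::to_vec(logs_to_send).context("serializing audit log batch")?;
    file.write_all(&buf).await.context("writing audit log batch")?;
    Ok(())
}
```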
}
// setup the audit log channel | ||
let (audit_log_tx, mut audit_log_rx) = channel(0);
Why is the initial size 0?
The idea was to create a "Glienicke Bridge": we block at the request level every time the underlying audit logger is unable to keep up.
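As a rough illustration of that backpressure idea (a standalone sketch, assuming a `futures::channel::mpsc` channel like the one in the diff; the slow consumer and timings are made up), a zero-buffer channel makes the producer's `send().await` wait until the consumer has drained the previous message:

```rust
use futures::{channel::mpsc, SinkExt, StreamExt};
use std::time::Duration;

#[tokio::main]
async fn main() {
    // Buffer of 0: capacity is effectively one in-flight message per sender.
    let (mut audit_log_tx, mut audit_log_rx) = mpsc::channel::<String>(0);

    // Slow consumer, standing in for the audit logger flushing to network/disk.
    let consumer = tokio::spawn(async move {
        while let Some(log) = audit_log_rx.next().await {
            tokio::time::sleep(Duration::from_millis(100)).await; // simulate a slow sink
            println!("flushed: {log}");
        }
    });

    for i in 0..5 {
        // Backpressure: this await only completes once the receiver frees up space,
        // so request handlers are throttled whenever the logger cannot keep up.
        audit_log_tx
            .send(format!("audit event {i}"))
            .await
            .expect("receiver dropped");
        println!("queued: {i}");
    }
    drop(audit_log_tx); // close the channel so the consumer loop ends
    consumer.await.unwrap();
}
```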
Thank you for the comments @hippalus. Let me note that audit logging is not to be considered beta yet; it is still in development and will need a lot of work before we can consider it production-ready. With regards to your questions:
It is an industry practice to send audit logs to an external system; we felt dogfooding would be the lowest-friction option for doing this.
Parseable already fulfills most of this, and we don't see a reason to build with anything different. Please let us know of any alternatives; we can document them and plan out how to build support. To be clear, logs should be sent to an instance outside of the cluster.
The log_endpoint should be a separate cluster/standalone instance of Parseable.
This should not matter, in my opinion; in a multi-node setup the expectation is that a load balancer makes that decision, and logs end up in the same stream regardless of which node receives them.
Circular logging is not expected, as pointed out above.
We are only testing the feature; as of our current learnings, batching and writing to file on network slow-down, as well as pushing to file when the network is down, seemed to be the right decision for handling high-frequency use cases.
What do you mean by "inter-service dependencies"?
Thank you for the clarification @de-sh. I highlighted these questions because I couldn't find explicit documentation or configuration descriptions in the code addressing these architectural decisions. The implementation looks quite solid, and I'm particularly looking forward to seeing the Kafka connector integration. Regarding inter-service dependencies: as you've explained, we can scale Parseable instances behind a load balancer for audit logging purposes, and it's great to see this pathway being established. IMHO, it would be nice to add more documentation around these architectural decisions directly in the code, making it easier for future contributors to understand the design choices. It would also be nice to add a docker-compose example.
Fixes #XXXX.
Description
Collects audit logs into batches, pushes them based on configured time/size thresholds, and ensures logs are persisted to a file when the network is down. NOTE: this works in a near-synchronous manner, where logs are not accepted if the write to disk is blocked or the direct push to the network is blocked. This may lead to a situation where requests do not respond promptly as they try to send audit logs, but this is a known cost.
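To make the described flow concrete, here is a rough sketch (not the PR's implementation; the thresholds, file path, and `send_to_network` stub are illustrative assumptions) of a batching loop that flushes on a size or time threshold and falls back to a local file when the network push fails:

```rust
use std::time::Duration;
use futures::StreamExt;
use tokio::io::AsyncWriteExt;

// Stand-in for the HTTP push to the configured log_endpoint.
async fn send_to_network(_batch: &[serde_json::Value]) -> anyhow::Result<()> {
    Ok(())
}

// Push the batch over the network, spilling to a local file on failure so logs are not lost.
async fn flush(batch: &mut Vec<serde_json::Value>) {
    if batch.is_empty() {
        return;
    }
    if send_to_network(batch.as_slice()).await.is_err() {
        if let Ok(mut file) = tokio::fs::OpenOptions::new()
            .create(true)
            .append(true)
            .open("audit-logs.json") // illustrative path
            .await
        {
            let buf = serde_json::to_vec(batch.as_slice()).expect("audit logs are serializable");
            let _ = file.write_all(&buf).await;
        }
    }
    batch.clear();
}

async fn batch_loop(mut audit_log_rx: futures::channel::mpsc::Receiver<serde_json::Value>) {
    let batch_size = 100;                        // size threshold (illustrative)
    let flush_interval = Duration::from_secs(5); // time threshold (illustrative)
    let mut batch = Vec::with_capacity(batch_size);
    let mut ticker = tokio::time::interval(flush_interval);

    loop {
        tokio::select! {
            maybe_log = audit_log_rx.next() => match maybe_log {
                Some(log) => {
                    batch.push(log);
                    if batch.len() >= batch_size {
                        flush(&mut batch).await;
                    }
                }
                None => break, // all senders dropped
            },
            _ = ticker.tick() => flush(&mut batch).await,
        }
    }
    // Final flush so in-flight logs are not lost on shutdown.
    flush(&mut batch).await;
}
```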
This PR has: