EPIC: Towards Measurable Data Usage #97

Open
2 of 9 tasks
wunder957 opened this issue Nov 8, 2023 · 1 comment

wunder957 commented Nov 8, 2023

Problems

This project was initially directed towards unplugged detection of data usage behaviour through eBPF technology, and I'm glad we now have an initial framework for it. However, to make the probe results available to other applications (e.g. the Data Usage Controller of the DataUcon project), we need to expose our recorded results in a machine-readable format.

We also need to finish standardising the storage side. For large numbers of events, a traditional SQL database is not a good choice.

We don't yet have a good production example to represent our capabilities.

Status quo and Future

Relationship with storage back-end

OpenTelemetry is widely adopted by related projects as an open source standard for observability. Although our project differs from observability in terms of what is observed, its goals, and its functions, its technical implementation is similar to that of OpenTelemetry-related projects, so we should be able to benefit from the development of OpenTelemetry and related backends.

As the project has evolved, we have completed the integration with OpenTelemetry: #82. Next, we will make OpenTelemetry our primarily supported backend, and SQL databases MAY NOT be actively maintained.

We are currently using Jaeger as the first backend.

Cloud-native support

We will natively support monitoring of containers in the cloud, starting with Docker and Kubernetes (k8s).

How to expose data

We will first build a querier for the Jaeger backend to retrieve the tracer data from the backend, and then implement an analysis engine that analyses the tracer data to derive a picture of how a process is using data. We will refer to this process as the measurement of data usage.

  • Designing and implementing an analysis engine
  • Docs: 4W1H of Data Usage and Our Measurement Capabilities
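
As a rough illustration of what "measurement of data usage" could look like, the sketch below aggregates span-like records into a per-process summary and prints it as JSON. Everything here is an assumption made for illustration: the span shape (an `operationName` plus a list of `key`/`value` tags) only loosely resembles what a tracing backend might return, and the `pid` and `fname` tag names, the sample values, and the `measure_data_usage` function are hypothetical, not part of duetector or Jaeger.

```python
import json
from collections import Counter, defaultdict
from typing import Any, Dict, Iterable


def tags_to_dict(span: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten a span's list of {"key": ..., "value": ...} tags into a dict."""
    return {tag["key"]: tag["value"] for tag in span.get("tags", [])}


def measure_data_usage(spans: Iterable[Dict[str, Any]]) -> Dict[str, Dict[str, int]]:
    """Count, per process ID, how often each operation/file pair was observed."""
    usage: Dict[str, Counter] = defaultdict(Counter)
    for span in spans:
        tags = tags_to_dict(span)
        # "pid" and "fname" are hypothetical tag names chosen for this sketch.
        pid = str(tags.get("pid", "unknown"))
        fname = tags.get("fname", "<no file>")
        usage[pid][f'{span["operationName"]}:{fname}'] += 1
    return {pid: dict(counter) for pid, counter in usage.items()}


# Hand-written sample data standing in for spans fetched by a querier.
sample_spans = [
    {"operationName": "openat",
     "tags": [{"key": "pid", "value": 1234}, {"key": "fname", "value": "/data/train.csv"}]},
    {"operationName": "openat",
     "tags": [{"key": "pid", "value": 1234}, {"key": "fname", "value": "/data/train.csv"}]},
    {"operationName": "tcp_connect",
     "tags": [{"key": "pid", "value": 1234}]},
    {"operationName": "openat",
     "tags": [{"key": "pid", "value": 5678}, {"key": "fname", "value": "/etc/passwd"}]},
]

report = measure_data_usage(sample_spans)
# Emit the summary in a machine-readable form for other applications.
print(json.dumps(report, indent=2, sort_keys=True))
```

A real analysis engine would likely need richer dimensions (time windows, users, network peers), but a plain dictionary serialised as JSON is one simple way to hand results to a consumer such as a usage controller.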

Production example

We previously accepted a machine learning example based on MNIST that included analysis and associated probing points for data usage behaviours: #84. I think we can start with this case to demonstrate our data usage measurement capabilities.

Other maintenance

Rather than splitting the project into a querier and a detector (at least not in the near future), we'll build two different images based on the same Python package (duetector). We already have a separate CLI entry point, so I'm sure this won't be difficult.

In addition, we need to update the README and the design document to assume OpenTelemetry as the backend.
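
To make the "one package, two images" idea concrete, here is a minimal sketch of a single command-line program with two roles, where each container image would simply start a different subcommand. This is not duetector's actual CLI: the program name, the `detect` and `query` subcommands, and the `--endpoint` and `--backend-url` options are all hypothetical. The default ports are the conventional OTLP gRPC port (4317) and the Jaeger query port (16686).

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Build one parser with a subcommand per role (names are hypothetical)."""
    parser = argparse.ArgumentParser(prog="example-tool")
    roles = parser.add_subparsers(dest="role", required=True)

    detect = roles.add_parser("detect", help="run probes and export traces")
    detect.add_argument("--endpoint", default="localhost:4317")

    query = roles.add_parser("query", help="read traces back for analysis")
    query.add_argument("--backend-url", default="http://localhost:16686")
    return parser


def main(argv: Optional[List[str]] = None) -> str:
    """Dispatch on the chosen role and describe what would be started."""
    args = build_parser().parse_args(argv)
    if args.role == "detect":
        return f"detector exporting to {args.endpoint}"
    return f"querier reading from {args.backend_url}"


if __name__ == "__main__":
    print(main())
```

With a layout like this, the two images can share every layer except the default command, which keeps a single code base and a single release process.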

  • Splitting container images
  • Docs: Switch to OTel backend

Roadmap

This EPIC will be released as version 1.0.0. Before that, the features described above will be integrated gradually in 0.x.y releases.

Regarding data use measurability, I am working on some related blogs (in Chinese).

@wunder957 wunder957 added the enhancement New feature or request label Nov 8, 2023
@wunder957 wunder957 added this to the v0.2.0 milestone Nov 8, 2023
@wunder957 wunder957 self-assigned this Nov 8, 2023
@wunder957 wunder957 pinned this issue Nov 8, 2023
@wunder957 wunder957 modified the milestones: v0.2.0, v1.0.0 Nov 27, 2023
@Wh1isper

For personal reasons, I (aka @wunder957) will be leaving the project for a while. No one is actively maintaining the project at the moment. If you are interested in getting involved, feel free to contact me or any member of hitsz-ids.
