Introduce jvector-apis module and status tracker #537
base: main
Great documentation; that is appreciated. I've commented on a few minor concerns, but overall this looks very strong.
jvector-apis/src/main/java/io/github/jbellis/jvector/status/StatusContext.java (outdated, resolved)
jvector-apis/src/main/java/io/github/jbellis/jvector/status/StatusContext.java (outdated, resolved)
internal-apis/src/main/java/io/github/jbellis/jvector/status/StatusTracker.java (resolved)
      </configuration>
    </plugin>
    <plugin>
      <groupId>org.codehaus.mojo</groupId>
Why remove this plugin?
This was added earlier by another developer, and I followed up with them before removing it. This change doesn't remove the plugin entirely; it removes a default that applied to every invocation of the exec plugin that didn't have its own local override. Specifically, that default skipped execution, and it seemed to have an undesired side effect on submodules as well.
I think this looks good. The documentation and the example code really help.
[Updates 10/15]:
With some first-user input from Mark, plus other cleanups and improved docs, the core code is now tighter, safer, and cleaner. Several files were added under test specifically to make the usage patterns for implementing tracked tasks easy to understand.
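For anyone reading this thread without the test sources open, here is a minimal sketch of the general tracked-task idea: a task publishes a cheap, thread-safe status snapshot while its own work stays essentially unchanged. All names in it (RunState, StatusSnapshot, TrackedTask, SumTask) are hypothetical and are not the API defined in StatusTracker.java or StatusContext.java; it is an illustration under that assumption, not the PR's implementation.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical types for illustration only; not the API in StatusTracker.java or StatusContext.java.
enum RunState { PENDING, RUNNING, SUCCESS, FAILED }

// Immutable snapshot that an observer can read at any time.
record StatusSnapshot(String name, RunState state, double progress) {}

// A task opts in to tracking by exposing its status; it does not know who is watching.
interface TrackedTask extends Runnable {
    StatusSnapshot status();
}

// Example task: sums an array while publishing progress through atomic/volatile fields.
final class SumTask implements TrackedTask {
    private final long[] values;
    private final AtomicLong processed = new AtomicLong();
    private volatile RunState state = RunState.PENDING;
    private volatile long sum;

    SumTask(long[] values) {
        this.values = values;
    }

    @Override
    public void run() {
        state = RunState.RUNNING;
        try {
            long acc = 0;
            for (long v : values) {
                acc += v;                    // the original work, unchanged
                processed.incrementAndGet(); // the only added line: bump the progress counter
            }
            sum = acc;
            state = RunState.SUCCESS;
        } catch (RuntimeException e) {
            state = RunState.FAILED;
            throw e;
        }
    }

    long sum() {
        return sum;
    }

    @Override
    public StatusSnapshot status() {
        long done = processed.get();
        RunState current = state;
        // An empty input counts as complete only once the task has actually finished.
        double progress = values.length == 0
                ? (current == RunState.SUCCESS ? 1.0 : 0.0)
                : (double) done / values.length;
        return new StatusSnapshot("sum", current, progress);
    }
}

final class SumTaskDemo {
    public static void main(String[] args) throws InterruptedException {
        SumTask task = new SumTask(new long[] {1, 2, 3, 4});
        Thread worker = new Thread(task);
        worker.start();
        worker.join();
        System.out.println(task.status() + " sum=" + task.sum());
    }
}
```

Because status() only reads atomic and volatile fields, an observer on another thread can poll it without locking; the snapshot may lag the worker slightly, which is usually acceptable for progress reporting.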
I believe we should have living examples wherever we can, and that means they should be tested. If necessary, we can put a group tag on the "example" tests so they are conveniently skipped except in release-level test runs. For now, they are enabled for review here.
[Previously ...]
The status tracking module may be used in both prod and test code, but we wanted to isolate it and manage it better as a separate module. For modules like this, I propose a jvector-apis module. This is where you would put new modular functionality, like the status tracking API, that may be used by multiple modules. There is a good primer on this in the README at the module root.
The status tracking API is a new facility that lets us collect and share the status of internal jobs run by jvector. Its first use will be to ease testing and baseline work on performance and accuracy across different vector spaces and indexing configurations. Some of its hooks may also be lightweight enough to instrument prod code with, but that is a separate concern not addressed in this PR. For now, this PR only introduces the status tracking API, which will be pulled into the dataset streaming work once that is ready.
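As a rough illustration of the collect-and-share side, the sketch below shows one possible shape for a tracker: jobs register a cheap progress probe, and the tracker samples all of them only when a snapshot is requested. SimpleStatusTracker, Registration, track, and snapshot are hypothetical names chosen for this example, not the StatusTracker API in this PR, and pull-based sampling is an assumed design choice.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.DoubleSupplier;

// Hypothetical sketch: not the StatusTracker API from this PR.
// Jobs register a progress probe; the tracker samples probes only when asked.
final class SimpleStatusTracker {
    // Narrows AutoCloseable so try-with-resources needs no checked-exception handling.
    interface Registration extends AutoCloseable {
        @Override
        void close();
    }

    private final Map<String, DoubleSupplier> probes = new ConcurrentHashMap<>();

    Registration track(String jobName, DoubleSupplier progress) {
        if (probes.putIfAbsent(jobName, progress) != null) {
            throw new IllegalArgumentException("already tracking " + jobName);
        }
        // Closing removes this exact probe, leaving any later re-registration intact.
        return () -> probes.remove(jobName, progress);
    }

    // Samples every registered job; TreeMap gives stable, name-ordered output.
    Map<String, Double> snapshot() {
        Map<String, Double> out = new TreeMap<>();
        probes.forEach((name, probe) -> out.put(name, probe.getAsDouble()));
        return out;
    }
}

final class TrackerDemo {
    public static void main(String[] args) {
        SimpleStatusTracker tracker = new SimpleStatusTracker();
        // Atomic so the probe would stay correct if sampled from another thread.
        AtomicInteger vectorsAdded = new AtomicInteger();
        try (SimpleStatusTracker.Registration reg =
                 tracker.track("graph-build", () -> vectorsAdded.get() / 1000.0)) {
            for (int i = 0; i < 1000; i++) {
                vectorsAdded.incrementAndGet(); // stand-in for the existing per-vector work
                if (i % 250 == 0) {
                    System.out.println(tracker.snapshot());
                }
            }
        }
        System.out.println(tracker.snapshot()); // prints {} after the registration closes
    }
}
```

Sampling on demand keeps the cost inside instrumented code down to maintaining a counter; a periodic reporter could call snapshot() from a ScheduledExecutorService if push-style updates were wanted.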
There is a substantial amount of testing included in this PR. The demo scaffolding serves as a test layer for refining the API and making sure it is ergonomic and non-invasive enough to be added to existing code. If necessary, we can gate this module's unit tests behind an optional test group, but I'd like to see how it works as is first.
There are also a couple of unrelated cleanups in this PR, reverting earlier commits by other committers: mvn exec configs, plus logger and test dependency inclusions that were not intended. I personally contacted these committers and verified their intent before removing the extra dependencies and configurations.