-
-
Notifications
You must be signed in to change notification settings - Fork 709
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial draft of search API. #2868
base: master
Are you sure you want to change the base?
Conversation
src/sandstorm/indexer.capnp
Outdated
# Indexer implementation | ||
|
||
interface IndexerSession { | ||
# A session type specifically implemented by the indexer app. The app's UiView's newSession() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How many indexer app grains are there? Just one? One per user? Maybe the are created on demand when the user manually sets up some grains as indexable?
interface GrainIndexer(Metadata) { | ||
# Capability used to index the content of a grain. | ||
# | ||
# This is a one-way capability. Although GrainIndexer is implemented by the indexer and is called |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only vaguely understand what you're intending to protect against by making this a "one-way" capability. Should we be worried that capabilities could be smuggled through the metadata
field of IndexableContent
?
# Note that, as an optimization, Sandstorm may start out assuming that all users see exactly | ||
# the same content, and may do all indexing as an anonymous user, perhaps with no permissions. | ||
# However, if the app logs an ActivityEvent that specifies that it requires specific permissions | ||
# or is visible only to certain users, Sandstorm uses that as a hint to index the path associated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean that Sandstorm tries to index any grain that calls SessionContext.activity()
?
src/sandstorm/indexer.capnp
Outdated
using Grain = import "grain.capnp"; | ||
|
||
interface IndexingSession(Metadata) { | ||
# This is a UiView session type, created by calling UiView.newSession(). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I guess this should have extends(UiSession)
?
I added comments answering all of @dwrensha's questions. |
How well do you expect this to scale? Global full-text search of say... documents... seems reasonable to expect Sandstorm servers to do, but what happens if I have an eBook library with a few hundred books of text in my Sandstorm? Will it be possible to define what type of content is being indexed? To extend the above example, if I have an eBook library, and a lot of personal documents, it may be hard to find personal documents (data I generated) I am looking for if it also returns a lot of results from books (data I have). |
@ocdtrekkie I would guess that the tech can handle indexing your ebooks. Lucene is designed to be able to index millions or even billions of documents, and I assume Lucy is similar. As a rule of thumb, the index ought to be similar in size to the text it is indexing. As for usability issues, that's a good point. I think we would handle this by adding search operators and such. This initial design doesn't really get into how those will work, but it can be extended in that direction in the future. |
# metadata format. | ||
# | ||
# TODO(someday): Spec out how this works. Not needed for MVP of search. | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not obvious to me how an indexer would be able to use "generic" operators like this while treating the metadata as completely opaque.
If the plan is to not implement this right away anyway, is there a good reason to even include this method? We can always add more later...
DO NOT MERGE
There are some
TODO(now)
s in there regarding parts I'm still thinking about, but wanted to get some initial feedback on this.