[SharePoint] Extend Metadata Indexing to custom one #644

Open
danajuratoni opened this issue Mar 28, 2023 · 12 comments

@danajuratoni
Contributor

danajuratoni commented Mar 28, 2023

Issue created in another repo, replicated here for visibility: elastic/connectors-ruby#499

Description

In #1268, we removed many metadata fields from what we index for SharePoint Online. We did this by adding explicit $select clauses to our Graph API queries, which tell the API which fields to return in the response.
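For readers unfamiliar with `$select`, here is a minimal sketch of what such a query URL looks like. The helper name `build_select_url` and the example path and field list are illustrative assumptions, not the connector's actual code.

```python
from urllib.parse import urlencode

GRAPH_BASE_URL = "https://graph.microsoft.com/v1.0"


def build_select_url(path, select_fields):
    """Build a Graph API URL that asks only for the given fields.

    `path` and `select_fields` are illustrative inputs; the connector's
    real query construction may differ.
    """
    # Keep "$" and "," unescaped so the query stays readable.
    query = urlencode({"$select": ",".join(select_fields)}, safe="$,")
    return f"{GRAPH_BASE_URL}/{path.lstrip('/')}?{query}"


url = build_select_url("sites/root/drives", ["id", "name", "webUrl"])
print(url)
```

Any field left out of the `$select` list is simply not returned, which is why adding a field today requires a code change.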

However, we anticipate that many customers will want fine-grained control over what fields they fetch and index, but will also not want to make code changes in order to fetch more/different fields.

@danajuratoni added the enhancement label Mar 28, 2023
@danajuratoni
Contributor Author

cc: @seanstory we could repurpose this document for indexing custom metadata in a configurable manner

@seanstory
Member

CC @JoseLuisGJ - we'll want some UX insight on a good way to make metadata field selection configurable.

Easiest way (I think) would be new RCFs with comma-separated field names that we allow-list (default values are the fields we chose in 8.9). But that will look gross fast - each document "type" (list, listItem, sitePage, driveItem, site, listItemAttachment, etc) has a different list of fields. So that's a lot of RCFs.
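A rough sketch of the comma-separated, allow-listed approach, assuming one field list per document type. The names `ALLOWED_FIELDS`, `DEFAULT_FIELDS`, and `parse_field_list`, and the field sets themselves, are hypothetical and only cover two of the document types mentioned above.

```python
# Hypothetical allow-lists, one per document "type". A real connector
# would need one entry per type (list, listItem, sitePage, ...).
ALLOWED_FIELDS = {
    "driveItem": frozenset({"id", "name", "webUrl", "size", "createdDateTime"}),
    "site": frozenset({"id", "name", "webUrl", "displayName"}),
}

# Hypothetical defaults used when the user leaves the field empty.
DEFAULT_FIELDS = {
    "driveItem": ["id", "name", "webUrl"],
    "site": ["id", "name"],
}


def parse_field_list(raw_value, doc_type):
    """Parse a comma-separated field string and check it against the allow-list."""
    requested = [part.strip() for part in raw_value.split(",") if part.strip()]
    if not requested:
        return list(DEFAULT_FIELDS[doc_type])

    rejected = [name for name in requested if name not in ALLOWED_FIELDS[doc_type]]
    if rejected:
        raise ValueError(f"Fields not allowed for {doc_type}: {', '.join(rejected)}")
    return requested


print(parse_field_list("id, name ,webUrl", "driveItem"))
```

Even in this toy form, every document type needs its own allow-list, default list, and configuration field, which illustrates how quickly the number of RCFs grows.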

Another approach could be to just solve this with Advanced Sync Rules - letting the customer specify the exact $select clauses they want for each resource type. But that might be hard to maintain if we decide that certain fields are required, and they are not specified in such a sync rule.

@danajuratoni
Contributor Author

@daveyholler this is a highly awaited enhancement. Could we get some design input on our best options here before starting implementation?

@daveyholler

@danajuratoni happy to help. @seanstory, can I get a demo of what this looks like in practice? I'm struggling to visualize this in my head.

@seanstory
Member

Sure thing. Dropped time on the cal for tomorrow.

@danajuratoni added this to the 2023-08-01 - 2023-08-14 milestone Jul 31, 2023
@danajuratoni
Contributor Author

danajuratoni commented Jul 31, 2023

@daveyholler is there any design deliverable planned? @seanstory what are the takeaways from the meeting?

@daveyholler

@danajuratoni I've got some updates/clarification questions that I'll write up tomorrow morning.

@daveyholler

@danajuratoni


After chatting with Sean, I think that there are some things here we should dig into a little:

  1. How does a user know which fields are available for them to enter? — It sounds like there’s quite a bit of variation in which fields are present in each connector type. Do users have the ability to see all their field names on a given SharePoint (or other) connector? Is there a way we can sample documents to provide them with a finite list of field options? Or is getting this information more of a back and forth between individuals (roles) within the user’s organization?

  2. What kind of validation are we able to offer after a user specifies field names?

  3. If the field names are “arbitrary” (as far as we’re concerned), and a user can’t validate what they’ve entered, and/or if the process of actually providing those field names is more challenging than selecting from a dropdown list, how many users do you anticipate will use the feature in the UI?

  4. And lastly, is this something that we can/should provide via API rather than adding steps/options to the UI?

@DianaJourdan
Collaborator

@danajuratoni as there is no design yet and discussions are still needed, this one can't make 8.10 anymore

@DianaJourdan removed this from the 2023-08-01 - 2023-08-14 milestone Aug 16, 2023
@danajuratoni
Contributor Author

danajuratoni commented Aug 21, 2023

I have the feeling we're mixing Rich Configurable Fields (RCFs) with what I'll call Advanced Filtering Fields.

How does a user know which fields are available for them to enter

RCFs are fields that each connector sends to Kibana and that show up as editable in the Configuration tab. We aim to make these as "rich" and user friendly as possible, with placeholders / validations / dropdowns / selection options where possible. These would likely reside in the config.yaml some day.
All connectors require at least one RCF to connect to the data source. Additional fields might be added for other functionality, such as extraction capabilities or reducing the ingest scope to e.g. a certain table / project / space. These scope-reducing fields overlap somewhat with Advanced Filtering Fields. However, Advanced Filtering Fields:

  • are not available for all connector clients
  • must not be required for getting a connector up and running
  • can be as simple as a skipExtractingDriveItemsOlderThan field or contain complex queries
  • hide complexity behind queries (SQL, JQL, etc.) or API fields, so we rely on the dev to make sure the entered content is valid, with only minimal validation on our side. We pass the input to the data source, which interprets it
  • "might" be ignored by incremental syncs in some cases
  • should have minimal UI (a JSON atm) and should be easy to send to an API, but should NOT be specified in a config.yaml file (imagine a SQL query)

I'd put custom metadata fields in particular in the same logical category as specifying which tables or table rows should be ingested. They could be an optional RCF or an Advanced Filtering Field.

-- I missed posting this comment before leaving on PTO and I'm still catching up, please share if I missed any updates in the meantime

@danajuratoni
Contributor Author

@daveyholler @seanstory Let's schedule a meeting if more discussions are needed. I'd like to get clarity on the designs for this feature asap, so that we can resume implementation. Even if this feature will be available only for connector clients until 8.11 is released, it is critical to unblock customer deals.

@seanstory
Member

After a sync with Dana and Davey, we came to the conclusion that we should allow SPO custom metadata fields to be configured via our Advanced Sync Rules. Requirements:

  • make the $select clauses as granular as possible. We know that the SharePoint REST API is weird with Site Pages, for example, and that $select=foo,bar,baz might work on one Site, but not on another.
  • whatever the user provides for a $select will be unioned with what we need for the connector to work. This means this feature will be additive only, and can't be used to trim down the number of indexed fields.
  • log warnings when queries fail because of the $select clause
    • if there was a failure and we used Advanced Sync Rule provided selects, try again using only our default $select clause, without their customizations
      • if that still fails, log another warning, but do not crash
  • document the nuances - Advanced Rules for config, unioning (not explicit setting), no fail-fast.
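To make the union and fallback requirements concrete, here is a minimal synchronous sketch. Everything in it is an assumption for illustration: `REQUIRED_SELECT`, `SelectQueryError`, `merge_select`, and `fetch_with_fallback` are hypothetical names, and `fetch` stands in for whatever function actually issues the query.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical fields the connector needs in order to work.
REQUIRED_SELECT = {"driveItem": ["id", "name", "lastModifiedDateTime"]}


class SelectQueryError(Exception):
    """Hypothetical error raised when the API rejects a $select clause."""


def merge_select(required, custom):
    """Union user-provided fields with required ones (additive only)."""
    merged = list(required)
    for field in custom:
        if field not in merged:
            merged.append(field)
    return merged


def fetch_with_fallback(fetch, resource_type, custom_fields):
    """Try the merged $select first, then fall back to the defaults."""
    required = REQUIRED_SELECT[resource_type]
    merged = merge_select(required, custom_fields)
    try:
        return fetch(merged)
    except SelectQueryError as error:
        logger.warning("Query failed with $select=%s: %s", ",".join(merged), error)

    if merged == required:
        # No customizations were applied, so there is nothing to retry.
        return []

    try:
        return fetch(required)
    except SelectQueryError as error:
        # Log again, but do not crash the sync.
        logger.warning("Query failed with default $select as well: %s", error)
        return []
```

Because `merge_select` only ever appends to the required list, a sync rule can add fields but can never remove one the connector depends on.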

5 participants