@askumar27
Contributor

Use Looker APIs to get the fields of a View

Current Approach

  • We use the lkml parser to get View attributes, which has limitations when Views are very complex.

Proposed Approach

  • Use Looker APIs to get the fields of a View. This API needs to be called via the Explore name.
  • As many Views exist within an Explore, the number of API calls <= the number of Views, with a local LRU cache.
  • The code tries to group as many Views as possible under the same Explore, to minimize the number of API calls, since a single View can exist in many Explores.
  • Adds reliability to lineage extraction in the next step, where the SQL query is prepared based on the fields.
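A minimal sketch of the grouping idea, assuming a mapping from each View to the Explores that contain it. It is a greedy set-cover heuristic, not necessarily the exact code in this PR; `group_views_by_explore`, `fetch_explore_fields`, and `API_CALLS` are hypothetical names, and the fetch function is a stand-in for a real Looker API call.

```python
from functools import lru_cache
from typing import Dict, List, Set, Tuple


def group_views_by_explore(view_to_explores: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Greedy heuristic: repeatedly pick the Explore that covers the most
    still-unassigned Views, so that few Explores (API calls) are needed."""
    # Invert the mapping: Explore -> Views it contains.
    explore_to_views: Dict[str, Set[str]] = {}
    for view, explores in view_to_explores.items():
        for explore in explores:
            explore_to_views.setdefault(explore, set()).add(view)

    # Views that belong to no Explore cannot be served by the API at all.
    remaining = {view for view, explores in view_to_explores.items() if explores}
    assignment: Dict[str, List[str]] = {}
    while remaining:
        # Sorting first makes ties deterministic (alphabetically first Explore wins).
        best = max(
            sorted(explore_to_views),
            key=lambda name: len(explore_to_views[name] & remaining),
        )
        covered = explore_to_views[best] & remaining
        assignment[best] = sorted(covered)
        remaining -= covered
    return assignment


API_CALLS: List[str] = []  # records which Explores were actually fetched


@lru_cache(maxsize=128)
def fetch_explore_fields(explore: str) -> Tuple[str, ...]:
    """Hypothetical stand-in for the API call; repeated calls hit the cache."""
    API_CALLS.append(explore)
    return (f"{explore}.id",)
```

With three Views spread over two Explores, this assigns every View to one of at most two Explores, and a second `fetch_explore_fields` call for the same Explore is served from the cache rather than the API.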

Other changes

  • More debug logging
  • Automatic fallback to view_context-based field extraction, with better error handling
  • Best-effort field extraction - continue on failure (at least one field is required) instead of falling back to a different upstream strategy
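The two behaviours above can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: `get_view_fields`, `convert_best_effort`, and the two fetcher callables are hypothetical names.

```python
import logging
from typing import Callable, Iterable, List, TypeVar

logger = logging.getLogger(__name__)

T = TypeVar("T")
R = TypeVar("R")


def get_view_fields(
    view_name: str,
    api_fetch: Callable[[str], Iterable[str]],
    context_fetch: Callable[[str], Iterable[str]],
) -> List[str]:
    """Try the API first; on error or an empty result, fall back to view_context."""
    try:
        fields = list(api_fetch(view_name))
        if fields:
            return fields
        logger.warning("API returned no fields for %s; using fallback", view_name)
    except Exception as exc:  # deliberately broad: any API failure triggers the fallback
        logger.warning("API field fetch failed for %s: %s; using fallback", view_name, exc)
    return list(context_fetch(view_name))


def convert_best_effort(items: Iterable[T], convert: Callable[[T], R]) -> List[R]:
    """Skip items that fail to convert, but require at least one success."""
    results: List[R] = []
    for item in items:
        try:
            results.append(convert(item))
        except Exception as exc:
            logger.warning("Skipping %r: %s", item, exc)
    if not results:
        raise ValueError("no fields could be extracted")
    return results
```

The design choice is that a single bad field degrades the result (partial lineage) rather than discarding the whole View and switching to a different strategy.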

@github-actions bot added the `ingestion` label (PR or Issue related to the ingestion of metadata) on Oct 21, 2025
@askumar27 changed the title from "Feature/acr 6601/get fields from api" to "feat(lookml): Use Looker API to get fields of a View" on Oct 21, 2025
@codecov

codecov bot commented Oct 21, 2025

Codecov Report

❌ Patch coverage is 72.34043% with 26 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...c/datahub/ingestion/source/looker/view_upstream.py | 61.76% | 26 Missing ⚠️ |


@sgomezvillamor
Contributor

> • We use the lkml parser to get View attributes, which has limitations when Views are very complex.
> • Use Looker APIs to get the fields of a View. This API needs to be called via the Explore name.

Are we adding a fallback strategy, or are we replacing the current strategy with the new one?
If the latter, that could lead to a breaking change. If so, how can we address or mitigate it?

@sgomezvillamor
Contributor

Looker/LookML ingestion is becoming very complex code.
That shows in the amount of debug logging required.

Given the amount of detail we require for troubleshooting, I wonder if there is Python tooling that we can use to trace execution without requiring explicit logger debug lines. 🤔

Not a blocker, just thinking out loud.
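One lightweight option, sketched here only as an illustration of the idea: a decorator that logs entry, exit, and exceptions for any function it wraps, using the standard library alone. The `traced` name is hypothetical and nothing like it is confirmed to exist in the codebase.

```python
import functools
import logging

logger = logging.getLogger("trace")


def traced(func):
    """Log each call, return value, and exception of `func` at DEBUG level."""

    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        logger.debug("-> %s args=%r kwargs=%r", func.__qualname__, args, kwargs)
        try:
            result = func(*args, **kwargs)
        except Exception:
            logger.debug("<- %s raised", func.__qualname__, exc_info=True)
            raise  # tracing must not swallow the error
        logger.debug("<- %s returned %r", func.__qualname__, result)
        return result

    return wrapper
```

Applying `@traced` to the functions of interest keeps their bodies free of explicit debug lines, and the output can be switched off by raising the level of the `trace` logger.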

@askumar27
Contributor Author

> • We use the lkml parser to get View attributes, which has limitations when Views are very complex.
> • Use Looker APIs to get the fields of a View. This API needs to be called via the Explore name.
>
> Are we adding a fallback strategy, or are we replacing the current strategy with the new one? If the latter, that could lead to a breaking change. If so, how can we address or mitigate it?

There is a fallback strategy in place: if there are any issues getting the fields from the Looker APIs, it falls back to the view_context-based solution (the current one). There should not be any backward-compatibility issues with either strategy. The worst-case scenario is no CLL or partial CLL (same as today).

> • Automatic fallback to view_context-based field extraction, with better error handling
> • Best-effort field extraction - continue on failure (at least one field is required) instead of falling back to a different upstream strategy

@askumar27
Contributor Author

> Looker/LookML ingestion is becoming very complex code. That shows in the amount of debug logging required.
>
> Given the amount of detail we require for troubleshooting, I wonder if there is Python tooling that we can use to trace execution without requiring explicit logger debug lines. 🤔
>
> Not a blocker, just thinking out loud.

This is a great suggestion for tracing; we should certainly explore options.
Regarding the debug logs: sorry, I forgot to mention in the PR that this is also a dev build to be used for a customer debug. I will trim out the debug logs before merging.

…l and caching for explores

- Updated `lookml_model_explore` method to accept an optional `fields` parameter for optimized API calls.
- Introduced `get_explore_fields_from_looker_api` method to fetch fields directly from the Looker API, improving performance and reducing unnecessary API calls.
- Implemented fallback mechanisms to retrieve fields from view context if API calls fail.
- Added logging for better traceability of API interactions and field retrieval processes.
- Implemented a greedy algorithm to minimize API calls by grouping views with common explores, improving overall efficiency.
…ing various scenarios including edge cases and performance with large datasets.
…ew upstream processing

- Updated logic to ensure only one field per dimension group is added, improving clarity and maintainability of the code.
- Removed redundant checks and added comments for better understanding of the dimension group handling process.
Labels

ingestion PR or Issue related to the ingestion of metadata

2 participants