Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrates Atlas DSL Search #17

Merged
merged 3 commits into from
May 14, 2019
Merged

Integrates Atlas DSL Search #17

merged 3 commits into from
May 14, 2019

Conversation

whazor
Copy link
Contributor

@whazor whazor commented May 2, 2019

Title

  • My PR Title addresses the issue accurately and concisely.
    • Example: "Updates the version of Flask to v1.0.2"
    • In case you are adding a dependency, check if the license complies with the ASF 3rd Party License Policy.

Description

  • Here are some details about my PR:
    Similarly as in the metadata service we want to provide an integration with Apache Atlas, but then for search.

In order to have a second integration inside of the search service, we add a base proxy in order to have more than one possible search engine.

Furthermore, Apache Atlas provides multiple methods of searching and we discuss them within docs/atlas-search.md. We choose to search with the DSL as it is straightforward offered from Atlas and still provides some flexibility in search.

Tests

  • My PR adds the following unit tests:
    tests/unit/proxy/atlas.py

Which tests both normal search and with fields. The unit test mocks the atlas client and adds support for checking the query send to Atlas.

Commits

  • I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain docstrings that explain what it does

Code Quality & Coverage

  • Passes make test

@feng-tao
Copy link
Member

feng-tao commented May 2, 2019

exciting :) let us know once it is ready to review :)
cc @verdan @jinhyukchang

@feng-tao
Copy link
Member

feng-tao commented May 2, 2019

oh CLA check seems to fail. I think you may want to configure your github username locally and do one more push.

@whazor whazor force-pushed the atlas branch 3 times, most recently from d81bc7d to 8895f1a Compare May 3, 2019 07:04
@whazor
Copy link
Contributor Author

whazor commented May 3, 2019

Rebased changes with the latest config change. Cc: @gschmutz environment variables will change. Does anyone want me to add backwards compatibility?

Copy link
Member

@verdan verdan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work @whazor 👍 👍
A few comments/questions for my understanding.

search_service/proxy/atlas.py Outdated Show resolved Hide resolved
search_service/proxy/atlas.py Outdated Show resolved Hide resolved
search_service/proxy/atlas.py Show resolved Hide resolved
search_service/proxy/atlas.py Outdated Show resolved Hide resolved
search_service/proxy/atlas.py Outdated Show resolved Hide resolved
search_service/proxy/atlas.py Show resolved Hide resolved
search_service/proxy/atlas.py Show resolved Hide resolved
search_service/proxy/atlas.py Show resolved Hide resolved
search_service/proxy/atlas.py Show resolved Hide resolved
@whazor whazor force-pushed the atlas branch 3 times, most recently from 30200b8 to 68fcd84 Compare May 9, 2019 09:06
@codecov-io
Copy link

codecov-io commented May 9, 2019

Codecov Report

Merging #17 into master will increase coverage by 9.32%.
The diff coverage is 89.92%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #17      +/-   ##
==========================================
+ Coverage   67.12%   76.45%   +9.32%     
==========================================
  Files           8       10       +2     
  Lines         216      344     +128     
  Branches       19       37      +18     
==========================================
+ Hits          145      263     +118     
- Misses         62       67       +5     
- Partials        9       14       +5
Impacted Files Coverage Δ
search_service/proxy/elasticsearch.py 66.29% <100%> (+0.38%) ⬆️
search_service/config.py 100% <100%> (ø) ⬆️
search_service/api/search.py 35.13% <20%> (ø) ⬆️
search_service/proxy/base.py 77.77% <77.77%> (ø)
search_service/proxy/atlas.py 92.66% <92.66%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 86ea652...12ea887. Read the comment docs.

@whazor
Copy link
Contributor Author

whazor commented May 9, 2019

I updated the branch in the same way as I did in metadatalibrary. Currently this pull request works together with that pull request.

@feng-tao
Copy link
Member

cc @verdan @jinhyukchang

@whazor whazor force-pushed the atlas branch 2 times, most recently from 130a932 to ccbd764 Compare May 10, 2019 09:08
@whazor
Copy link
Contributor Author

whazor commented May 10, 2019

@verdan @feng-tao @jinhyukchang updated tests and config, now supports both master from metadatalibrary and the pull request. By default the configuration is set to ATLAS_NAME_ATTRIBUTE = 'qualifiedName' which will make it work for the master. Should be ready to be merged.

Note that ATLAS_NAME_ATTRIBUTE on name in general does not work for master of metadatalibrary without my pull request.

Copy link
Member

@verdan verdan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few minor comments, but overall really nice work Nanne.


db_name = db_attrs.get(self.NAME_ATTRIBUTE)
db_qualified_name = db_attrs.get('qualifiedName', '').split('@')
db_cluster = db_qualified_name[1] if len(db_qualified_name) > 1 else ''
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Database entity has an attribute clusterName, which can be used directly here.

key=f"{entity.typeName or 'Table'}://{db_cluster}.{db_name}/{table_name}",
description=attrs.get('description'),
cluster=db_cluster,
database=entity.typeName or 'Table',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There would always be a typeName. So this check would never go to or

search_service/proxy/atlas.py Outdated Show resolved Hide resolved
search_service/proxy/atlas.py Show resolved Hide resolved
docs/atlas-search.md Outdated Show resolved Hide resolved
search_service/config.py Outdated Show resolved Hide resolved
@feng-tao
Copy link
Member

@whazor we should bump the pypi version as well. Let me know once the pr is updated, and we could merge it. Thanks.

* Add new proxy with the Apache Atlas as backend

* Add document to describing Apache Atlas search methods
@whazor
Copy link
Contributor Author

whazor commented May 14, 2019

@feng-tao Resolved all issues, thank you for bumping the pypi version. Sorry all for the inconvenience of renaming the elastic env variables.

@feng-tao
Copy link
Member

no worry, thanks @whazor . exciting to see Amundsen works with Atlas end-to-end :)

@feng-tao feng-tao merged commit 1a05eb2 into amundsen-io:master May 14, 2019
@feng-tao
Copy link
Member

Merge :) Thanks @whazor
cc @verdan @bolkedebruin

jhettler pushed a commit to jhettler/amundsensearchlibrary that referenced this pull request Jul 27, 2019
…_metrics_dashboards to test

* commit '6ffc57032ef83197b3d642ccad02b31a633c0d18':
  Add metrics/dashboards
  Fix #24, correct initialisation of elastic search (amundsen-io#27)
  [DPTOOLS-2252] Publish Docker image in CI (amundsen-io#26)
  Integrates Atlas DSL Search (amundsen-io#17)
  Update PULL_REQUEST_TEMPLATE.md (amundsen-io#23)
  Update README.md (amundsen-io#22)
  Add codecov based for search repo (amundsen-io#20)
  Update README.md (amundsen-io#19)
  Set the elasticsearch base (endpoint) from env variable  (amundsen-io#16)
  Adds the PR template for amundsen search service (amundsen-io#15)
  Doc fix: Docker pull the official image (amundsen-io#14)
  Changed the name of this file for consistency (amundsen-io#13)
  gitignore dist/ as in metadataservice PR #28 (amundsen-io#12)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants