forked from datahub-project/datahub
Kerem Sahin committed on Sep 1, 2019 · 1 parent cfcbb56 · commit 23339df
Showing 3,828 changed files with 131,600 additions and 124,321 deletions.
@@ -1,29 +1,23 @@
build/
target/
repos/
tmp/
bin/
.gradle/
.settings
# Gradle & Avro
.project
.settings
.classpath
*.swp
*.jar
*.idea
.gradle
.idea
*.iml
*.class
*.ipr
*.iws
/RUNNING_PID
wherehows-etl/src/main/resources/application.properties
**/test/resources/*.properties
logs/
.DS_Store
# See https://help.github.com/ignore-files/ for more about ignoring files.
*.ipr
**/mxe

# Pegasus & Avro
**/src/mainGenerated*
**/src/testGenerated*

# Added by mp-maker
**/build
/config
*/i18n
/out

# compiled output
dist/
out/
/commit
/.vscode/
*/src/generated/
# Mac OS
**/.DS_Store
@@ -1,47 +1,8 @@
dist: trusty

sudo: required

language: java

jdk:
  - oraclejdk8

env:
  - DOCKER_COMPOSE_VERSION=1.22.0

services:
  - docker
  - elasticsearch

before_install:
  # elasticsearch
  - curl -O https://download.elastic.co/elasticsearch/release/org/elasticsearch/distribution/deb/elasticsearch/2.3.5/elasticsearch-2.3.5.deb && sudo dpkg -i --force-confnew elasticsearch-2.3.5.deb && sudo service elasticsearch restart

  # extralibs
  - wget https://github.com/ericsun2/sandbox/raw/master/extralibs/extralibs.zip
  - mkdir -p wherehows-etl/extralibs; unzip extralibs.zip -d wherehows-etl/extralibs

  # docker-compose
  - sudo rm /usr/local/bin/docker-compose
  - curl -L https://github.com/docker/compose/releases/download/${DOCKER_COMPOSE_VERSION}/docker-compose-`uname -s`-`uname -m` > docker-compose
  - chmod +x docker-compose
  - sudo mv docker-compose /usr/local/bin

  # permanently increase upper limit for possible watches created per uid, build step exceeds default on travis
  - echo fs.inotify.max_user_watches=524288 | sudo tee -a /etc/sysctl.conf && sudo sysctl -p

cache:
  directories:
    - $HOME/.gradle/caches/
    - $HOME/.gradle/wrapper/

script:
  - ./gradlew check assemble
  - ./gradlew jacocoFullReport coveralls && ./gradlew emberCoverage
  - (cd wherehows-docker && ./build.sh latest)
  - (cd wherehows-docker && docker-compose config)

after_script:
  - rm -rf $WHEREHOWS_DIR/coverage
  - ./gradlew check assemble
  - ./gradlew emberCoverage
@@ -1,87 +1,168 @@
# WhereHows [![Build Status](https://travis-ci.org/linkedin/WhereHows.svg?branch=master)](https://travis-ci.org/linkedin/WhereHows) [![latest](https://img.shields.io/badge/latest-1.0.0-blue.svg)](https://github.com/linkedin/WhereHows/releases) [![Gitter](https://img.shields.io/gitter/room/nwjs/nw.js.svg)](https://gitter.im/wherehows) [![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](https://github.com/LinkedIn/Wherehows/wiki/Contributing)
## Prerequisites
Be sure to have a JDK installed on your machine.

```
sudo yum install java-1.8.0-openjdk-devel
```

WhereHows is a data discovery and lineage tool built at LinkedIn. It integrates with all the major data processing systems and collects both catalog and operational metadata from them.

Install Docker and Docker Compose; see https://www.docker.com/get-started for instructions on how to install docker-ce.

Within the central metadata repository, WhereHows curates, associates, and surfaces the metadata information through two interfaces:
* a web application that enables data & lineage discovery, and community collaboration
* an API endpoint that empowers automation of data processes/applications

Install the Chrome web browser: https://www.google.com/chrome/

WhereHows serves as the single platform that:
* links data objects with people and processes
* enables crowdsourcing for data knowledge
* provides data governance and provenance based on ownership and lineage

## Quickstart
To start all Docker images at once, follow the instructions below.

```
cd docker/quickstart
docker-compose up
cd ../elasticsearch && bash init.sh
```

## Who Uses WhereHows?
Here is a list of companies known to use WhereHows. Let us know if we missed your company!

## Starting Kafka
Kafka, ZooKeeper, and Schema Registry run in individual Docker containers, using Confluent images with their default configurations.

* [LinkedIn](http://www.linkedin.com)
* [Overstock.com](http://www.overstock.com)
* [Fitbit](http://www.fitbit.com)
* [Carbonite](https://www.carbonite.com)

```
cd docker/kafka
docker-compose up
```

## Starting MySQL
MySQL Server runs in its own Docker container. Run the commands below to start it.
```
cd docker/mysql
docker-compose up
```
To connect to the MySQL server, you can use:
```
docker exec -it mysql mysql -u datahub -pdatahub datahub
```

## How Is WhereHows Used?
How WhereHows is used inside LinkedIn, and other potential [use cases][USE].

## Starting Elasticsearch and Kibana
Elasticsearch and Kibana run in their own Docker containers. Run the commands below to start them.
```
cd docker/elasticsearch
docker-compose up
```
After the containers are initialized, create the search index by running:
```
bash init.sh
```
You can then reach Kibana in your web browser at:
```
http://localhost:5601
```

## Starting GMS

## Documentation
Detailed information can be found in the [Wiki][wiki].
```
./gradlew build
./gradlew :gms:war:JettyRunWar
```

### Example GMS Curl Calls

## Examples in VM (Deprecated)
There is a pre-built VMware image (about 11 GB) to quickly demonstrate the functionality of WhereHows. Check out the [VM Guide][VM].
#### Create
```
curl 'http://localhost:8080/corpUsers/($params:(),name:fbar)/snapshot' -X POST -H 'X-RestLi-Method: create' -H 'X-RestLi-Protocol-Version:2.0.0' --data '{"aspects": [{"com.linkedin.identity.CorpUserInfo":{"active": true, "fullName": "Foo Bar", "email": "[email protected]"}}, {"com.linkedin.identity.CorpUserEditableInfo":{}}], "urn": "urn:li:corpuser:fbar"}' -v
curl 'http://localhost:8080/datasets/($params:(),name:x.y,origin:PROD,platform:urn%3Ali%3AdataPlatform%3Afoo)/snapshot' -X POST -H 'X-RestLi-Method: create' -H 'X-RestLi-Protocol-Version:2.0.0' --data '{"aspects":[{"com.linkedin.common.Ownership":{"owners":[{"owner":"urn:li:corpuser:ksahin","type":"DATAOWNER"}],"lastModified":{"time":0,"actor":"urn:li:corpuser:ksahin"}}},{"com.linkedin.dataset.UpstreamLineage":{"upstreams":[{"auditStamp":{"time":0,"actor":"urn:li:corpuser:ksahin"},"dataset":"urn:li:dataset:(urn:li:dataPlatform:foo,barUp,PROD)","type":"TRANSFORMED"}]}},{"com.linkedin.common.InstitutionalMemory":{"elements":[{"url":"https://www.linkedin.com","description":"Sample doc","createStamp":{"time":0,"actor":"urn:li:corpuser:ksahin"}}]}},{"com.linkedin.schema.SchemaMetadata":{"schemaName":"FooEvent","platform":"urn:li:dataPlatform:foo","version":0,"created":{"time":0,"actor":"urn:li:corpuser:ksahin"},"lastModified":{"time":0,"actor":"urn:li:corpuser:ksahin"},"hash":"","platformSchema":{"com.linkedin.schema.KafkaSchema":{"documentSchema":"{\"type\":\"record\",\"name\":\"MetadataChangeEvent\",\"namespace\":\"com.linkedin.mxe\",\"doc\":\"Kafka event for proposing a metadata change for an entity.\",\"fields\":[{\"name\":\"auditHeader\",\"type\":{\"type\":\"record\",\"name\":\"KafkaAuditHeader\",\"namespace\":\"com.linkedin.avro2pegasus.events\",\"doc\":\"Header\"}}]}"}},"fields":[{"fieldPath":"foo","description":"Bar","nativeDataType":"string","type":{"type":{"com.linkedin.schema.StringType":{}}}}]}}],"urn":"urn:li:dataset:(urn:li:dataPlatform:foo,bar,PROD)"}' -v
```

## WhereHows Docker
Docker can quickly provide a configuration-free dev/production setup; check out the [Docker Getting Started Guide](https://github.com/linkedin/WhereHows/tree/master/wherehows-docker/README.md).
#### Get
```
curl -H 'X-RestLi-Protocol-Version:2.0.0' -H 'X-RestLi-Method: get' 'http://localhost:8080/corpUsers/($params:(),name:fbar)/snapshot/($params:(),aspectVersions:List((aspect:com.linkedin.identity.CorpUserInfo,version:0)))' | jq
curl -H 'X-RestLi-Protocol-Version:2.0.0' -H 'X-RestLi-Method: get' 'http://localhost:8080/datasets/($params:(),name:x.y,origin:PROD,platform:urn%3Ali%3AdataPlatform%3Afoo)/snapshot/($params:(),aspectVersions:List((aspect:com.linkedin.common.Ownership,version:0)))' | jq
```

## Getting Started
New to WhereHows? Check out the [Getting Started Guide][GS].

#### Get All
```
curl -H 'X-RestLi-Protocol-Version:2.0.0' -H 'X-RestLi-Method: get_all' 'http://localhost:8080/corpUsers' | jq
```

#### Browse
```
curl "http://localhost:8080/datasets?action=browse" -d '{"path": "", "start": 0, "limit": 10}' -X POST -H 'X-RestLi-Protocol-Version: 2.0.0' | jq
```

#### Search
```
curl "http://localhost:8080/corpUsers?q=search&input=foo&" -X GET -H 'X-RestLi-Protocol-Version: 2.0.0' -H 'X-RestLi-Method: finder' | jq
curl "http://localhost:8080/datasets?q=search&input=foo&" -X GET -H 'X-RestLi-Protocol-Version: 2.0.0' -H 'X-RestLi-Method: finder' | jq
```

### Preparation
First, please [set up the metadata repository][DB] in MySQL.
```
CREATE DATABASE wherehows
DEFAULT CHARACTER SET utf8
DEFAULT COLLATE utf8_general_ci;

CREATE USER 'wherehows';
SET PASSWORD FOR 'wherehows' = PASSWORD('wherehows');
GRANT ALL ON wherehows.* TO 'wherehows';
```

Execute the [DDL files][DDL] to create the required repository tables in the **wherehows** database.
#### Autocomplete
```
curl "http://localhost:8080/datasets?action=autocomplete" -d '{"query": "foo", "field": "name", "limit": 10}' -X POST -H 'X-RestLi-Protocol-Version: 2.0.0' | jq
```

### Build
1. Get the source code: ```git clone https://github.com/linkedin/WhereHows.git```
2. Put a few 3rd-party jar files into the **wherehows-etl/extralibs** directory. Some of these jar files may not be available in Maven Central or Artifactory. See [the download instructions][EXJAR] for more detail. ```cd WhereHows/wherehows-etl/extralibs```
3. From the **WhereHows** root directory, build all the modules: ```./gradlew build```
4. Start the metadata ETL and API service: ```./gradlew wherehows-backend:runPlayBinary```
5. In a new terminal, start the web front-end: ```./gradlew wherehows-frontend:runPlayBinary```. The WhereHows UI is available at http://localhost:9001 by default. You can change the port number by editing the value of ```project.ext.httpPort``` in ```wherehows-frontend/build.gradle```.

#### Ownership
```
curl -H 'X-RestLi-Protocol-Version:2.0.0' -H 'X-RestLi-Method: get' 'http://localhost:8080/datasets/($params:(),name:x.y,origin:PROD,platform:urn%3Ali%3AdataPlatform%3Afoo)/rawOwnership/0' | jq
```

## Roadmap
Check out the current [roadmap][RM] for WhereHows.

#### Schema
```
curl -H 'X-RestLi-Protocol-Version:2.0.0' -H 'X-RestLi-Method: get' 'http://localhost:8080/datasets/($params:(),name:x.y,origin:PROD,platform:urn%3Ali%3AdataPlatform%3Afoo)/schema/0' | jq
```

## Contribute
Want to contribute? Check out the [Contributors Guide][CON].

## Debugging Kafka
GMS fires a MetadataAuditEvent after a new record is created through the snapshot endpoint. We can check whether this message is fired correctly using kafkacat; see https://github.com/edenhill/kafkacat for installation instructions.

To consume messages on the MetadataAuditEvent topic, run the command below. kafkacat doesn't support Avro deserialization just yet, but there is ongoing [work](https://github.com/edenhill/kafkacat/pull/151) on that.
```
kafkacat -b localhost:9092 -t MetadataAuditEvent
```

## Community
Want help? Check out the [Gitter chat room][GITTER] and [Google Groups][LIST].

## Starting Elasticsearch Indexing Job
Run the command below to start the Elasticsearch indexing job.
```
./gradlew :metadata-jobs:elasticsearch-index-job:run
```
To test the job, you should have already started Kafka, GMS, MySQL, and Elasticsearch/Kibana.
After starting all the services, you can create a record in GMS through the snapshot endpoint as below.
```
curl 'http://localhost:8080/metrics/($params:(),name:a.b.c01,type:UMP)/snapshot' -X POST -H 'X-RestLi-Method: create' -H 'X-RestLi-Protocol-Version:2.0.0' --data '{"aspects": [{"com.linkedin.common.Ownership":{"owners":[{"owner":"urn:li:corpuser:ksahin","type":"DATAOWNER"}]}}], "urn": "urn:li:metric:(UMP,a.b.c01)"}' -v
```
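These long inline `--data` payloads are easy to get wrong. Before POSTing, the JSON can be validated locally with jq (a convenience sketch; the payload is the one from the curl call above, and `jq -e` exits non-zero if the JSON is malformed or the field is missing):

```shell
# Validate the snapshot payload locally before sending it to GMS
payload='{"aspects": [{"com.linkedin.common.Ownership":{"owners":[{"owner":"urn:li:corpuser:ksahin","type":"DATAOWNER"}]}}], "urn": "urn:li:metric:(UMP,a.b.c01)"}'
echo "$payload" | jq -er '.urn'
# → urn:li:metric:(UMP,a.b.c01)
```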
This will fire an MAE, and the search index will be updated by the indexing job after it reads the MAE from Kafka.
Then you can check whether the document is populated in the ES index with the command below.
```
curl localhost:9200/metricdocument/_search -d '{"query":{"match":{"urn":"urn:li:metric:(UMP,a.b.c01)"}}}' | jq
```

## Starting MetadataChangeEvent Consuming Job
Run the command below to start the MCE consuming job.
```
./gradlew :metadata-jobs:mce-consumer-job:run
```
To create your own MCEs, align them with the models in sample_MCE.dat.
Tip: one line per MCE, in Python syntax.

[wiki]: https://github.com/LinkedIn/Wherehows/wiki
[GS]: https://github.com/linkedin/WhereHows/blob/master/wherehows-docs/getting-started.md
[CON]: https://github.com/linkedin/WhereHows/blob/master/wherehows-docs/contributing.md
[USE]: https://github.com/linkedin/WhereHows/blob/master/wherehows-docs/use-cases.md
[RM]: https://github.com/linkedin/WhereHows/blob/master/wherehows-docs/roadmap.md
[VM]: https://github.com/LinkedIn/Wherehows/wiki/Quick-Start-With-VM
[EXJAR]: https://github.com/linkedin/WhereHows/tree/master/wherehows-etl/extralibs
[DDL]: https://github.com/linkedin/WhereHows/tree/master/wherehows-data-model/DDL
[DB]: https://github.com/linkedin/WhereHows/blob/master/wherehows-docs/getting-started.md#database-setup
[LIST]: https://groups.google.com/forum/#!forum/wherehows
[GITTER]: https://gitter.im/wherehows
Then you can produce MCEs to feed your GMS.
```
cd metadata-ingestion/src
python avro_cli.py produce
```

## Starting Datahub Frontend
Run the command below to start the datahub-frontend Play server.
```
cd datahub-frontend/run
./run-local-frontend
```
Then you can reach DataHub in your web browser at:
```
http://localhost:9001
```