Initial changes #1

Open
wants to merge 14 commits into base: main
135 changes: 135 additions & 0 deletions .gitignore
@@ -0,0 +1,135 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don’t work, or not
# install all needed dependencies.
#Pipfile.lock

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# Azure Functions artifacts
bin
obj
appsettings.json
local.settings.json

# Azurite artifacts
__blobstorage__
__queuestorage__
__azurite_db*__.json
.python_packages
45 changes: 44 additions & 1 deletion README.md
@@ -1,2 +1,45 @@
# DukeDataDeliveryPipeline
Services that deliver data stored in Azure between users.

```mermaid
sequenceDiagram
participant DDD
participant LogicApp
participant DataFactory
participant FunctionApp
participant AzureBlobStorage

DDD->>LogicApp: POST source and destinations to API
LogicApp->>DataFactory: Run DataDelivery pipeline
DataFactory->>FunctionApp: Fetch list of files being delivered
DataFactory->>AzureBlobStorage: Copy files to destination container
DataFactory->>DDD: POST manifest of files delivered to DDD
```
## Detailed Flow

- [DDD - Duke Data Delivery Website](https://github.com/Duke-GCB/D4S2)
  - Sends a POST request containing
- Azure Blob Storage source and destination container paths
- Delivery UUID - unique identifier for each delivery request
- Delivery ID - id of the delivery being performed
- Receives a POST at the end of the process
- on success the POST contains a manifest of the files delivered
- on failure the POST contains an error message
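The request body fields above can be sketched as follows. This is an illustrative shape only, assuming JSON field names (`source`, `destination`, `delivery_uuid`, `delivery_id`) that are not confirmed by this README; the actual schema is defined by DDD and the Logic App trigger.

```python
import json
import uuid

# Hypothetical delivery request body; field names and container URLs are
# placeholders, not the confirmed DDD/Logic App schema.
delivery_request = {
    "source": "https://ACCOUNT.blob.core.windows.net/source-container/project1/",
    "destination": "https://ACCOUNT.blob.core.windows.net/delivery-container/project1/",
    "delivery_uuid": str(uuid.uuid4()),  # unique identifier for each delivery request
    "delivery_id": 42,                   # id of the delivery being performed
}

body = json.dumps(delivery_request)
```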
- [Logic App](logic-app.json)
- Receives a POST request
  - Reads Key Vault for the webhook URL and authentication credentials
- Runs DataFactory passing request body and webhook config
- [Data Factory](data-factory.json)
- Calls FunctionApp to create a manifest of files being delivered
- Uses `Copy data` Activity to copy data to the destination
- Notifies the external webhook on failure or success of the pipeline
- [Function App](function-app)
  - Reads source files and returns a manifest including file paths and their checksums
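A minimal sketch of the manifest-building step, assuming an MD5 checksum per file (Azure Blob Storage commonly exposes Content-MD5); the real Function App streams bytes from Blob Storage rather than receiving them in memory, and the entry fields shown here are illustrative:

```python
import hashlib

def file_manifest(files):
    """Build one manifest entry per file: path, MD5 checksum, and size.

    `files` maps a blob path to its raw bytes; in the real Function App the
    bytes would be read from the source container instead.
    """
    manifest = []
    for path, data in sorted(files.items()):
        manifest.append({
            "path": path,
            "md5": hashlib.md5(data).hexdigest(),
            "size": len(data),
        })
    return manifest

example = file_manifest({"project1/readme.txt": b"hello"})
```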

## Azure Blob Storage Permissions
The following storage permissions are required:
- Data Factory
- Write Permissions on the sink container
- Read Permissions on the source container
- Function App
- Read Permissions on the source container
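One way to grant these permissions is with Azure's built-in RBAC roles scoped to each container. The commands below are a sketch, assuming managed identities for the Data Factory and Function App; every principal ID, subscription, resource group, account, and container name is a placeholder:

```shell
# Data Factory: read on the source container, write on the sink container
az role assignment create \
  --assignee "<data-factory-principal-id>" \
  --role "Storage Blob Data Reader" \
  --scope "/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<account>/blobServices/default/containers/<source-container>"

az role assignment create \
  --assignee "<data-factory-principal-id>" \
  --role "Storage Blob Data Contributor" \
  --scope "/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<account>/blobServices/default/containers/<sink-container>"

# Function App: read on the source container
az role assignment create \
  --assignee "<function-app-principal-id>" \
  --role "Storage Blob Data Reader" \
  --scope "/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<account>/blobServices/default/containers/<source-container>"
```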