bballamudi/multi-data-lineage-capture-py

Multi-Lineage Data System

IBM Research Multi-Lineage Data System is a provenance data management system capable of capturing, integrating, and querying provenance data generated across multiple, distributed services, programs, databases, and computational workflows.

For more information on the project, including participants and publications, please see ibm.biz/provlake.

This repository contains the Python library that captures provenance data in Python applications and sends it to the Multi-Data Lineage Manager. The Manager integrates the data into a provenance database stored as a knowledge graph (a semantic database) and lets users run queries over it.
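The sketch below shows what integrating a captured task into a knowledge graph can look like. It converts one task record into (subject, predicate, object) triples, then answers a simple lineage query over them. It is a hypothetical illustration, not the Manager's schema or this library's API. The record layout (`task_id`, `inputs`, `outputs`) and the helpers `task_record_to_triples` and `generated_by` are invented for this example. The predicate names are borrowed from the W3C PROV-O vocabulary for readability.

```python
from typing import Any, Dict, List, Tuple

# A knowledge-graph edge: (subject, predicate, object).
Triple = Tuple[str, str, Any]


def task_record_to_triples(record: Dict[str, Any]) -> List[Triple]:
    """Hypothetical mapping of one captured task record to triples.

    Predicates are borrowed from W3C PROV-O for illustration only;
    the Multi-Data Lineage Manager's actual schema may differ.
    """
    task_id = record["task_id"]
    triples: List[Triple] = [(task_id, "rdf:type", "prov:Activity")]
    for name, value in record.get("inputs", {}).items():
        entity = f"{task_id}/in/{name}"
        triples.append((entity, "prov:value", value))
        triples.append((task_id, "prov:used", entity))  # the task consumed this input
    for name, value in record.get("outputs", {}).items():
        entity = f"{task_id}/out/{name}"
        triples.append((entity, "prov:value", value))
        triples.append((entity, "prov:wasGeneratedBy", task_id))  # the task produced this output
    return triples


def generated_by(triples: List[Triple], entity: str) -> List[Any]:
    """Lineage query: which activities generated `entity`?"""
    return [o for s, p, o in triples if s == entity and p == "prov:wasGeneratedBy"]


record = {"task_id": "factorial_number", "inputs": {"n": 5}, "outputs": {"factorial": 120}}
graph = task_record_to_triples(record)
print(generated_by(graph, "factorial_number/out/factorial"))  # ['factorial_number']
```

A real provenance store would also link tasks to workflows, and outputs of one task to inputs of the next. Those links are what let queries span multiple services and programs.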

It supports Python 3.6 and later.

Very simple usage example

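```python
"""
Very simple example showing how this library is used to instrument a simple
Python script for provenance data management.
"""
from provlake.prov_lake import ProvLake
from provlake.prov_task import ProvTask


def calc_factorial(n):
    """Return n! computed iteratively."""
    num = n
    result = 1
    while num > 1:
        result = result * num
        num = num - 1
    return result


# Run without contacting the Multi-Data Lineage Manager (online=False)
# and log the captured provenance to a local file.
prov = ProvLake(online=False, should_log_to_file=True)

in_args = {"n": 5}
# ProvTask records the task's input arguments; outputs are registered with output().
with ProvTask(prov, "factorial_number", in_args) as prov_task:

    factorial = calc_factorial(in_args.get("n"))

    out_args = {"factorial": factorial}
    prov_task.output(out_args)

# Close the provenance session when the script is done.
prov.close()
```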
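The example above relies on `ProvTask` being a context manager. The sketch below shows how that pattern can work in general. It is a hypothetical stand-in, not provlake's implementation. `ToyProvTask` and its record fields (`task`, `inputs`, `outputs`, `start`, `end`, `status`) are assumptions made for illustration. Records are kept in an in-memory list rather than sent to the Multi-Data Lineage Manager.

```python
import json
import math
import time
from types import TracebackType
from typing import Any, Dict, List, Optional, Type


class ToyProvTask:
    """Hypothetical illustration of the ProvTask pattern (NOT provlake's code).

    Inputs are captured at construction, outputs via output(), and
    timing plus success/error status when the `with` block exits.
    """

    def __init__(self, log: List[Dict[str, Any]], task_name: str, inputs: Dict[str, Any]):
        self.log = log
        self.record: Dict[str, Any] = {"task": task_name, "inputs": dict(inputs), "outputs": {}}

    def __enter__(self) -> "ToyProvTask":
        self.record["start"] = time.time()
        return self

    def output(self, outputs: Dict[str, Any]) -> None:
        self.record["outputs"].update(outputs)

    def __exit__(
        self,
        exc_type: Optional[Type[BaseException]],
        exc: Optional[BaseException],
        tb: Optional[TracebackType],
    ) -> bool:
        self.record["end"] = time.time()
        self.record["status"] = "error" if exc_type is not None else "finished"
        self.log.append(self.record)  # the record is kept even if the task failed
        return False  # never swallow exceptions raised inside the task


log: List[Dict[str, Any]] = []
with ToyProvTask(log, "factorial_number", {"n": 5}) as task:
    task.output({"factorial": math.factorial(5)})
print(json.dumps(log[0]["outputs"]))  # {"factorial": 120}
```

Recording in `__exit__` means that a task which raises an exception still leaves a provenance record with status `"error"`. This is usually what you want when debugging a failed workflow.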