Comparison of flow-based programming and service-oriented architecture for building data-driven applications

mlatcl/fbp-vs-soa

FBP vs SOA for ML

This repo contains code comparing flow-based programming (FBP) and service-oriented architecture (SOA) in the context of machine learning deployment. It accompanies the paper "An Empirical Evaluation of Flow Based Programming in the Machine Learning Deployment Context", presented at the CAIN 2022 conference. The arXiv version of the paper is here: https://arxiv.org/abs/2204.12781. Published versions are openly available in the ACM and IEEE libraries.

An earlier version of this repository accompanied the paper "Towards better data discovery and collection with flow-based programming", presented at the Data-centric AI workshop at NeurIPS 2021. That paper is also available on arXiv, and its code now lives in a separate branch.

Structure

The repository contains four applications: Ride Allocation, MBlogger, Insurance Claims and Playlist Builder. Each app is implemented six times. The implementations represent the same software at three stages of development, each built with one of the two paradigms. Here is the complete list:

  • fbp_app_min - basic functionality implemented with FBP

  • fbp_app_data - same as above, plus data collection

  • fbp_app_ml - same as above, plus deployment of a trained ML model

  • soa_app_min - basic functionality implemented with SOA

  • soa_app_data - same as above, plus data collection

  • soa_app_ml - same as above, plus deployment of a trained ML model

Other source files are used to define common data types and generate some input data.

How to run the code

Before running any apps, install all dependencies by running

pip install -r requirements.txt

Running FBP apps

To run FBP apps, use this command from the root of the project:

python -m <app_name>.main fbp_app_<stage>

For example, to run insurance_claims at stage data, use

python -m insurance_claims.main fbp_app_data
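To run one app at every stage in sequence, the command pattern above can be wrapped in a small script. This is only a sketch, not part of the repository: the helper names `fbp_command` and `run_all_stages` are hypothetical, and the stage names are taken from the list in the Structure section.

```python
import subprocess
import sys
from typing import List

# Stage suffixes taken from the Structure section (fbp_app_min, fbp_app_data, fbp_app_ml).
STAGES = ("min", "data", "ml")


def fbp_command(app_name: str, stage: str) -> List[str]:
    """Build the `python -m <app_name>.main fbp_app_<stage>` command as an argument list."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage {stage!r}, expected one of {STAGES}")
    # sys.executable keeps the run inside the current interpreter / virtualenv.
    return [sys.executable, "-m", f"{app_name}.main", f"fbp_app_{stage}"]


def run_all_stages(app_name: str) -> None:
    """Run the given app at each stage; must be called from the project root."""
    for stage in STAGES:
        subprocess.run(fbp_command(app_name, stage), check=True)
```

For example, `run_all_stages("insurance_claims")` would run the Insurance Claims app at the min, data and ml stages, stopping at the first failure.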

Running SOA apps

SOA apps are implemented as Flask web services, and thus require a running Flask process to serve requests. Each SOA app has a flaskr/README.md file that explains how to start it.

How to collect metrics

The Bash script code_metrics.sh collects code metrics for all the apps. We collect several Halstead metrics, cyclomatic and cognitive complexity, the maintainability index, and the number of words and logical lines of code.

To output metrics data in the console:

./code_metrics.sh

To write metrics data to a CSV file:

./code_metrics.sh file_name_goes_here.csv

We also define and collect a metric called "Number of Affected Components", which we understand as the number of components that were created or changed between any two moments in development. We collect it by generating a diff between two stages with:

git diff --no-index <app_name>/<stage1> <app_name>/<stage2>

For example:

git diff --no-index mblogger/fbp_app_data mblogger/fbp_app_ml

All diffs are collected in the "<app_name>/diffs" folder, and the overall metric is available in component_diff.yml in the root of the project.
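As a rough aid when inspecting such a diff, the files it touches can be listed with a few lines of Python. This is a sketch under assumptions, not the procedure used for the paper: it treats each changed file as a candidate component, which may not match how components were counted in component_diff.yml, and the helper names `changed_paths` and `diff_stages` are hypothetical.

```python
import subprocess
from typing import List


def changed_paths(diff_text: str) -> List[str]:
    """Return the new-side path from every `diff --git a/... b/...` header."""
    paths = []
    for line in diff_text.splitlines():
        if line.startswith("diff --git "):
            # Take everything after the last " b/" as the new-side path.
            _, sep, new_path = line.rpartition(" b/")
            if sep:
                paths.append(new_path)
    return paths


def diff_stages(old_dir: str, new_dir: str) -> str:
    """Run `git diff --no-index` on two stage directories and return its output."""
    # No check=True here: git exits with status 1 when the directories differ.
    result = subprocess.run(
        ["git", "diff", "--no-index", old_dir, new_dir],
        capture_output=True,
        text=True,
    )
    return result.stdout
```

For instance, `changed_paths(diff_stages("mblogger/fbp_app_data", "mblogger/fbp_app_ml"))` lists the files that differ between the two stages. Deciding which of those files correspond to distinct components still requires reading the diff.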
