canihasashowplz

The application and model-training code for canihasashowplz. This is a television show recommendation system that uses machine learning to recommend shows based on the shows a user enters. I developed it as a learning exercise to explore MLOps and data science, and to solve the recurring dilemma my spouse and I face whenever we finish whatever we're watching together. Previous Jupyter notebook exploration and web scraping code can be found in this repo.

Project architecture

Model serving and service-APIs

flowchart

    Client(Client SPA)

    APIGW(Api Gateway)

    PredictionAck(Prediction Ack fn)
    PredictionWorker(Prediction Worker fn)
    PredictionPoller(Prediction Polling fn)
    
    ModelServingEndpoint(Model Serving Endpoint)
    ModelInference(Model Inference image) 
    Model(Model)
    
    ShowTable(Show Table)
    RatingTable(Rating Table)
    PredictionTable(Prediction Table)
    
    PredictionRequestQueue(Prediction Request Queue)
    

    Client --> APIGW
    APIGW --> Client

    APIGW -->|Request| PredictionAck
    PredictionAck -->|PredictionId| APIGW
    PredictionAck -->|Get show ids| ShowTable
    
    PredictionAck -->|Request| PredictionRequestQueue
    
    PredictionRequestQueue -->|Request| PredictionWorker
    
    PredictionWorker -->|Writes rating| RatingTable
    PredictionWorker -->|Gets prediction| ModelServingEndpoint
    PredictionWorker -->|Writes prediction| PredictionTable
    
    ModelServingEndpoint --> ModelInference
    ModelInference --> Model
    
    PredictionPoller -->|Polls for result| PredictionTable
    
    
    APIGW -->|Polls| PredictionPoller
    PredictionPoller -->|Responds with prediction| APIGW

Clients make a request and are given a prediction id in response; the front end then polls until the prediction has been processed. Because a prediction can take more than 30 seconds, the request is processed in the background and the result is returned once the model has responded.
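The acknowledge, work, and poll steps described above can be sketched in a few lines of Python. This is only an illustration of the pattern, not the project's code: the in-memory deque and dict stand in for the Prediction Request Queue and Prediction Table, and every function name here (acknowledge_request, process_next_request, poll_prediction, fake_model) is hypothetical.

```python
import uuid
from collections import deque

# In-memory stand-ins (assumptions) for the Prediction Request Queue
# and the Prediction Table shown in the flowchart.
request_queue = deque()
prediction_table = {}


def acknowledge_request(show_titles):
    """Ack step: enqueue the request and return a prediction id right away."""
    prediction_id = str(uuid.uuid4())
    request_queue.append(
        {"prediction_id": prediction_id, "show_titles": list(show_titles)}
    )
    return prediction_id


def process_next_request(predict):
    """Worker step: take one queued request, call the model, store the result."""
    if not request_queue:
        return False
    request = request_queue.popleft()
    prediction_table[request["prediction_id"]] = predict(request["show_titles"])
    return True


def poll_prediction(prediction_id):
    """Poller step: return the stored prediction, or None while it is pending."""
    return prediction_table.get(prediction_id)


def fake_model(show_titles):
    # Placeholder for the slow model call; the real endpoint is not shown here.
    return [f"Something like {title}" for title in show_titles]
```

In this sketch the client would call acknowledge_request once, then call poll_prediction with the returned id until it stops returning None. The worker runs independently of the client, which is what lets the initial request return quickly.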

Model training

flowchart

    Trigger(Trigger)
    RatingTable(Rating Table)
    RatingsBucket(Ratings bucket)
    ModelBucket(Model bucket)
    RatingsExporter(Ratings Export fn)
    ModelTrainer(Model Trainer fn)
    ModelTraining(Model Training image)
    RatingTable(Rating Table)
    Model(Model)
       
    subgraph RatingsOutputCsv
        RatingsExporter -->|Reads| RatingTable
    end
    
    subgraph RatingsBucket
    end
    
    subgraph ModelTrainingWorkflow
        ModelTrainer -->|Invokes| ModelTraining
        ModelTraining -->|Stores output model| ModelBucket
        ModelBucket --> Model
    end
 
    Trigger -->|Invokes| RatingsOutputCsv
    RatingsOutputCsv -->|Stores file| RatingsBucket
    RatingsBucket -->|Reads file| ModelTrainingWorkflow

A trigger (currently manual, but eventually a recurring job or a threshold of new ratings being reached) kicks off a data export. The model trainer then consumes the exported file and outputs a new model trained on the updated data.
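As a rough sketch of the two ideas above, a threshold-based trigger and a ratings export, the following Python uses only the standard library. The threshold value, the column names (user_id, show_id, rating), and both function names are assumptions made for illustration; the actual export format is not described in this README.

```python
import csv
import io


def should_retrain(new_rating_count, threshold=1000):
    """Trigger check: retrain once enough new ratings have accumulated.

    The default threshold is an arbitrary illustrative value.
    """
    return new_rating_count >= threshold


def export_ratings_csv(ratings):
    """Serialize rating rows to CSV text, ready to be stored as a file.

    Column names are hypothetical; the real table schema may differ.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["user_id", "show_id", "rating"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(ratings)
    return buffer.getvalue()
```

A scheduled job could call should_retrain with the number of ratings written since the last export and, when it returns True, write the output of export_ratings_csv to the ratings bucket.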

Deployment concerns

A single template.yaml contains all the CloudFormation definitions for the service and its dependencies. However, at the moment, deployment is a two-step process (which I will eventually solve with a custom resource or similar).

First, you must deploy the model training architecture, invoke the ratings export function, and then invoke the model training function. These are orchestrated as a set of step functions. Once the job is complete, you can deploy the model serving architecture. This ordering is required because the SageMaker::Model depends on an artifact in an S3 bucket (i.e., the model output). Further, I have my own data set, scraped from the internet, which populates the DynamoDB tables for shows and ratings. Starting from empty tables won't produce especially interesting recommendations.

This is only a concern for the first-time setup of this project. Once that is complete, further changes can be made through make build-staging and make deploy-staging or via CI/CD and pull requests.

