Prediction of submission rates

This repository contains the development of an ML algorithm for predicting submission rates, together with its API implementation. The API can serve predictions from a pretrained neural network, which was developed offline, and can train a new model on the provided dataset. The initial dataset ships with the code, alongside the files of the pretrained model. The repository provides a basic component with the following functionality:

Predict submission rates with the pretrained model
Train a neural network with the same architecture and predict submission rates with it

Installation and Dependencies

This component is implemented in Python 3. Its requirements are listed in packages.txt in the root folder. Installing into a fresh virtual environment is recommended. Linux is the recommended OS for testing this environment. Regardless of the OS, Docker and Python 3 must be installed.

Installing from code in bare metal

This option requires a running MongoDB instance, which the core functionality of the service depends on. To get the service up and running from code, do the following:

$ git clone https://github.com/panstav1/regression_predict_subrate.git # Clone this repository
$ cd regression_predict_subrate # Go to the downloaded folder
$ python setup.py develop # Install dependencies
$ cd src/model_regr
$ python app.py run # server at http://localhost:4010

The server then runs in that session on port 4010. You can access it with curl, for example:

$ curl <host name>:4010/api

Docker-based

This option requires a running MongoDB instance and an installed Docker. The following commands build and run a Docker container of the service:

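# build Docker image (the tag regression_predict_subrate is an arbitrary choice)
sudo docker build -t regression_predict_subrate .

# run Docker container from that image
sudo docker run --rm -d -p 4010:4010 --name regression_predict_subrate regression_predict_subrate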

Docker-compose-based (highly recommended)

In this option, a functional MongoDB is included in the docker-compose script. Two simple commands install and start both MongoDB and the service:

# build Docker containers of MongoDB and the service
sudo docker-compose build

# run Docker containers
sudo docker-compose up

Developing/Contributing

To contribute, use the same development workflow as for any other GitHub project: fork the repository and create pull requests.

Dependencies

In this repository, the following libraries are used (also referenced in the packages.txt file) for development:

Numpy (v.1.18.5) - Scientific computing tools with Python
Pandas (v.1.0.5) - Open source data analysis and manipulation tool
Scikit-learn (v.0.23.1) - Simple and efficient tools for Machine Learning in Python
Flask (v.1.1.2) - A simple framework for building complex web applications.
Flask-restplus (v.0.13.0) - Fully featured framework for fast, easy and documented API development with Flask
Tensorflow (v.2.3.1) - Open source library to develop and train ML models
Keras (v.2.4.3) - High-level API of TensorFlow 2.0
Requests (v.2.5) - Python HTTP for Humans
H5py (v.2.10.0) - Pythonic interface to the HDF5 binary data format
Setuptools (v.50.3.2) - Easily download, build, install, upgrade, and uninstall Python packages
Cython (v.0.29.21) - The Cython compiler for writing C extensions for the Python language

The following library is used for the MongoDB functionality:

Pymongo (v.3.11.1) - Python distribution containing tools for working with MongoDB

These libraries are installed or updated on the developer's machine when running the setup command (see above):

$ python setup.py install # or: python setup.py develop

Submitting/Requesting changes

Changes to the repository can be requested using this repository's issues and pull requests mechanisms.

Usage

The aim of the regression_predict_subrate API is to provide predictions of submission rates. It hosts a pretrained neural network model and an untrained neural network model with the same structure. The former can serve predictions as soon as the project is deployed, while the latter must be trained first. Both models share the following sequential structure, defined in Keras:

Dense layer of 50 neurons followed by a ReLU activation layer
Dense layer of 512 neurons followed by a ReLU activation layer
Dense layer of 256 neurons followed by a ReLU activation layer
Dense layer of 64 neurons followed by a ReLU activation layer
Dense layer of 1 neuron followed by a linear activation layer
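
As a rough illustration of the layer stack listed above, the following sketch counts its trainable parameters in plain Python. The input width of 47 is an assumption taken from the v0 to v46 feature columns of the dataset, and the helper name `dense_param_count` is hypothetical and not part of the repository.

```python
# Units of each Dense layer, in the order listed above.
LAYER_UNITS = [50, 512, 256, 64, 1]

# Assumption: the network input is the 47 features v0..v46.
N_INPUTS = 47


def dense_param_count(n_inputs, layer_units):
    """Count weights and biases of a stack of fully connected layers."""
    total = 0
    fan_in = n_inputs
    for units in layer_units:
        total += fan_in * units + units  # weight matrix plus one bias per unit
        fan_in = units
    return total


print(dense_param_count(N_INPUTS, LAYER_UNITS))  # 176353 under the assumed input width
```

The activation layers add no parameters, so only the Dense layers contribute to the count.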

The structure of the neural network is serialized in JSON format in this file. The same folder holds the initial files of the pretrained model and the dataset, all of which are uploaded to MongoDB when the API is deployed.

Dataset

The dataset is stored in .h5 format, in the same folder as the pretrained model files, so that the API can retrieve it easily. It contains the following columns:

Features	Meaning
surveyID	ID of the corresponding service
submissions	# of submissions
views	# of views
v0	Feature 0
...	...
v46	Feature 46
sub_rate	Submission rate, defined as submissions per views

The dataset is stored as a DataFrame so that the API can retrieve it directly from MongoDB as soon as it is deployed.
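
The sub_rate column from the table above can be reproduced from the submissions and views columns. A minimal sketch with made-up rows follows. Returning None for zero views is an assumption, since the source does not say how such rows are handled, and the function name is hypothetical.

```python
def submission_rate(submissions, views):
    """Return submissions divided by views, or None when there are no views."""
    if views == 0:
        return None  # assumption: the rate is undefined without views
    return submissions / views


# Hypothetical rows, not taken from the real dataset.
rows = [
    {"surveyID": 1, "submissions": 12, "views": 48},
    {"surveyID": 2, "submissions": 0, "views": 0},
]

for row in rows:
    row["sub_rate"] = submission_rate(row["submissions"], row["views"])

print(rows)
```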

Pretrained Neural Network Model

The pretrained model was trained offline over several experiments with the dataset. The resulting files are located in the binFiles folder, which holds the preprocessing PowerTransformers, the model itself and its weights. When the API is deployed, these files are transferred to MongoDB's GridFS under the same filenames and loaded into the API, ready for use:

Action	HTTP Method	Endpoint
Get prediction of a single submission rate	`GET`	`curl -H "Content-Type: application/json" -X GET --data '{"surveyID": 58485, "features": [0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,2,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,1,2]}' http://localhost:4010/api/pretrained/predict`
Get predictions of several submission rates	`POST`	`curl -H "Content-Type: application/json" -X POST --data '[{"surveyID": 58485, "features": [0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,2,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,1,2]},{"surveyID": 4234232, "features": [0,0,1,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,2,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,2,0,2]}]' http://localhost:4010/api/pretrained/predict`

Note: It is recommended to use Postman for sending requests, unless you are connecting from another API.

The data is sent to the endpoint as a JSON object for a single submission rate, or as a list of JSON objects for several. The API validates the field names and the length of the feature list.
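
The request bodies described above can also be assembled programmatically. The sketch below builds a single record and a list of records with the `surveyID` and `features` fields, and checks the feature length on the client side before sending. The expected length of 47 is an assumption based on the v0 to v46 columns of the dataset, and `make_record` is a hypothetical helper, not part of the API.

```python
import json

# Assumption: one value per feature column v0..v46.
N_FEATURES = 47


def make_record(survey_id, features):
    """Build one prediction record, rejecting feature lists of the wrong length."""
    if len(features) != N_FEATURES:
        raise ValueError(f"expected {N_FEATURES} features, got {len(features)}")
    return {"surveyID": survey_id, "features": list(features)}


# Body for a single prediction: one JSON object.
single_body = json.dumps(make_record(58485, [0] * N_FEATURES))

# Body for several predictions: a JSON list of such objects.
batch_body = json.dumps([
    make_record(58485, [0] * N_FEATURES),
    make_record(4234232, [1] * N_FEATURES),
])

print(single_body)
```

Either string can then be passed as the `--data` argument of the curl commands shown in the table.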

From Scratch Neural Network Model

The same structure as the pretrained model is also available for training with the initial dataset. Optional training parameters can be passed to the endpoint as query parameters:

curl -X POST http://localhost:4010/api/new_model/train[?optional_parameter1=X&optional_parameter2=Y&...]

Here, [] marks the position of the optional parameters, optional_parameter1 and optional_parameter2 stand for the actual parameter names, and X and Y for their values. The optional parameters are listed below:

Optional Training Parameter	Default Value	Format in the Endpoint
test_size	`test_size = 0.2`	`http://localhost:4010/api/new_model/train?test_size=0.3`
batch_size	`batch_size = 512`	`http://localhost:4010/api/new_model/train?batch_size=32`
epochs	`epochs = 20`	`http://localhost:4010/api/new_model/train?epochs=50`
frac (% of the initial dataset in (0,1])	`frac = 1`	`http://localhost:4010/api/new_model/train?frac=0.3`
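
To show how the parameters in the table above combine into one request URL, here is a small sketch using only the Python standard library. The defaults are copied from the table. The helper `build_train_url` and its validation rules are illustrative assumptions, not part of the API.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:4010/api/new_model/train"

# Default values as listed in the table above.
DEFAULTS = {"test_size": 0.2, "batch_size": 512, "epochs": 20, "frac": 1}


def build_train_url(base_url=BASE_URL, **params):
    """Return the training URL with the given optional parameters appended."""
    unknown = set(params) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown training parameter(s): {sorted(unknown)}")
    if "frac" in params and not 0 < params["frac"] <= 1:
        raise ValueError("frac must lie in (0, 1]")
    query = urlencode(params)  # joins name=value pairs with '&'
    return f"{base_url}?{query}" if query else base_url


print(build_train_url(epochs=50, batch_size=32))
```

Calling `build_train_url()` without arguments returns the bare endpoint, which trains with the default values.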

Note: It is recommended to use Postman for sending requests, unless you are connecting from another API (for training, it is highly recommended :D). Several parameters can be combined in one request with the & separator. Prediction with the newly trained model works the same way as with the pretrained model:

Action	HTTP Method	Endpoint
Get prediction of a single submission rate	`GET`	`curl -H "Content-Type: application/json" -X GET --data '{"surveyID": 58485, "features": [0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,2,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,1,2]}' http://localhost:4010/api/new_model/predict`
Get predictions of several submission rates	`POST`	`curl -H "Content-Type: application/json" -X POST --data '[{"surveyID": 58485, "features": [0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,2,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,1,2]},{"surveyID": 4234232, "features": [0,0,1,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,2,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,2,0,2]}]' http://localhost:4010/api/new_model/predict`

Documentation

The API documentation is generated with Swagger. You can open it in a browser at:

http://localhost:4010/api

The page provides automatic, interactive API documentation.

Other Endpoints

Health endpoint

A useful endpoint is the one that checks that the Docker container is up:

curl -X GET http://localhost:4010/api/health
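
When the service is started from a script, it can help to wait until the health endpoint answers before sending other requests. The sketch below polls the URL shown above with the standard library. Treating an HTTP 200 response as "healthy" is an assumption, since the response format is not described here, and both function names are hypothetical.

```python
import time
import urllib.request

HEALTH_URL = "http://localhost:4010/api/health"


def is_healthy(url=HEALTH_URL, timeout=2.0):
    """Return True if the health endpoint answers with HTTP 200 (assumed success code)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # covers URLError, HTTPError, refused connections and timeouts
        return False


def wait_until_healthy(probe=is_healthy, attempts=10, delay=1.0):
    """Call probe up to `attempts` times, sleeping `delay` seconds after each failure."""
    for _ in range(attempts):
        if probe():
            return True
        time.sleep(delay)
    return False
```

A deployment script could call `wait_until_healthy()` right after `docker-compose up` and abort if it returns False.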

Logger endpoint

The endpoint to fetch the internal logger file is:

curl -X GET http://localhost:4010/api/log

Useful Links

To support working with and testing regression_predict_subrate, it is optional (yet highly recommended) to use the following tools:

Robomongo - Robomongo 0.9.0-RC4
POSTMAN - Chrome Plugin for HTTP communication

Lead Developers

The following lead developers are responsible for this repository and have admin rights. They can, for example, merge pull requests.

Panagiotis Stavrianos (panstav1)

Please use the GitHub issues and the e-mail for feedback.
