Group 1 - Toronto Real Estate Listings Project

Overview

This project involves the collection, cleaning, and analysis of Toronto real estate listings data. The goal is to extract valuable insights from the data and create a machine learning model to predict property prices.

Data Collection

The project collects Toronto real estate listings data through web scraping and API requests. The collected data includes property details such as address, price, baths, beds, and geographical coordinates.

Data Cleaning

The data cleaning process handles missing values and formatting issues, and extracts latitude and longitude for each listing using the Geoapify API. Luxury listings with more than 5 baths or more than 4 beds are removed, and remaining outliers are filtered out using the Interquartile Range (IQR) method. The cleaned data is stored as CSV files in the data folder.
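The filtering steps described above can be sketched as follows. This is a minimal illustration, not the project's actual cleaning code: the column names (price, baths, beds), the choice of price as the column filtered by IQR, the conventional 1.5 multiplier, and the sample values are all assumptions.

```python
import pandas as pd


def drop_luxury(df: pd.DataFrame) -> pd.DataFrame:
    """Remove listings with more than 5 baths or more than 4 beds."""
    return df[(df["baths"] <= 5) & (df["beds"] <= 4)]


def remove_iqr_outliers(df: pd.DataFrame, column: str = "price", k: float = 1.5) -> pd.DataFrame:
    """Keep rows whose value lies within [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    return df[df[column].between(q1 - k * iqr, q3 + k * iqr)]


# Made-up sample data, for illustration only.
listings = pd.DataFrame({
    "price": [500_000, 550_000, 600_000, 650_000, 700_000, 5_000_000, 620_000],
    "baths": [1, 2, 2, 3, 2, 4, 6],
    "beds": [1, 2, 3, 3, 2, 4, 5],
})

cleaned = remove_iqr_outliers(drop_luxury(listings))
print(cleaned)
```

In this sample the last row is dropped by the luxury filter, and the 5,000,000 listing falls above the upper IQR bound, leaving five rows.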

Database Creation

The cleaned data is imported into a PostgreSQL database named listings_db using SQLAlchemy. The database has a table named toronto_listings with columns such as mls_id, property_type, and address.
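One way such an import could look is sketched below. The helper name load_listings and the sample rows are hypothetical, and the sketch runs against an in-memory SQLite database so that it works without a server. For PostgreSQL, the URL would instead take a form like postgresql://user:password@host:5432/listings_db, with placeholder credentials and a PostgreSQL driver installed.

```python
import pandas as pd
from sqlalchemy import create_engine, text


def load_listings(df: pd.DataFrame, db_url: str):
    """Write the cleaned listings to the toronto_listings table and return the engine."""
    engine = create_engine(db_url)
    # if_exists="replace" recreates the table on every run (an assumption of this sketch).
    df.to_sql("toronto_listings", con=engine, if_exists="replace", index=False)
    return engine


# Made-up rows; only the column names come from the README.
listings = pd.DataFrame({
    "mls_id": ["A0000001", "A0000002"],
    "property_type": ["Condo", "Detached"],
    "address": ["1 Example St", "2 Sample Ave"],
})

engine = load_listings(listings, "sqlite:///:memory:")

with engine.connect() as conn:
    row_count = conn.execute(text("SELECT COUNT(*) FROM toronto_listings")).scalar()

print(row_count)
```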

Modeling

A Random Forest Regressor model predicts property prices from features such as baths, beds, dens, relative latitude, and relative longitude. The model's performance is evaluated using cross-validation, which yields a Mean Absolute Error (MAE) score for each fold. A neighbourhood-level analysis showed that the ratio of prediction error varies between neighbourhoods, with specific attention given to neighbourhoods with a small number of listings.
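The evaluation described above can be sketched with scikit-learn as follows. The data here is synthetic, and the feature column names (rel_lat, rel_lng), the number of folds, and the hyperparameters are assumptions rather than the project's actual settings.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the cleaned listings (illustration only).
rng = np.random.default_rng(42)
n = 200
X = pd.DataFrame({
    "baths": rng.integers(1, 6, n),
    "beds": rng.integers(0, 5, n),
    "dens": rng.integers(0, 2, n),
    "rel_lat": rng.normal(0, 0.05, n),
    "rel_lng": rng.normal(0, 0.05, n),
})
y = 300_000 + 150_000 * X["beds"] + 100_000 * X["baths"] + rng.normal(0, 50_000, n)

model = RandomForestRegressor(n_estimators=100, random_state=42)

# scikit-learn reports MAE as a negative score so that higher is better;
# negating it gives the usual MAE for each fold.
neg_mae = cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error")
mae_per_fold = -neg_mae

print(mae_per_fold)
print(mae_per_fold.mean())
```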

Deployment

The machine learning model is deployed on cloud-based infrastructure, specifically Azure Web App Services. The deployment process involves the following steps:

Model Serialization: The trained Random Forest Regressor model is serialized using the joblib library.
Flask API Endpoint: A Flask web application serves as the API endpoint for the machine learning model. It uses Flask and Flask-CORS to handle HTTP requests and responses, including cross-origin requests.
PostgreSQL Database Interaction: SQLAlchemy is utilized to interact with the PostgreSQL database named listings_db. The database stores relevant information about Toronto real estate listings.
API Usage: Users can make HTTP POST requests to the Flask API endpoint, providing property features as input in the request body. The API will respond with predicted property prices.
Containerization: The serialized model, database creation script, and Flask application are packaged in a Docker container, with dependencies specified in the requirements.txt file to ensure consistent and reproducible deployment across environments. When the container runs, the database is created and the Flask app is started using gunicorn. The Docker image is then pushed to Docker Hub.
Azure Web App: The application is deployed using Azure Web App Services and the Docker image that is available via Docker Hub.
URL: Toronto Real Estate Price Predictor
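
The serialization and API steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the deployed application: the /predict route, the JSON keys, the response field predicted_price, and the tiny stand-in model trained on random data are all hypothetical, and Flask-CORS and the database interaction are omitted for brevity.

```python
import os
import tempfile

import joblib
import numpy as np
from flask import Flask, jsonify, request
from sklearn.ensemble import RandomForestRegressor

# Hypothetical feature keys expected in the request body.
FEATURES = ["baths", "beds", "dens", "rel_lat", "rel_lng"]

# Train a tiny stand-in model on random data and serialize it with joblib.
rng = np.random.default_rng(0)
X = rng.random((50, len(FEATURES)))
y = 500_000 + 200_000 * X[:, 1]
model_path = os.path.join(tempfile.mkdtemp(), "model.joblib")
joblib.dump(RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y), model_path)

# At startup, the API loads the serialized model once.
model = joblib.load(model_path)
app = Flask(__name__)


@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json(silent=True) or {}
    missing = [name for name in FEATURES if name not in payload]
    if missing:
        return jsonify({"error": f"missing features: {missing}"}), 400
    row = [[float(payload[name]) for name in FEATURES]]
    price = float(model.predict(row)[0])
    return jsonify({"predicted_price": price})
```

A client would then send a POST request with a JSON body such as {"baths": 2, "beds": 3, "dens": 0, "rel_lat": 0.01, "rel_lng": -0.02} and read predicted_price from the response. Inside the container, an app like this would be started by gunicorn rather than Flask's development server.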

Contributors

Fanny Sigouin
Jorge Nardy
Kamal Farran
Tania Barrera

References

Data Collection

Geoapify API - Used for geocoding addresses and obtaining latitude and longitude.
Listing.ca - Source of real estate data for Toronto listings.

Data Cleaning

Pandas Documentation - Reference for data manipulation using Pandas.
Regular Expressions in Python - Guide for using regular expressions in Python.
Pathlib Documentation - Documentation for working with file paths using Pathlib.

Database Creation

PostgreSQL Documentation - Official documentation for PostgreSQL.

Modeling

Scikit-learn Documentation - Documentation for the Scikit-learn machine learning library.

Deployment

Azure Documentation - Azure App Service documentation for setting up and deploying the application.
Flask Documentation - Flask documentation for setting up API endpoint.
Docker Documentation - Docker documentation for containerization in deployment.

Web Development

Bootstrap Documentation - Bootstrap documentation for setting up the HTML, CSS and JavaScript framework.

General

SQLAlchemy Documentation - Reference for using SQLAlchemy for database interactions.
Side Navigation - Code used to create the side navigation.

Python Libraries

Pandas - Powerful data manipulation library for Python.
NumPy - Library for numerical operations in Python.
Scikit-learn - Machine learning library for Python.
