fannysigouin/project4

Group 1 - Toronto Real Estate Listings Project

Overview

This project involves the collection, cleaning, and analysis of Toronto real estate listings data. The goal is to extract valuable insights from the data and create a machine learning model to predict property prices.

Data Collection

The project collects Toronto real estate listings data from multiple sources, including web scraping and API requests. The collected data includes property details such as address, price, baths, beds, and geographical coordinates.

Data Cleaning

The data cleaning process involves handling missing values, formatting issues, and extracting latitude and longitude using the Geoapify API. Additionally, luxury listings with more than 5 bathrooms or more than 4 beds were removed, and outliers were addressed using the Interquartile Range (IQR) method. The cleaned data is stored in CSV files in the data folder.
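The filtering steps above could be sketched roughly as follows. This is a minimal illustration, not the project's actual cleaning script: the column names (`price`, `baths`, `beds`), the choice of `price` as the column the IQR rule is applied to, the conventional fence multiplier of 1.5, and the sample rows are all assumptions.

```python
import pandas as pd


def clean_listings(df: pd.DataFrame, k: float = 1.5) -> pd.DataFrame:
    """Drop luxury listings, then drop price outliers with the IQR rule."""
    # Keep only listings with at most 5 bathrooms and at most 4 beds.
    kept = df[(df["baths"] <= 5) & (df["beds"] <= 4)]

    # IQR fences: values outside [Q1 - k*IQR, Q3 + k*IQR] count as outliers.
    q1 = kept["price"].quantile(0.25)
    q3 = kept["price"].quantile(0.75)
    iqr = q3 - q1
    in_range = kept["price"].between(q1 - k * iqr, q3 + k * iqr)
    return kept[in_range].reset_index(drop=True)


# Made-up sample rows, for illustration only.
listings = pd.DataFrame({
    "price": [650_000, 700_000, 720_000, 750_000, 800_000, 9_500_000, 2_000_000],
    "baths": [1, 2, 2, 2, 3, 3, 6],
    "beds": [1, 2, 2, 3, 3, 3, 5],
})
cleaned = clean_listings(listings)
```

In this sample the last row is dropped by the luxury filter and the 9,500,000 listing falls above the upper IQR fence, so five rows remain.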

Database Creation

The cleaned data is imported into a PostgreSQL database named listings_db using SQLAlchemy. The database has a table named toronto_listings with columns like mls_id, property_type, address, and more.
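One way the import step might look is sketched below, assuming pandas is used to write the DataFrame through a SQLAlchemy engine. The helper name `load_listings`, the connection URL, and the `if_exists="replace"` behaviour are illustrative assumptions; only the database name `listings_db` and the table name `toronto_listings` come from the description above.

```python
import pandas as pd
from sqlalchemy import create_engine, text


def load_listings(df: pd.DataFrame, engine) -> int:
    """Write the cleaned listings to toronto_listings and return the row count."""
    # if_exists="replace" recreates the table on every run (an assumption).
    df.to_sql("toronto_listings", con=engine, if_exists="replace", index=False)
    with engine.connect() as conn:
        result = conn.execute(text("SELECT COUNT(*) FROM toronto_listings"))
        return result.scalar()


# Example usage against PostgreSQL (placeholder credentials, host and port):
#   engine = create_engine("postgresql://user:password@localhost:5432/listings_db")
#   load_listings(cleaned_df, engine)
```

Passing the engine in as an argument keeps the helper independent of the connection details, so the same function can be exercised against a throwaway database during testing.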

Modeling

A Random Forest Regressor model has been implemented to predict property prices based on features such as baths, beds, dens, relative latitude, and relative longitude. The model's performance is evaluated using cross-validation, providing Mean Absolute Error (MAE) scores for each fold. Neighbourhood-wise analysis revealed varying ratios of prediction errors, with specific attention given to neighbourhoods with a small number of listings.
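The evaluation described above can be sketched with scikit-learn as follows. The data here is synthetic and the hyperparameters (`n_estimators`, five folds, the random seed) are assumptions chosen for illustration, so the resulting scores say nothing about the project's real model.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
n = 200

# Synthetic stand-ins for: baths, beds, dens, relative latitude, relative longitude.
X = np.column_stack([
    rng.integers(1, 5, n),
    rng.integers(0, 5, n),
    rng.integers(0, 2, n),
    rng.normal(0.0, 0.05, n),
    rng.normal(0.0, 0.05, n),
])
# Made-up price relationship plus noise, only so the model has something to fit.
y = (300_000 + 150_000 * X[:, 0] + 100_000 * X[:, 1] + 50_000 * X[:, 2]
     + rng.normal(0.0, 25_000, n))

model = RandomForestRegressor(n_estimators=50, random_state=42)

# scikit-learn reports MAE as a negated score, so flip the sign per fold.
neg_mae = cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error")
mae_per_fold = -neg_mae
print(mae_per_fold)
```

Reporting one MAE per fold, rather than a single average, makes it easier to see how much the error varies between subsets of the listings.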

Deployment

The machine learning model is deployed on cloud-based infrastructure as a Docker container served through Azure Web App Services. The deployment process involves the following steps:

  1. Model Serialization: The trained Random Forest Regressor model is serialized using the joblib library.

  2. Flask API Endpoint: A Flask web application serves as the API endpoint for the machine learning model. It uses Flask-CORS to allow cross-origin HTTP requests to the deployed model.

  3. PostgreSQL Database Interaction: SQLAlchemy is utilized to interact with the PostgreSQL database named listings_db. The database stores relevant information about Toronto real estate listings.

  4. API Usage: Users can make HTTP POST requests to the Flask API endpoint, providing property features as input in the request body. The API will respond with predicted property prices.

  5. Containerization: The serialized model, database creation script, and Flask application are packaged in a Docker container, with dependencies specified in the requirements.txt file to ensure consistent and reproducible deployment across environments. When the container runs, the database is created and the Flask app is started using Gunicorn. The Docker image is then pushed to Docker Hub.

  6. Azure Web App: The application is deployed using Azure Web App Services and the Docker image that is available via Docker Hub.

  7. URL: Toronto Real Estate Price Predictor

Contributors

  • Fanny Sigouin
  • Jorge Nardy
  • Kamal Farran
  • Tania Barrera

References

Data Collection

  • Geoapify API - Used for geocoding addresses and obtaining latitude and longitude.
  • Listing.ca - Source of real estate data for Toronto listings.

Python Libraries

  • Pandas - Powerful data manipulation library for Python.
  • NumPy - Library for numerical operations in Python.
  • Scikit-learn - Machine learning library for Python.
