
amason445/us_mortgages_flask_website


US Mortgage Data with Flask and CouchDB

Project Summary

This repository contains artifacts from the second capstone project I built during my graduate program at Regis University. For this project, I built a NoSQL database and a Flask website to analyze US mortgage data. The mortgage data was sourced from the Consumer Financial Protection Bureau (CFPB). CouchDB views were written on top of this data and ingested into a Flask website that contains a variety of visualizations. The visualizations trend financial metrics over time and also map these metrics by county within each state. This repository also includes the test cases and context for my CouchDB architecture.

The full scrape from the CFPB was 35 gigabytes and included full pipeline data for around 90 million mortgage loans from 2018 to 2022. To scope this project, I took a subset of this data for Arizona, Colorado, New Mexico and Utah, which was about 1.8 gigabytes and contained around 4,630,000 mortgages from 2018 to 2022. Additionally, I had some issues rendering and writing the website views. This portion required a lot of research, and I eventually used OpenAI's ChatGPT to help me with some of the rendering, HTML and CSS. Finally, I originally intended to use HoloViews for this project but switched to Folium. In my experience, Folium worked well with Flask because its maps rendered cleanly and with little effort.

Technology Used

  • Flask
  • CouchDB
  • Python
  • Pandas, GeoPandas, GeoJSON
  • Matplotlib, Seaborn, Folium
  • HTML/CSS

For all packages and dependencies used, please see requirements.txt. A virtual environment was used for this project, and this file can be used to configure it with pip.
Use: pip install -r requirements.txt

To deploy this website, you will need to set the following environment variables (shown in PowerShell syntax):

  • $env:FLASK_APP = ".\app\flasky.py"
  • $env:FLASK_DEBUG = 1 (for debug mode)
  • $env:COUCHDB_ROOT_URL = "YOUR_COUCH_DB_ROOT_URL"
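
As a rough illustration of how the app side might consume COUCHDB_ROOT_URL, the sketch below reads the variable and builds the URL of a CouchDB view. The helper name `couchdb_view_url` and the database, design document and view names are hypothetical and are not taken from this repository; only the `/_design/<doc>/_view/<view>` path layout is standard CouchDB.

```python
import os
from urllib.parse import quote, urlencode


def couchdb_view_url(database, design_doc, view, **params):
    """Build a CouchDB view URL under the root set in COUCHDB_ROOT_URL.

    Hypothetical helper for illustration; the repository may do this differently.
    """
    root = os.environ.get("COUCHDB_ROOT_URL")
    if not root:
        raise RuntimeError("COUCHDB_ROOT_URL is not set")

    # Standard CouchDB layout: /<db>/_design/<design_doc>/_view/<view>
    path = "/".join(
        quote(part, safe="")
        for part in (database, "_design", design_doc, "_view", view)
    )
    url = f"{root.rstrip('/')}/{path}"

    # Query options such as group=true are passed through as given.
    if params:
        url += "?" + urlencode(params)
    return url
```

For example, with the root set to http://localhost:5984, calling `couchdb_view_url("mortgages", "metrics", "by_year", group="true")` yields a URL that can be fetched with any HTTP client.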

Data Source

This data was sourced from the Consumer Financial Protection Bureau's Home Mortgage Disclosure Act API. The dataset collects lending records submitted by mortgage lenders nationwide, focusing on residential mortgages. It covers a diverse range of products, including fixed-rate mortgages, adjustable-rate mortgages, VA loans, and first and second liens, and it includes rich financial data ranging from loan amount, loan term and interest rate to fees and product costs. Additionally, it includes rich demographic data including ethnicity, gender, income, geographical information and credit information (including debt-to-income ratio). The API documentation and data dictionary are linked below:

Project Layout

  • CouchDB: Samples and Design Document: Contains sample JSON documents from the CouchDB Database and the most recent design document.
  • ShapeETL: Contains a prototype script to process state and county geographies so they can be joined with the mortgage data and rendered.
  • app: Contains the full Flask app including modules and HTML/CSS templates needed to render the website.
  • load_db: Contains the load process from the API to CouchDB. The initial load logs are included.
  • tests: Contains unit tests and the most recent outputs from these tests.
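
To make the load step in load_db more concrete, here is a minimal sketch of how mortgage records could be grouped into CouchDB documents, assuming the cap of 1000 records per raw document described under Flask Architecture. The function names and the document layout (`_id` format, `state`, `year`, `records` fields) are assumptions for illustration, not the schema used in this repository.

```python
from itertools import islice

MAX_RECORDS_PER_DOC = 1000  # cap per raw document, per the architecture notes


def batch_records(records, size=MAX_RECORDS_PER_DOC):
    """Yield lists of at most `size` records from any iterable."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def to_documents(records, state, year):
    """Wrap each batch in a dict shaped like a CouchDB document.

    The field names here are hypothetical.
    """
    for index, batch in enumerate(batch_records(records)):
        yield {
            "_id": f"{state}-{year}-{index:05d}",
            "state": state,
            "year": year,
            "records": batch,
        }
```

Because `batch_records` works on any iterable, the API response can be streamed through it without holding a full state-year extract in memory.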

Flask Architecture

This website uses the "Model-View-ViewModel" (MVVM) architecture detailed below. First, models are queried and combined from CouchDB design documents; then they are wrangled and visualized before finally being rendered into an HTML/CSS front end. The data models are built using something similar to an interface in Java, which makes them easy to reuse and test. For visualization, these models are brought into a variety of view models leveraging Seaborn, Matplotlib and Folium. Finally, on the back end, each raw document in CouchDB contains at most 1000 mortgage records, and each design document runs map-reduce over these raw documents.
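
The interface-like models and the view models described above might look roughly like the sketch below. It is an illustration only: the class names, the `fetch_view` callable and the view name are hypothetical, and the actual modules in app may be organized differently. The row shape (dicts with "key" and "value") follows what CouchDB views return.

```python
from abc import ABC, abstractmethod


class MortgageModel(ABC):
    """Interface-like base class: every model exposes the same rows() method."""

    def __init__(self, fetch_view):
        # fetch_view is any callable that takes a view name and returns
        # CouchDB-style rows (dicts with "key" and "value"). Injecting it
        # lets tests run without a live database.
        self._fetch_view = fetch_view

    @abstractmethod
    def view_name(self):
        """Return the name of the CouchDB view backing this model."""

    def rows(self):
        return [
            {"key": row["key"], "value": row["value"]}
            for row in self._fetch_view(self.view_name())
        ]


class AverageLoanByYear(MortgageModel):
    def view_name(self):
        return "average_loan_by_year"  # hypothetical view name


def trend_view_model(model):
    """Reshape model rows into sorted x/y lists ready for a plotting library."""
    ordered = sorted(model.rows(), key=lambda row: row["key"])
    return {
        "x": [row["key"] for row in ordered],
        "y": [row["value"] for row in ordered],
    }
```

In a unit test, `fetch_view` can be replaced with a small function that returns a few hand-written rows, which is one way the "easy to reuse and test" property could be realized.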

Sample Data and Results

Below, I've included some sample visualizations for Colorado, including renderings of sample dashboards:


Future Ideas

  • Migrate the website to a cloud service to increase processing power and scale
  • Build a distributed system with CouchDB to increase scale
  • Scrape more US States and set up the process to scrape annually
  • Build out the website to include more financial metrics, borrower credit and income data, and demographic data
  • Set up robust security and error handling for a public website deployment
  • Incorporate other Flask packages like an email server
  • Migrate the website to a different framework like Django

References
