Climate API Readme

Requirements

The following programs and modules are required to run the climate API:

Python 3.7
Postgres
Modules

Module	Version
dash_bootstrap_components	0.12.2
SQLAlchemy_Utils	0.36.8
Flask	1.1.2
dash	1.19.0
msedge_selenium_tools	3.141.3
statsmodels	0.12.0
urllib3	1.25.11
plotly	4.14.3
dash_core_components	1.3.1
numpy	1.19.2
selenium	3.141.0
SQLAlchemy	1.4.15
requests	2.24.0
dash_html_components	1.0.1
pytest	0.0.0
lxml	4.6.1
pandas	1.1.3
python_dateutil	2.8.1
scikit_learn	0.24.2
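
To see which installed modules differ from the versions listed above, a small check along these lines may help. It is only a sketch: `REQUIRED` holds a few pins copied from the table, and the names `installed_version` and `find_mismatches` are made up for this example and are not part of the climate API. Note that `importlib.metadata` ships with Python 3.8 and later; on Python 3.7 the sketch assumes the `importlib_metadata` backport is installed.

```python
from typing import Callable, Dict, Optional, Tuple

# A few pins copied from the table above; extend as needed.
REQUIRED: Dict[str, str] = {
    "Flask": "1.1.2",
    "dash": "1.19.0",
    "SQLAlchemy": "1.4.15",
    "pandas": "1.1.3",
}


def installed_version(name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
    except ImportError:
        # Python 3.7: assumes the importlib_metadata backport is available.
        from importlib_metadata import PackageNotFoundError, version
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def find_mismatches(
    required: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each module whose installed version differs from its pin to (wanted, found)."""
    mismatches = {}
    for name, wanted in required.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches
```

Calling `find_mismatches(REQUIRED)` in the active environment returns an empty dict when every listed pin matches. The `lookup` parameter exists only so the comparison can be tried without touching the real environment.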

Python module installation with pip

If desired, install pipenv with the following command:

pip install --user pipenv

Installing dependencies with pipenv is done as follows:

pipenv install requests

Pipenv guide

Installing dependencies with pip is done as follows:

pip install numpy

If the climate API fails to start or module versions conflict, upgrade all installed modules (PowerShell):

pip freeze | %{$_.split('==')[0]} | %{pip install --upgrade $_}
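
The one-liner above uses PowerShell syntax. On other shells, roughly the same effect could be reached from Python itself. The following is a sketch, not part of the repository: `package_names` and `upgrade_all` are hypothetical helper names, and unlike the one-liner it upgrades all packages in a single pip call rather than one by one.

```python
import subprocess
import sys
from typing import List


def package_names(freeze_output: str) -> List[str]:
    """Extract package names from `pip freeze` text, keeping only `name==version` lines."""
    names = []
    for line in freeze_output.splitlines():
        line = line.strip()
        # Skip comments, editable installs and anything without an exact pin.
        if "==" in line and not line.startswith(("#", "-e")):
            names.append(line.split("==")[0])
    return names


def upgrade_all() -> None:
    """Upgrade every pinned package in the current environment (this calls pip)."""
    frozen = subprocess.run(
        [sys.executable, "-m", "pip", "freeze"],
        check=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    ).stdout
    names = package_names(frozen)
    if names:
        subprocess.run(
            [sys.executable, "-m", "pip", "install", "--upgrade", *names],
            check=True,
        )
```

Nothing runs on import; call `upgrade_all()` explicitly from the activated environment. Upgrading everything moves the modules away from the versions listed under Requirements, so treat it as a last resort.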

Python module installation with conda

If desired, create and activate an environment before installing dependencies with the following commands:

conda create --no-default-packages -n myenv python
conda activate myenv

Anaconda environment documentation

Installing dependencies with conda is done as follows:

conda install numpy

If the climate API fails to start or package versions conflict, upgrade all packages:

conda upgrade --all

Installation & Usage

With provided database

Install Python
Install all required modules for the climate API (Requirements)
Install Postgres
Start flask app (Instruction)
Open the web browser
Navigate to localhost:5000
Configure database connection string (Instruction)
Navigate to database tab of administration overview
Press action button to connect to database
Stop server
Open pgAdmin or equivalent
Import database.db
Start server (Instruction)
Navigate to localhost:5000/admin
Use app

Without provided database

Install Python
Install all required modules for climate API (Requirements)
Install Postgres
Start flask app (Instruction)
Open the web browser
Navigate to localhost:5000
Configure database connection string (Instruction)
Navigate to database tab of administration overview
Press action button to connect to database
Press action button to create database tables
Run ETL load
Run Core load
Use app

With VSCode and Edge

Install Python
Install all required modules for climate API (Requirements)
Install Postgres
Open VSCode
Launch debug config FE + Flask found in doc/launch.json
Configure database connection string (Instruction)
Navigate to database tab of administration overview
Press action button to connect to database
Press action button to create database tables
Run ETL load
Run Core load
Use app

Start flask app

Command Prompt

> set FLASK_APP=app.py
> set FLASK_ENV=production
> flask run

PowerShell

> $env:FLASK_APP = "app.py"
> $env:FLASK_ENV = "production"
> flask run

Linux (untested)

export FLASK_APP=app.py
export FLASK_ENV=production
flask run

Important: activate the Anaconda / pip environment before starting the flask app.

Install Browser driver

To fetch new data, Selenium needs a browser driver to scrape the websites.

The browser version is checked automatically.

Install the driver as follows:

Edge

Navigate to localhost:5000/admin
Press Driver name
Press Download driver

Chrome

Navigate to localhost:5000/admin/driver/Chrome?headless=false

Configure database connection string

Navigate to localhost:5000/admin
Choose database type in drop down (Only supports Postgres)
Enter Database username
Enter database password (Not encrypted!)
Enter database location (Only supports localhost)
Select port
Submit form

The Postgres connection string is saved in plain text in config/config.json
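
For orientation, the form fields above map onto a Postgres connection URL roughly as sketched below. The key names (`username`, `password`, `host`, `port`, `database`) and the default database name are assumptions for illustration; the actual layout of config/config.json is not documented here. `build_postgres_url` is a hypothetical helper, not a function from the repository.

```python
from urllib.parse import quote


def build_postgres_url(config: dict) -> str:
    """Build a postgresql:// URL from form-like settings (key names are assumed)."""
    # Percent-encode credentials so characters such as '@' or ' ' cannot break the URL.
    user = quote(config["username"], safe="")
    password = quote(config["password"], safe="")
    host = config.get("host", "localhost")  # only localhost is supported
    port = int(config.get("port", 5432))    # 5432 is the Postgres default port
    database = config.get("database", "postgres")  # assumed default
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"


print(build_postgres_url({"username": "postgres", "password": "p@ss word"}))
```

Because the resulting string contains the password in clear text, the config file should stay out of version control and be readable only by the user running the server.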

Extensions

Add new parameters

Open idawebConfig.xml
Add new parameter with name, group and granularity
Restart Server
Navigate to localhost:5000/admin/database
Click on idaweb_t
Run increment load
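
The edit to idawebConfig.xml can also be scripted. The sketch below shows the idea with Python's standard XML module. The element and attribute names (`parameters`, `parameter`, `name`, `group`, `granularity`) and the sample values are placeholders chosen for this example; check the real file for its actual structure before adapting it.

```python
import xml.etree.ElementTree as ET

# Placeholder document; the real idawebConfig.xml may be structured differently.
SAMPLE = """<parameters>
  <parameter name="param_a" group="group_a" granularity="D"/>
</parameters>"""


def add_parameter(root: ET.Element, name: str, group: str, granularity: str) -> ET.Element:
    """Append a parameter element unless one with the same name already exists."""
    for existing in root.iter("parameter"):
        if existing.get("name") == name:
            return existing
    attributes = {"name": name, "group": group, "granularity": granularity}
    return ET.SubElement(root, "parameter", attributes)


root = ET.fromstring(SAMPLE)
add_parameter(root, "param_b", "group_b", "D")
print(ET.tostring(root, encoding="unicode"))
```

Checking for an existing entry first keeps the file free of duplicate parameters when the script is run more than once.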

New login information

If the idaweb account is blocked:

Open webscraping.py
Change the login information at the start of the file

How it works

Scraping

webscraping.py

webscraping.py contains both the meteoschweiz and the idaweb scraping functions.

Both scraping methods use Selenium to log in and navigate the web pages. Selenium is currently configured to run with a visible browser so that its activity can be checked. Navigation and click events on a page are done with either XPath or JavaScript paths.

Data downloads from idaweb are done with the Python requests module, without a visible browser. The session is passed as an argument for each request.
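
One common way to carry a browser login over to plain HTTP downloads is to copy the browser's cookies into the request. The sketch below illustrates that idea only; it is an assumption about the general technique, not a description of what webscraping.py or download.py actually do. `cookies_for_requests` is a hypothetical helper and the cookie values are invented.

```python
from typing import Dict, Iterable, Mapping


def cookies_for_requests(selenium_cookies: Iterable[Mapping[str, object]]) -> Dict[str, str]:
    """Reduce Selenium cookie dicts to the name -> value mapping that requests accepts."""
    return {str(cookie["name"]): str(cookie["value"]) for cookie in selenium_cookies}


# Same shape as the list returned by Selenium's driver.get_cookies(); values are made up.
browser_cookies = [
    {"name": "JSESSIONID", "value": "abc123", "domain": "example.org", "path": "/"},
    {"name": "lang", "value": "en", "domain": "example.org", "path": "/"},
]

cookies = cookies_for_requests(browser_cookies)
print(cookies)

# With requests installed, the mapping can be sent along with each download:
#   requests.get(download_url, cookies=cookies)
```

Passing the login state explicitly to every request keeps the download helpers free of global state, which also makes them easier to unit test.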

API

app.py & API folder

app.py

Instantiates the blueprint for all sub APIs in the API folder
Contains the main routes for the API

API folder

All blueprints for different parts of the API

adminAPI.py

Contains the main admin page routes

dbAPI.py

Contains database routes on the admin page
Contains all database interface routes

scrapeAPI.py

Contains all scraping routes

streamAPI.py

Handles all sse streams to the front end

db.py

db.py does the following things:

All interaction with the database:
1. Database creation
2. Table creation
3. Selects
4. Inserts
Creates announcer for the front end
Creates messages of database status and sends them over the sse to the front end

download.py

Helper file with functions for POST and GET requests

Contains helper functions for idaweb file download

idawebConfig.xml

Contains idaweb parameters to download and refresh

idawebConfigInitial.xml

Used for development as temporary storage of configurations

messageAnnouncer.py

messageAnnouncer.py does the following things:

Server-sent events (SSE)
Message queueing
Message formatting
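
The three responsibilities fit together roughly as in the following sketch, which uses a widely used pattern for server-sent events with only the standard library. It is an illustration under assumptions, not the contents of messageAnnouncer.py: the class and function names, the queue size and the event name are all chosen for this example.

```python
import queue
from typing import List, Optional


def format_sse(data: str, event: Optional[str] = None) -> str:
    """Format a single-line message as a server-sent event frame."""
    msg = f"data: {data}\n\n"
    if event is not None:
        msg = f"event: {event}\n{msg}"
    return msg


class MessageAnnouncer:
    """Fan out messages to every listening client through per-client queues."""

    def __init__(self, maxsize: int = 5) -> None:
        self.maxsize = maxsize
        self.listeners: List[queue.Queue] = []

    def listen(self) -> queue.Queue:
        """Register a new listener and return the queue it should read from."""
        q = queue.Queue(maxsize=self.maxsize)
        self.listeners.append(q)
        return q

    def announce(self, msg: str) -> None:
        """Deliver msg to all listeners and drop those whose queue is full."""
        # Iterate backwards so that deleting an entry does not shift unvisited ones.
        for i in reversed(range(len(self.listeners))):
            try:
                self.listeners[i].put_nowait(msg)
            except queue.Full:
                del self.listeners[i]


announcer = MessageAnnouncer()
inbox = announcer.listen()
announcer.announce(format_sse("stage tables created", event="database"))
print(repr(inbox.get_nowait()))
```

A full queue usually means the client has stopped reading, so dropping that listener keeps memory use bounded. In a Flask route, the queue returned by `listen()` would typically be drained inside a generator that is returned with the `text/event-stream` MIME type.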

responseDict.py

responseDict.py does the following things:

Response sending for the front end
Button disabling for the front end
Creating a progressbar for the front end
Starting materialized view refresh after data inserts

abstractDriver.py

abstractDriver.py handles all selenium driver interactions

Driver installation
Creating front end information about driver status

Dashboard

dashboard.py

dashboard.py does the following things:

Creation of the dashboard and its structure
Selection of the data displayed on the dashboard
Wrangling of the selected data
Handling of user interaction using callbacks

Story

story.py

story.py does the following things:

Creation of the story and its structure
Selection of data displayed in the story
Wrangling of the selected data

Tests

webscraping_test.py

Contains all unit tests of the webscraping

db_test.py

Contains all unit tests of the database

Database implementation

Database is divided into two main schemas, Stage and Core
All tables have corresponding materialized views for number of rows and last update
Data is copied from left to right
- Text files / Web into stage tables
- Stage tables into Core tables

Stage

The Stage schema contains all new data

Can contain duplicate entries
Has no primary keys
Contains raw data

Core

Cannot contain duplicate entries, because duplicates would violate the natural primary key
Data is indexed for faster selects
idaweb_t and meteoschweiz_t are merged into measurements_t table
Columns get added for the description of the data source
Data gets parsed into the format used in future analysis
Core data never gets deleted, can be used to add new data
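
The difference between the two schemas can be tried out with a small self-contained sketch. It uses SQLite from the Python standard library as a stand-in for Postgres, and all table and column names are invented for the example. `INSERT OR IGNORE` is SQLite syntax for skipping rows that would violate the primary key; in Postgres the comparable clause is `ON CONFLICT DO NOTHING`. Whether the climate API skips or rejects such rows is not stated here, so treat that choice as an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Stage: raw data, no primary key, duplicates allowed.
    CREATE TABLE stage_measurements (station TEXT, meas_date TEXT, value REAL);
    -- Core: natural primary key, extra column describing the data source.
    CREATE TABLE core_measurements (
        station TEXT, meas_date TEXT, value REAL, source TEXT,
        PRIMARY KEY (station, meas_date, source)
    );
    -- Index for faster selects by date.
    CREATE INDEX core_date_idx ON core_measurements (meas_date);
""")

# The first two rows are duplicates, which the stage table accepts.
rows = [("BER", "2020-01-01", 1.5), ("BER", "2020-01-01", 1.5), ("ZRH", "2020-01-01", 2.0)]
conn.executemany("INSERT INTO stage_measurements VALUES (?, ?, ?)", rows)


def load_core(connection: sqlite3.Connection, source: str) -> None:
    """Copy stage rows into core, skipping rows whose natural key already exists."""
    connection.execute(
        "INSERT OR IGNORE INTO core_measurements "
        "SELECT station, meas_date, value, ? FROM stage_measurements",
        (source,),
    )
    connection.commit()


load_core(conn, "idaweb")
load_core(conn, "idaweb")  # running the load again adds nothing new

stage_count = conn.execute("SELECT COUNT(*) FROM stage_measurements").fetchone()[0]
core_count = conn.execute("SELECT COUNT(*) FROM core_measurements").fetchone()[0]
print(stage_count, core_count)
```

Because the core load is idempotent in this sketch, stage data can be reloaded without the core table growing duplicates.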

julienkellerhals/klimadaten-api