Climate API Readme
The following programs and modules are required to run the climate API:

- Python 3.7
- Postgres
- The Python modules listed in the table below
| Module | Version |
|---|---|
| dash_bootstrap_components | 0.12.2 |
| SQLAlchemy_Utils | 0.36.8 |
| Flask | 1.1.2 |
| dash | 1.19.0 |
| msedge_selenium_tools | 3.141.3 |
| statsmodels | 0.12.0 |
| urllib3 | 1.25.11 |
| plotly | 4.14.3 |
| dash_core_components | 1.3.1 |
| numpy | 1.19.2 |
| selenium | 3.141.0 |
| SQLAlchemy | 1.4.15 |
| requests | 2.24.0 |
| dash_html_components | 1.0.1 |
| pytest | 0.0.0 |
| lxml | 4.6.1 |
| pandas | 1.1.3 |
| python_dateutil | 2.8.1 |
| scikit_learn | 0.24.2 |
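If preferred, the module table can be written as a requirements.txt and installed in one step. The file below is a sketch derived from the table; it assumes the usual pip spelling with hyphens instead of underscores, and leaves pytest unpinned because the 0.0.0 entry in the table looks like a placeholder:

```
dash-bootstrap-components==0.12.2
SQLAlchemy-Utils==0.36.8
Flask==1.1.2
dash==1.19.0
msedge-selenium-tools==3.141.3
statsmodels==0.12.0
urllib3==1.25.11
plotly==4.14.3
dash-core-components==1.3.1
numpy==1.19.2
selenium==3.141.0
SQLAlchemy==1.4.15
requests==2.24.0
dash-html-components==1.0.1
pytest
lxml==4.6.1
pandas==1.1.3
python-dateutil==2.8.1
scikit-learn==0.24.2
```

```
pip install -r requirements.txt
```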
If desired, install pipenv with the following command:

```
pip install --user pipenv
```

Installing a dependency with pipenv is done as follows:

```
pipenv install requests
```

Installing a dependency with pip is done as follows:

```
pip install numpy
```
In case of issues while starting the climate API, or conflicting package versions, upgrade all installed packages (PowerShell):

```
pip freeze | %{$_.split('==')[0]} | %{pip install --upgrade $_}
```
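The one-liner above uses PowerShell syntax. A rough bash equivalent, untested here, would be:

```
pip freeze | cut -d= -f1 | xargs -n1 pip install --upgrade
```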
If desired, create and activate a conda environment before installing dependencies:

```
conda create --no-default-packages -n myenv python
conda activate myenv
```

See the Anaconda environment documentation for details.

Installing a dependency with conda is done as follows:

```
conda install numpy
```

In case of issues while starting the climate API, or conflicting package versions, run:

```
conda upgrade --all
```
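Alternatively, the environment can be described declaratively. A minimal sketch, assuming the requirements.txt from the Requirements section sits next to it (the environment name and layout are assumptions):

```yaml
name: climate-api            # placeholder environment name
dependencies:
  - python=3.7
  - pip
  - pip:
      - -r requirements.txt  # the pinned modules from the Requirements section
```

Create it with:

```
conda env create -f environment.yml
```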
Setup with an existing database dump (database.db):

- Install Python
- Install all required modules for the climate API (see Requirements)
- Install Postgres
- Start the Flask app (see Instruction)
- Open the web browser
- Navigate to localhost:5000
- Configure the database connection string (see Instruction)
- Navigate to the database tab of the administration overview
- Press the action button to connect to the database
- Stop the server
- Open pgAdmin or equivalent
- Import database.db (see the restore sketch after this list)
- Start the server (see Instruction)
- Navigate to localhost:5000/admin
- Use the app
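Importing database.db can also be done from the command line instead of pgAdmin. A minimal sketch, assuming database.db is a custom-format Postgres dump and the target database is named climate (both assumptions; for a plain SQL dump, use `psql -U postgres -d climate -f database.db` instead):

```
createdb -U postgres climate
pg_restore -U postgres -d climate database.db
```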
Setup with an empty database, filled via the ETL and Core loads:

- Install Python
- Install all required modules for the climate API (see Requirements)
- Install Postgres
- Start the Flask app (see Instruction)
- Open the web browser
- Navigate to localhost:5000
- Configure the database connection string (see Instruction)
- Navigate to the database tab of the administration overview
- Press the action button to connect to the database
- Press the action button to create the database tables
- Run the ETL load
- Run the Core load
- Use the app
Development setup in VSCode:

- Install Python
- Install all required modules for the climate API (see Requirements)
- Install Postgres
- Open VSCode
- Launch the debug config FE + Flask found in doc/launch.json (an illustrative sketch follows this list)
- Configure the database connection string (see Instruction)
- Navigate to the database tab of the administration overview
- Press the action button to connect to the database
- Press the action button to create the database tables
- Run the ETL load
- Run the Core load
- Use the app
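doc/launch.json ships with the repository; the snippet below is only an illustrative sketch of what a Flask debug configuration in VSCode typically looks like, not the actual content of that file:

```json
{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "FE + Flask",
            "type": "python",
            "request": "launch",
            "module": "flask",
            "env": {
                "FLASK_APP": "app.py",
                "FLASK_ENV": "development"
            },
            "args": ["run", "--no-debugger"],
            "jinja": true
        }
    ]
}
```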
Command Prompt:

```
set FLASK_APP=app.py
set FLASK_ENV=production
flask run
```

PowerShell:

```
$env:FLASK_APP = "app.py"
$env:FLASK_ENV = "production"
flask run
```

Linux (untested):

```
export FLASK_APP=app.py
export FLASK_ENV=production
flask run
```

Important: activate the Anaconda / pip environment before starting the Flask app.
In order to get new data, selenium requires a browser driver to scrape websites. The browser version is checked automatically. Installation is done as follows:

Edge:

- Navigate to localhost:5000/admin
- Press Driver name
- Press Download driver
Chrome:

- Navigate to localhost:5000/admin

Configure database connection string:

- Choose the database type in the drop-down (only Postgres is supported)
- Enter the database username
- Enter the database password (not encrypted!)
- Enter the database location (only localhost is supported)
- Select the port
- Submit the form

The Postgres connection string is saved in plain text in config/config.json.
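A sketch of what config/config.json might look like, assuming one key per form field above (the actual key names in the repository may differ):

```json
{
    "database_type": "postgres",
    "username": "postgres",
    "password": "example-password",
    "location": "localhost",
    "port": 5432
}
```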
Adding a new idaweb parameter:

- Open idawebConfig.xml
- Add a new parameter with name, group, and granularity (see the sketch after this list)
- Restart the server
- Navigate to localhost:5000/admin/database
- Click on idaweb_t
- Run the increment load
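A sketch of what such an entry in idawebConfig.xml could look like; the tag names and values are assumptions based on the fields listed above, not the file's actual schema:

```xml
<parameter>
    <name>air_temperature</name>     <!-- placeholder parameter name -->
    <group>temperature</group>       <!-- placeholder group -->
    <granularity>daily</granularity> <!-- placeholder granularity -->
</parameter>
```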
In the case of a blocked idaweb account:
- Open webscraping.py
- Change the login information at the start of the file
webscraping.py contains both the meteoschweiz and the idaweb scraping functions.

Both scraping methods use selenium to log in and navigate web pages. Selenium is currently configured with a visible browser window so that its activity can be checked. Navigation and click events on a page are done with either XPath or JavaScript paths.

Downloading of data from idaweb is done headlessly with the Python requests module. The authenticated session is passed as an argument to each request.
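A minimal sketch of this pattern: selenium performs the login with a visible browser, its cookies are copied into a requests.Session, and the downloads then run headlessly. All URLs, element IDs, and credentials are placeholders, not the project's actual values:

```python
import requests
from selenium import webdriver

# Log in with a visible browser so activity can be checked
driver = webdriver.Edge()
driver.get("https://example.com/idaweb/login")  # placeholder URL
driver.find_element_by_id("username").send_keys("user")    # placeholder IDs
driver.find_element_by_id("password").send_keys("secret")  # and credentials
driver.find_element_by_id("login").click()

# Copy the authenticated cookies into a headless requests session
session = requests.Session()
for cookie in driver.get_cookies():
    session.cookies.set(cookie["name"], cookie["value"])
driver.quit()

# Download data without a browser, passing the session along
response = session.get("https://example.com/idaweb/download")  # placeholder URL
response.raise_for_status()
```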
app.py & API folder

app.py:

- Instantiates the blueprint for all sub-APIs in the API folder
- Contains the main routes for the API

The API folder contains all blueprints for the different parts of the API:

- The main admin page routes
- The database routes on the admin page
- All database interface routes
- All scraping routes
- All SSE streams to the front end
db.py does the following things:

- All interaction with the database
  - Database creation
  - Table creation
  - Selects
  - Inserts
- Creates the announcer for the front end
- Creates messages about the database status and sends them over SSE to the front end
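A sketch of how the database and table creation can look with the pinned SQLAlchemy and SQLAlchemy_Utils versions; the connection string and the column layout of measurements_t are placeholders, not the module's actual code:

```python
from sqlalchemy import create_engine, Column, Date, Float, MetaData, String, Table
from sqlalchemy_utils import create_database, database_exists

# Placeholder connection string; the real one comes from config/config.json
engine = create_engine("postgresql://postgres:secret@localhost:5432/climate")

# Database creation: create the database itself if it does not exist yet
if not database_exists(engine.url):
    create_database(engine.url)

# Table creation with a placeholder column layout
metadata = MetaData()
measurements = Table(
    "measurements_t", metadata,
    Column("station", String, primary_key=True),
    Column("date", Date, primary_key=True),
    Column("value", Float),
)
metadata.create_all(engine)
```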
Helper file with functions for POST and GET requests.
- Contains helper functions for idaweb file download
- Contains idaweb parameters to download and refresh
- Used for development as temporary storage of configurations
messageAnnouncer.py does the following things:

- SSE (server-sent events) handling
- Message queueing
- Message formatting
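A minimal sketch of such an announcer, following the common queue-based SSE pattern for Flask; an illustration of the idea, not the module's actual code:

```python
import queue

class MessageAnnouncer:
    def __init__(self):
        self.listeners = []

    def listen(self):
        # Each front-end connection gets its own queue of pending messages
        q = queue.Queue(maxsize=5)
        self.listeners.append(q)
        return q

    def announce(self, msg):
        # Push the message to every listener; drop listeners that stopped reading
        for i in reversed(range(len(self.listeners))):
            try:
                self.listeners[i].put_nowait(msg)
            except queue.Full:
                del self.listeners[i]

def format_sse(data, event=None):
    # Format a payload according to the server-sent events specification
    msg = f"data: {data}\n\n"
    if event is not None:
        msg = f"event: {event}\n{msg}"
    return msg
```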
responseDict.py does the following things:

- Sending responses to the front end
- Disabling buttons on the front end
- Creating a progress bar on the front end
- Starting the materialized view refresh after data inserts
abstractDriver.py handles all selenium driver interactions:

- Driver installation
- Creating front-end information about the driver status
dashboard.py does the following things:

- Creation of the dashboard and its structure
- Selection of the data displayed on the dashboard
- Wrangling of the selected data
- Handling of user interaction using callbacks
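A minimal sketch of the callback pattern dash 1.x uses for such user interaction; the component IDs, stations, and data are placeholders:

```python
import dash
import dash_core_components as dcc
import dash_html_components as html
from dash.dependencies import Input, Output

app = dash.Dash(__name__)
app.layout = html.Div([
    dcc.Dropdown(id="station",  # placeholder stations
                 options=[{"label": s, "value": s} for s in ["Basel", "Zurich"]],
                 value="Basel"),
    dcc.Graph(id="graph"),
])

# Callback: re-render the graph whenever the dropdown selection changes
@app.callback(Output("graph", "figure"), [Input("station", "value")])
def update_graph(station):
    return {"data": [{"x": [1, 2, 3], "y": [4, 1, 2],  # placeholder data
                      "type": "line", "name": station}]}

if __name__ == "__main__":
    app.run_server(debug=True)
```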
story.py does the following things:

- Creation of the story and its structure
- Selection of the data displayed in the story
- Wrangling of the selected data
Contains all unit tests of the webscraping
Contains all unit tests of the database
- The database is divided into two main schemas, Stage and Core
- All tables have corresponding materialized views for the number of rows and the last update
- Data is copied from left to right:
  - Text files / web into Stage tables
  - Stage tables into Core tables

The Stage schema contains all new data:

- Can contain duplicate entries
- Has no primary keys
- Contains raw data

The Core schema contains the processed data:

- Cannot contain duplicate entries, since duplicates violate the natural primary keys
- Data is indexed for faster selects
- idaweb_t and meteoschweiz_t are merged into the measurements_t table
- Columns are added describing the data source
- Data is parsed into the format used in future analysis
- Core data never gets deleted and can be extended with new data
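A sketch in SQL of how this layout can be expressed; apart from the schema names and idaweb_t / measurements_t, all table and column definitions are placeholders:

```sql
CREATE SCHEMA stage;
CREATE SCHEMA core;

-- Stage: raw data, no primary key, duplicates allowed
CREATE TABLE stage.idaweb_t (
    station   text,
    date      text,
    parameter text,
    value     text
);

-- Core: the natural primary key rejects duplicates; indexed for faster selects
CREATE TABLE core.measurements_t (
    station   text,
    date      date,
    parameter text,
    value     double precision,
    source    text,  -- added column describing the data source
    PRIMARY KEY (station, date, parameter)
);
CREATE INDEX measurements_t_date_idx ON core.measurements_t (date);

-- Materialized view with the number of rows and the last update,
-- refreshed after data inserts
CREATE MATERIALIZED VIEW core.measurements_t_stats AS
SELECT count(*) AS row_count, now() AS last_update
FROM core.measurements_t;

REFRESH MATERIALIZED VIEW core.measurements_t_stats;
```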