Data Storage

./data/DeepAir/STData
The most critical data storage path, which can be directly utilized for pre-training GBM. Pay attention to the dataPath defined at the beginning of each program, as it specifies the dynamic data for all time periods.

baseline_models

RF

Random forest model with situ data.

GBM

GBM model with situ data.

DCNN

DCNN model, using CNN + MLP to predict PM2.5.

CA_PM25_prediction

data_preprocess

daily_dataPrepare_universalCNNembed.py
Processes all grid data in California using the pre-trained CNN model. Encodes Elevation and LandCover information and adds it to the original data to generate new data for prediction.
- Depends on the pre-trained models in preTrain_and_mergeData/preTrainCNN/ElevaLand.preTrain.shapeNoDrop/saved_earlyStop_weights_yearsG8.
- Uses the pre-trained lightGBM model for subsequent operations.
daily_dataPrepare_universalDataMerge.py
Merges encoded LandCover and Elevation data with previous data to replace features [this step is not actually required].
- Depends on data from the original SituData.
- Each day’s records are merged with the corresponding encoded LandCover and Elevation data for each grid.
daily_CA_PM25_mergeGBMPred_batch.py
Predicts PM2.5 values for each grid using the trained lightGBM model, with additional data merging operations.
CA_ElevaLand.preTrain.waterDropLet
Stores prediction results, visualizations, and data:
- Organized by year and date.
- CA_allGrid_PM25_pred.csv: Stores prediction results for specific dates.
- CA_daily_gridPM.pkl: Formats predicted values for each grid (CA_lon, CA_lat, CA_grid_value).
- CA_PM_plot.png: Visualizes prediction results.

DeepAir

preTrainCNN_and_mergeData

preTrainCNN

Pre-train a CNN model to directly predict PM2.5.

The model structure used is situNet_ElevaLand.

Use CNN to encode Elevation and LandCover information.
Use embedding encoding for date and basinID data, which will serve as input for the subsequent DNN model.

After grouped training, store the trained models in ElevaLand.preTrain.shapeNoDrop, which will later be used to directly read data.

CNN_embed_and_merge_data

Based on the pre-trained CNN PM2.5 prediction model:

Encode data from the original dataset containing PM2.5 labels.
Integrate Elevation and LandCover information to generate new data.
This new data is used for subsequent lightGBM model training.

DeepAir_GBM

Train a lightGBM model for prediction based on the Elevation and LandCover data encoded by the pre-trained CNN model.

This model is used for subsequent predictions across the entire California region.

Running the py files that start with batch_ to train model.

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
CA_PM25_prediction		CA_PM25_prediction
DeepAir		DeepAir
baseline_models		baseline_models
.DS_Store		.DS_Store
LICENSE		LICENSE
ReadMe.md		ReadMe.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Data Storage

baseline_models

RF

GBM

DCNN

CA_PM25_prediction

data_preprocess

DeepAir

preTrainCNN_and_mergeData

preTrainCNN

CNN_embed_and_merge_data

DeepAir_GBM

About

Releases

Packages

Contributors 2

Languages

License

humnetlab/DeepAir

Folders and files

Latest commit

History

Repository files navigation

Data Storage

baseline_models

RF

GBM

DCNN

CA_PM25_prediction

data_preprocess

DeepAir

preTrainCNN_and_mergeData

preTrainCNN

CNN_embed_and_merge_data

DeepAir_GBM

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages