- 王靖文, 105703057
- 鄭以湉, 106304003
- 黃大瑋, 107207438
Our goal is to forecast bike rental demand in the Capital Bikeshare program in Washington, D.C. based on different conditions.
You should provide an example command to reproduce your result
Rscript code/data_science_final.R
- any on-line visualization
- Your presentation, 1101_datascience_FP_Group9.ppt, by Jan. 13
- Any related document for the final project
- papers
- software user guide
- Source
- Input format
- CSV file
- Any preprocessing?
- method1: convert the "datetime" variable into four variables (year, month, day, hour)
- method2: method1 + remove outliers in the "count" variable
- method3: method2 + assign each hour of the day to one of four groups, ranging from peak to off-peak periods
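
The three preprocessing variants above can be sketched roughly as follows. This is a minimal illustration in base R, not the project's actual script: the function names, the IQR-based outlier rule, and the hour cutoffs for the four groups are all assumptions made for the example, and the toy data is made up. The `lubridate` package listed under the packages provides `year()`, `month()`, `day()`, and `hour()` accessors that could replace the base R date handling shown here.

```r
# method1: split a "YYYY-MM-DD HH:MM:SS" column into year, month, day, hour.
split_datetime <- function(df) {
  ts <- as.POSIXlt(df$datetime, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
  df$year  <- ts$year + 1900L   # POSIXlt years count from 1900
  df$month <- ts$mon + 1L       # POSIXlt months are 0-based
  df$day   <- ts$mday
  df$hour  <- ts$hour
  df
}

# method2 (assumed rule): keep rows whose count lies within k * IQR of the quartiles.
remove_count_outliers <- function(df, k = 1.5) {
  q <- unname(quantile(df$count, c(0.25, 0.75)))
  iqr <- q[2] - q[1]
  keep <- df$count >= q[1] - k * iqr & df$count <= q[2] + k * iqr
  df[keep, , drop = FALSE]
}

# method3 (assumed cutoffs): map each hour to a group, 1 = peak ... 4 = off-peak.
hour_group <- function(hour) {
  group <- rep(4L, length(hour))
  group[hour %in% c(10:15, 20:21)] <- 3L
  group[hour %in% c(7, 9, 16, 19)] <- 2L
  group[hour %in% c(8, 17, 18)]    <- 1L
  group
}

# Made-up rows in the shape of the input CSV; the last row is an artificial outlier.
toy <- data.frame(
  datetime = c("2011-01-01 00:00:00", "2011-01-01 08:00:00",
               "2011-01-01 13:00:00", "2011-01-01 17:00:00",
               "2011-01-01 23:00:00", "2011-01-02 12:00:00"),
  count = c(16, 210, 95, 400, 30, 5000),
  stringsAsFactors = FALSE
)

cleaned <- remove_count_outliers(split_datetime(toy))
cleaned$hour_group <- hour_group(cleaned$hour)
```

Keeping each method as its own function makes it easy to build the three versions of the training data by composing them in order.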
- Which method do you use?
- Lasso, XGBoost, Random Forest
- What is a null model for comparison?
- Our null model predicts the mean of "count" in the training data for every observation.
- How do you perform evaluation? i.e., cross-validation, or an additional independent data set
- We use cross-validation to select the optimal hyperparameters and then train the final models with them.
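
As an illustration of the tuning step for one of the three models, the sketch below uses `cv.glmnet` from the `glmnet` package to pick the lasso penalty by 5-fold cross-validation and then refits with the selected value. The synthetic data, the feature names, the number of folds, and the choice to fit on `log1p(count)` are assumptions for the example, not details taken from the project code.

```r
library(glmnet)

set.seed(1)

# Synthetic stand-in for the training data (not the real Kaggle data).
n <- 200
x <- cbind(hour     = sample(0:23, n, replace = TRUE),
           temp     = runif(n, 0, 35),
           humidity = runif(n, 20, 100))
count <- rpois(n, lambda = exp(2 + 0.05 * x[, "temp"]))

# 5-fold cross-validation over the lasso penalty (alpha = 1).
# Assumption: the target is log1p(count), so squared error lines up with RMSLE.
cv_fit <- cv.glmnet(x, log1p(count), alpha = 1, nfolds = 5)
best_lambda <- cv_fit$lambda.min

# Train the final model with the selected hyperparameter.
final_fit <- glmnet(x, log1p(count), alpha = 1, lambda = best_lambda)

# Predictions are back-transformed from the log scale to counts.
pred <- as.numeric(expm1(predict(final_fit, newx = x)))
```

The same pattern (search over candidate values with cross-validation, then refit on the full training data) would apply to the XGBoost and Random Forest hyperparameters, with their own tuning functions.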
- Which metric do you use?
- Root Mean Squared Logarithmic Error (RMSLE)
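
For reference, RMSLE is the square root of the mean of `(log(predicted + 1) - log(actual + 1))^2` over all observations. The sketch below implements it in base R and scores a mean-of-training-count null model with it. The function name and the toy numbers are made up for illustration. The `ModelMetrics` package listed under the packages also provides an `rmsle()` function.

```r
# RMSLE: root mean squared error on the log1p scale.
rmsle <- function(actual, predicted) {
  sqrt(mean((log1p(predicted) - log1p(actual))^2))
}

# Made-up counts standing in for the training and testing data.
train_count <- c(16, 40, 32, 13, 1)
test_count  <- c(10, 20, 30)

# Null model: predict the training mean for every test observation.
null_pred  <- rep(mean(train_count), length(test_count))
null_score <- rmsle(test_count, null_pred)
```

A fitted model is then worth keeping only if its RMSLE on the same test data is lower than `null_score`.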
- Is your improvement significant?
- Yes. We built several versions of the training data using the different data-cleaning methods, and the testing RMSLE decreases significantly when the models are trained on those versions.
- What is the challenging part of your project?
- data cleaning
- Score on Kaggle
- Code/implementation which you include/reference (You should indicate in your presentation if you use code from others. Otherwise, cheating will result in a score of 0 for the final project.)
- Packages you use
- library(lubridate)
- library(randomForest)
- library(glmnet)
- library(ModelMetrics)
- library(xgboost)
- Related publications