---
title: "Cluster Model"
author: "Noah Klammer"
date: "6/28/2021"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
```
## Clear global environment and report memory usage
```{r include=FALSE}
rm(list = ls())
gc()
```
# Intro
This is my third attempt at clustering zones using the Ideal Air Loads output variable. First I tried 3-space clustering, then 364-space clustering. Confident in the speed of computation, I'm now going to use all hours in the simulation year: 8760-space clustering.
## Import Rda data
```{r import, message=FALSE, warning=FALSE}
load(file = "ilas_nodhw.Rda")
load(file = "ilas_dhw_novar.Rda")
load(file = "hvac_dhw_novar.Rda")
load(file = "hvac_dhw_var.Rda")
```
### Set data
```{r}
# CHANGE DATA HERE
df <- ilas_nodhw
```
# Feature normalization
Normalize values by floor area of the respective zone.
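As a minimal illustration with made-up numbers (the zone names `Z1`, `Z2` and their areas are hypothetical), `sweep()` with `MARGIN = 2` divides every column by the matching element of the area vector. This turns each zone's energy in J into an intensity in J/m^2:

```r
# Hypothetical hourly loads [J] for two zones over three hours
loads <- data.frame(Z1 = c(100, 200, 300), Z2 = c(50, 0, 25))
# Hypothetical floor areas [m^2], in the same order as the columns
areas <- c(Z1 = 10, Z2 = 5)
# MARGIN = 2 works column-wise: column j is divided by areas[j]
loads_per_m2 <- sweep(loads, 2, areas, FUN = "/")
print(loads_per_m2)
```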
## Area to Zone mapping
Load an external dataset with one row per zone and two attributes: zone name and floor area in m^2. Because the zone-area dataset and the Ideal Air Loads dataset may list the zones in different orders, create a simple indexing function that maps each zone name to its row in the area map.
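Base R's `match()` does the same lookup as a `which()`-based helper, but vectorized over all zone names at once. Unlike `which()`, it returns `NA` for a zone that is missing from the map, so a failed lookup is easy to catch. A sketch with a hypothetical, deliberately shuffled area map:

```r
# Hypothetical zone-area map, in a different order than the load columns
area_map_toy <- data.frame(zone = c("ZONE B", "ZONE A", "ZONE C"),
                           area_m2 = c(20, 10, 30))
load_cols <- c("ZONE A", "ZONE C", "ZONE B")

# Position of each load column's zone within the map (NA if absent)
idx <- match(load_cols, area_map_toy$zone)
stopifnot(!anyNA(idx))  # fail loudly if any zone is missing from the map

# Floor areas reordered to line up with the load columns
area_vec_toy <- area_map_toy$area_m2[idx]
print(area_vec_toy)
```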
### Normalize Loads by Floor Area
```{r warning=FALSE}
### Drop Date/Time and save for later
time_cols <- select(df, c(hour,day,month))
rownames(time_cols) <- rownames(df) # need to keep row labels
df <- select(df, -c(hour,day,month))
### Load Area Map
area_map <- read_csv("ZoneFloorArea-Map.csv")
idx_f <- function(string) { # takes zone string and maps to zone index in area_map
which(string==area_map$`Zone List`) # returns num vec
}
area_idx <- sapply(colnames(df),idx_f) # maps df idx to area idx
area_vec <- area_map$`Space Area [m2]`[area_idx]
# apply normalization vector across columns
df <- sweep(df, 2, area_vec, FUN = "/")
# change units
# **[J]** => [J/m^2]
rownames(df) <- str_replace(rownames(df),"(?<=\\[)J{1}","J/m^2")
### Add Date/Time back in
df <- cbind(time_cols,df)
```
# Clustering
You might expect only heating loads on the extreme winter day, but for this building type and climate there is more cooling load [J] than heating load, even in winter.
Traditionally, we would remove zero-variance columns because they carry no information for regression. For clustering, however, we may want to keep them: a zone whose load never changes (for example, one that is never conditioned) still has to be assigned to a cluster so that its floor area is counted when the cluster areas are summed.
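To see which zones the traditional approach would drop, without actually dropping them, one option is to inspect each zone's variance. A minimal sketch on hypothetical toy data (on the real `df`, the `hour`, `day`, and `month` columns would need to be excluded first):

```r
# Hypothetical per-m^2 loads: ZONE C is never conditioned (all zeros)
toy <- data.frame("ZONE A" = c(1, 3, 2),
                  "ZONE B" = c(2, 2, 5),
                  "ZONE C" = c(0, 0, 0),
                  check.names = FALSE)
# Variance of each zone's hourly series
zone_var <- vapply(toy, var, numeric(1))
# Zones a regression workflow would typically remove
zero_var_zones <- names(zone_var)[zone_var == 0]
print(zero_var_zones)
```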
## Transpose and cluster
Let's introduce the `apcluster` package, an R implementation of Frey and Dueck's popular Affinity Propagation method, which clusters data by passing messages between pairs of data points until a set of representative exemplars emerges. Be sure to cite the [math paper](https://doi.org/10.1080/19401493.2017.1410572), the [R package](https://doi.org/10.1093/bioinformatics/btr406), and the [original method's](https://doi.org/10.1126/science.1136800) publication.
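For intuition about what `apcluster()` does, here is a from-scratch sketch of the basic Affinity Propagation message updates in base R. It is a simplified illustration, not the `apcluster` package's implementation: it runs a fixed number of iterations instead of checking convergence, and the function name `affinity_propagation`, the damping factor `lambda = 0.9`, and the toy points are my own assumptions. Similarities are negative squared Euclidean distances (what `negDistMat(r=2)` produces), and the preference on the diagonal is set to the median off-diagonal similarity, a common default choice.

```r
# Hypothetical, simplified Affinity Propagation (Frey & Dueck, 2007) in base R.
# Not the apcluster implementation: fixed iteration count, no convergence check.
# s: N x N similarity matrix; its diagonal holds the preferences.
affinity_propagation <- function(s, lambda = 0.9, max_iter = 500) {
  n <- nrow(s)
  rows <- seq_len(n)
  r <- matrix(0, n, n)  # responsibilities r(i, k)
  a <- matrix(0, n, n)  # availabilities  a(i, k)
  for (iter in seq_len(max_iter)) {
    # Responsibility: r(i,k) = s(i,k) - max over k' != k of [a(i,k') + s(i,k')]
    as_mat <- a + s
    first_idx <- max.col(as_mat, ties.method = "first")
    first_val <- as_mat[cbind(rows, first_idx)]
    as_mat[cbind(rows, first_idx)] <- -Inf
    second_val <- apply(as_mat, 1, max)
    max_other <- matrix(first_val, n, n)              # entry (i, k) = row-i maximum
    max_other[cbind(rows, first_idx)] <- second_val   # at the argmax, use the runner-up
    r <- lambda * r + (1 - lambda) * (s - max_other)
    # Availability: a(i,k) = min(0, r(k,k) + sum over i' not in {i,k} of max(0, r(i',k)))
    #               a(k,k) = sum over i' != k of max(0, r(i',k))
    rp <- pmax(r, 0)
    diag(rp) <- diag(r)
    a_new <- matrix(colSums(rp), n, n, byrow = TRUE) - rp
    self_avail <- diag(a_new)
    a_new <- pmin(a_new, 0)
    diag(a_new) <- self_avail
    a <- lambda * a + (1 - lambda) * a_new
  }
  # Points whose self-responsibility plus self-availability is positive are exemplars
  exemplars <- which(diag(a) + diag(r) > 0)
  if (length(exemplars) == 0) stop("no exemplars found; try more iterations or a higher preference")
  # Every point joins its most similar exemplar; exemplars label themselves
  labels <- exemplars[max.col(s[, exemplars, drop = FALSE], ties.method = "first")]
  labels[exemplars] <- exemplars
  list(exemplars = exemplars, labels = labels)
}

# Usage on two well-separated groups of hypothetical 2-D points
x <- rbind(c(0, 0), c(0.2, 0.1), c(0.1, 0.3),
           c(5, 5), c(5.3, 5.1), c(4.9, 5.2))
s <- -as.matrix(dist(x))^2          # negative squared Euclidean distance
diag(s) <- median(s[upper.tri(s)])  # preference = median off-diagonal similarity
res <- affinity_propagation(s)
print(res$labels)
```

In the notebook the "points" are zones and each has 8760 coordinates (one per hour), but the message passing is the same. Raising the preference tends to produce more exemplars, and lowering it produces fewer.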
```{r turnkey cluster process}
library(apcluster)
# drop time date cols in
# preparation for clustering
if ("minute" %in% colnames(df)) {
df <- subset(df, select = -c(day, hour, month, minute))
} else {
df <- subset(df, select = -c(day, hour, month))
}
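tdf <- as.data.frame(t(df)) # transpose: one row per zone, one column per hour
# similarity = negative squared Euclidean distance between zones; returns an APResult
APR <- apcluster(negDistMat(r=2), tdf, details = TRUE)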
area_map <- read_csv("ZoneFloorArea-Map.csv")
idx_f <- function(string) { # takes zone string and maps to zone index in area_map
which(string==area_map$`Zone List`) # returns num vec
}
# this creates the scalar vector
# but does not apply the scalar on the df yet
scalars <- vector(mode = "numeric")
for (i in seq_along(APR@clusters)) { # iterate through each cluster
  # names of the member zones of cluster i, exemplar included # char vec
  member_zone_names <- names(APR@clusters[[i]])
  # map zone names to row indices in area_map # num vec
  area_idx <- sapply(member_zone_names, idx_f)
  # get floor area values of the members # num vec
  member_area_num <- area_map$`Space Area [m2]`[area_idx]
  # scaling factor for cluster i is the summed floor area of its members,
  # appended in cluster order
  scalars <- append(scalars, sum(member_area_num))
}
# reduce dimensionality of 'df' using clusters
red_df <- df[APR@exemplars] # returns 8760 rows with reduced columns
# apply scalar vec from above to red_df
# is the scalar vector the right length? stop if not
stopifnot(ncol(red_df) == length(scalars))
# apply area scalars to reduced df
red_df <- sweep(red_df, 2, scalars, FUN = "*")
# change units
rownames(red_df) <- str_replace(rownames(red_df),"J/m\\^2","J")
### Add Date/Time back in
red_df <- cbind(time_cols,red_df)
### Save out
ilas_no_dhw_red <- red_df
save(ilas_no_dhw_red, file = "ilas_no_dhw_red.Rda")
```
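The reduced data frame represents each cluster by its exemplar's per-m^2 profile multiplied by the total floor area of the cluster's members. Here is a base-R sketch of that bookkeeping with hypothetical zones, areas, and cluster assignment. In the chunk above, the assignment comes from `APR@clusters` and `APR@exemplars`. Comparing hourly building totals shows the approximation error introduced by the clustering.

```r
# Hypothetical per-m^2 hourly loads [J/m^2] for four zones over two hours
per_m2 <- data.frame(Z1 = c(1, 2), Z2 = c(1.1, 2.1), Z3 = c(5, 0), Z4 = c(4.8, 0.2))
area_m2 <- c(Z1 = 10, Z2 = 30, Z3 = 20, Z4 = 20)
# Hypothetical clustering result: the exemplar each zone belongs to
exemplar_of <- c(Z1 = "Z1", Z2 = "Z1", Z3 = "Z3", Z4 = "Z3")

exemplars <- unique(exemplar_of)
# Scaling factor per cluster: summed floor area of all members (exemplar included)
cluster_area <- vapply(exemplars, function(e) sum(area_m2[exemplar_of == e]), numeric(1))
# Reduced loads [J]: exemplar intensity times cluster floor area
reduced <- sweep(per_m2[exemplars], 2, cluster_area, FUN = "*")

# Hourly building totals [J]: every zone vs. the cluster approximation
exact_total <- as.matrix(per_m2) %*% area_m2[names(per_m2)]
approx_total <- rowSums(reduced)
print(cbind(exact = as.vector(exact_total), approx = unname(approx_total)))
```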
# End
<br><br><br>