forked from AndyClifton/PCWG-Share-01
-
Notifications
You must be signed in to change notification settings - Fork 0
/
PCWG_share_01_main.Rmd
511 lines (419 loc) · 23.8 KB
/
PCWG_share_01_main.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
---
title: "PCWG_share_01_main"
author: "Andy Clifton"
date: '`r Sys.Date()`'
output: pdf_document
---
<!-- ## Reading the source code?
This is R Markdown. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see <http://rmarkdown.rstudio.com> or <http://kbroman.org/knitr_knutshell/pages/Rmarkdown.html>.
This script is designed to be used with RStudio. When you click the **Knit** button in RStudio a document will be generated that includes the output of any embedded R code chunks within the document. -->
# Introduction
This document contains the results of the Power Curve Working Group's Share_01 exercise, which ran from October to December 2015. The document and results are generated using the programing language `R` from the _PCWG_share_01_main.rmd_ file and can be run by participants themselves.
## How to use PCWG_share_01_main.rmd
install R (<http://www.r-project.org>) and Rstudio (<http://www.rstudio.com>), and then create a directory with all of the code and files (see below). When you click the **Knit** button in RStudio a document will be generated that includes text and results from the code embedded in _PCWG_share_01_main.rmd_.
```{r clean up, echo = FALSE}
rm(list = ls())
```
## User Inputs
The _project.root_ variable defines the location of the files required for this analysis. The _made.by_ variable forms part of a label that will be added to the plots. _data.public_ is a flag that indicates whether the results of the analysis are intended to be public, or not. _data.analyze.raw_ is a flag that indicates whether individual data files should be (re)analyzed (_data.analyze.raw = TRUE_) or whether saved, aggregated data should be used (_data.analyze.raw = FALSE_).
The following user inputs were used in the preparation of this document:
```{r user-defined options}
# Where can files be found?
project.root <- file.path('/Users/stuart/PCWG-Share-01')
# Who ran this script
made.by = "A. Clifton, NREL"
# Will data be public or not?
data.public = TRUE
#change setting below to toggle normalisation
# FALSE = Normalise to energy in bin (may overstate errors in low wind speed bins)
# TRUE = Normalise to energy in dataset
data.normalise_to_all = TRUE
# Reanalyze existing data?
data.analyze.raw = TRUE
# software version to use:
# version string in "x.x.x" format
sw.version = "0.8.0"
# should we use versions less, equal, or greater than that?
sw.version.logic = "from"
# options are "to" (<=), "equals" (==), or "from" (>=).
# See SelectDatabySWVersion.R for implications.
# Do not load versions of the tool before...
# will be ignored if "" is used
# use data versions in the form "x.x.x".
# Note that this may conflict with some code
data.version.earliest ="0.8.0"
```
Note that _data.version.earliest_ is used to limit the data that are processed; data before this version number will not be used. In this document, _data.version.earliest_ was set to `cat('"',data.version.earliest,'"')`. Also, a major change in data processing occurred in version 0.5.9 of the PCWG tool. Therefore, the functions used in the preparation of this document have two inputs (_sw.version_ and _sw.version.logic_) that can be used to select the data that are used for each plot.
```{r check old data actually exists, results="asis", echo = FALSE}
# define the path to the directory that we will use to store all of the data
output.dir = file.path(project.root,
"analysis",
"all")
# create this directory if it doesn't already exist
dir.create(output.dir,
showWarnings = FALSE,
recursive = TRUE)
# check for existing data. If none exists, set data.analyze.raw back to TRUE
if (data.analyze.raw == FALSE){
if (file.exists(file.path(output.dir,"AggregatedData.RData"))){
cat(text = "\n This document was produced from data saved at ", file.path(output.dir,"AggregatedData.RData"), "\n")
} else {
cat(text = "\nThis document was produced from raw data files.\n")
data.analyze.raw <- TRUE
}
} else {
cat(text = "\nThis document was produced from raw data files.\n")
}
```
## Packages
### Standard Packages
This script requires the _ggplot2_, _grid_, _knitr_, _RColorBrewer_, _rgdal_, _stringr_, and _XLConnect_ packages to run. These are called from the script but you may need to install them directly. For details of how to install packages, see the RStudio help.
```{r load packages, message=FALSE, echo = FALSE}
require(ggplot2)
require(grid)
require(XLConnect)
require(knitr)
require(reshape2)
require(stringr)
```
### WARNING
Note that currently (September 21, 20216) the plots require the developer version of _ggplot2_. The github master of ggplot2 can be found at <https://github.com/hadley/ggplot2> . Installing this version requires the _devtools_ package. To install _devtools_ and the developer version of _ggplot2_, type:
````{r install_dev_ggplot2, eval=FALSE, echo=TRUE}
install.packages("devtools")
devtools::install_github("hadley/ggplot2")
````
## Directory structure
```{r define file locations, message=FALSE, echo = FALSE}
# define the working directory
working.dir <- project.root
setwd(working.dir)
#identify data directory
data.dir = file.path(project.root,
"data")
# define where functions live
code.dir = file.path(project.root,
"code")
# source these functions
code.files = dir(code.dir, pattern = "\\.R$")
for (file in code.files){
source(file = file.path(code.dir,file))
}
```
The folowing files should be placed in the _project.root_ directory:
* PCWG_share_01_main.Rmd
* /__analysis__ directory containg results of the analysis
* /__code__ directory containing functions required for the analysis
* /__data__ directory containing all data files to be analyzed. This can include further sub directories.
+ All .xls files contained in __data__ and sub directories will be used in the analysis. Any __Summary.R__ files will be ignored.
+ A __metaData.R__ file can be added to __data__ and include the line _data.supplier.type <- <<supplier type>>_, where <<<<supplier type>> could be one of "Consultant", "Developer", "Owner/Operator", or "OEM".
```{r configure graphics, message=FALSE, echo = FALSE}
# configure graphics appearance
#theme_set(theme_PCWG(base_size = 8,
# base_family = "sans"))
```
# Results from each data set
We now analyse the data from each data set. The plots are saved to their own directories in the _analysis_ directory. If _data.public_ is FALSE, plots will be created for every data set. If _data.public_ is TRUE, only the final, aggregated data plots will be created.
```{r load individual data files, results='asis', echo = FALSE}
# set a counter to check for numbers of files that are data.version.earliest or later
count.version = 0
if (data.analyze.raw == TRUE){
# identify the data sets that we have available
data.files = dir(data.dir,
recursive = TRUE,
pattern = "\\.xls$")
# remove the file called summary.xls
data.files <- data.files[lapply(data.files,function(x) length(grep("ummary",x,value=FALSE))) == 0]
# create an empty list to store data in at a later date
all.data <- list(sub = NULL,
meta = NULL,
errors = NULL)
for (data.file in data.files){
# Read this data set;
# open the excel sheet
wb <- loadWorkbook(file.path(data.dir,data.file))
# read the submission data
in.sub <- ReadSubmissionData(wb)
in.sub$data.file <- data.file
# check the version
if (data.version.earliest !=""){
# return 1 if the imported file is later than data.version.earliest
# return 0 if the imported file is the same as data.version.earliest
do.continue = compareVersion(substr(in.sub$sw.version, start =1, stop =5),
data.version.earliest )
#
} else {
do.continue = 0
}
# depending on how up to date the version is, continue or stop
if (do.continue < 0){
cat(text = paste0("\nChecking software versions. ", data.file ,
" was produced using ", in.sub$sw.version,
": skipping file.\n"))
rm(list = c("wb","in.sub"))
} else {
cat(text = paste0("\nChecking software versions. ", data.file ,
" was produced using ", in.sub$sw.version,
": using file.\n"))
count.version = count.version+1
# keep going; read more of the file
# read the meta data
in.meta <- ReadMetaData(wb)
meta.data.file <- file.path(data.dir,
dirname(data.file),
"metaData.R")
if (file.exists(meta.data.file)){
# read the meta data file associated with the entire submission
source(file = meta.data.file)
in.meta$data.supplier.type <- data.supplier.type
} else {
in.meta$data.supplier.type <- NA
}
# append all of this good stuff
in.meta$data.file <- data.file
in.meta$sw.version <- in.sub$sw.version
# read *all* errors
in.errors <- ReadErrorData(wb,
in.sub$sw.version)
if (data.public == FALSE){
# write out results from this case to file and to the document
cat(text = "\n## Data set", in.sub$random.ID, "from", data.file, "\n")
# check to see if we have a directory for this case
output.dir = file.path(project.root,
"analysis",
strsplit(data.file, "\\.")[[1]][1])
dir.create(output.dir, showWarnings = FALSE,recursive = TRUE)
# create a label we'll use to annotate plots
plot.label <- labelSubmission(in.sub,
made.by)
# plot errors
if (NROW(na.omit(in.errors$by.WS$error.val.pc)) >=1){
PlotSubErrorsByWS(in.errors$by.WS,
plot.label,
output.dir)
}
if (NROW(na.omit(in.errors$by.TOD$error.val.pc)) >=1){
PlotSubErrorsByTOD(in.errors$by.TOD,
plot.label,
output.dir)
}
if (NROW(na.omit(in.errors$by.CM$error.val.pc)) >=1){
PlotSubErrorsByCM(in.errors$by.CM,
plot.label,
output.dir)
}
if (NROW(na.omit(in.errors$by.WD$error.val.pc)) >=1){
PlotSubErrorsByWD(in.errors$by.WD,
plot.label,
output.dir)
}
if (NROW(na.omit(in.errors$by.4CM$error.val.pc)) >=1){
PlotSubErrorsBy4CM(in.errors$by.4CM,
plot.label,
output.dir)
}
# dummy text to clear knitr buffers
cat(text = "\n")
}
# prepare data for aggregation
all.data <- aggregateDataSets(all.data,
in.sub,
in.meta,
in.errors)
}
}
}
```
``` {r check we have data, echo = FALSE}
if ((count.version == 0)& (data.analyze.raw==TRUE)){
stop("No data found.\n Check $count.version is empty or contains a sensible value, for example '0.5.10'.")
}
```
``` {r save data, echo=FALSE}
# define the path to the directory that we will use to store all of the data
output.dir = file.path(project.root,
"analysis",
"all")
# save the data
if (data.analyze.raw == TRUE){
save(list = c("project.root",
"made.by",
"data.public",
"data.version.earliest",
"working.dir",
"output.dir",
"all.data"),
file = file.path(output.dir,"AggregatedData.RData"),
envir = .GlobalEnv)
} else {
load(file = file.path(output.dir,"AggregatedData.RData"))
}
```
# Data Sets
## Data Suppliers
Data suppliers were categorized according to the role they play in the wind industry. Roles include:
* Consultants: providing technical services to others but with no financial interest in the development or operation of the wind plant
* Developers: site, design, and plan (or carry out) construction of prospective wind plants.
* Owner/operators: have a finacial stake in the wind plant and are involved with some or all of the day-to-day running of the wind plant
* Original Equipment Manufacturers (OEMs): design, manufature, and maintain wind turbines
```{r Histogram of data supplier types, echo=FALSE, fig.width = 6.5, fig.height = 3.5, fig.cap="Data Sources"}
#PlotAllDataSuppliers(all.data$meta,
# sw.version = sw.version,
# sw.version.logic = sw.version.logic,
# output.dir,
# made.by)
```
## Turbine Sizes
In total, `r nrow(all.data$sub)` data sets were submitted. The `r nrow(all.data$sub)` data sets include tests carried out in the period from `r min(all.data$meta$year.of.measurement, na.rm=TRUE)` to `r max(all.data$meta$year.of.measurement, na.rm=TRUE)`. Turbine diameters range from `r min(all.data$meta$turbine.dia)` to `r max(all.data$meta$turbine.dia)` m, while hub heights range from `r min(all.data$meta$turbine.height)` to `r max(all.data$meta$turbine.height)` m.
```{r plot turbine year and size, fig.width = 6.5, fig.height = 3.5, echo=FALSE}
PlotAllTurbineYearSizeDia(all.data$meta,
sw.version = sw.version,
sw.version.logic = sw.version.logic,
output.dir,
made.by)
```
## Turbine Locations
The coutry in which the turbine was located was reported in `r NROW(na.omit(all.data$meta$Geography.country))` of the data sets.
```{r Histogram of turbine locations, echo=FALSE, fig.width = 6.5, fig.height = 3.5, fig.cap="Turbine locations by country"}
PlotAllTurbineLocations(all.data$meta,
sw.version = sw.version,
sw.version.logic = sw.version.logic,
output.dir,
made.by)
```
Data were obtained from turbines in `r NROW(unique(na.omit(all.data$meta$Geography.country)))` countries including `r TextList(unique(na.omit(all.data$meta$Geography.country)))`.
```{r Map of turbine locations, message = FALSE, echo=FALSE, fig.width = 6.5, fig.height = 3.5}
MapAllTurbineLocations(all.data$meta,
sw.version = sw.version,
sw.version.logic = sw.version.logic,
code.dir,
output.dir,
made.by =made.by)
```
# Results
In this section, data from all of the individual data sets have been combined.
## Errors versus wind speed
Error may reasonably be expected to be a function of wind speed. In order to maintain confidentiality, wind speeds were normalized with respect to rated wind speed (?which one?) by the PCWG tool, and errors were binned into 1-m/s bins. Results for five different methods are shown in the following plots. They include
* The baseline method
* The rotor-equivalent wind speed method (REWS)
* The turbulence correction method
* The turbulence correction method using REWS wind speeds, and
* The power deviation matrix.
### All Errors
```{r plot errors by wind speed bin using lines, echo=FALSE, fig.width = 6.5, fig.height = 4.5}
PlotAllErrorsByWSBin_Lines(all.data$errors$by.WS,
data.range = "all",
sw.version = sw.version,
sw.version.logic = sw.version.logic,
output.dir)
```
### Errors grouped by magnitude of Error
The following figures show the error by wind speed for data sets with baseline inner range absolute NME greater than 2%.
```{r Error by wind speed for high-NME data sets, echo=FALSE, fig.width = 6.5, fig.height = 4.5}
# get the name of files with high baseline inner range NME
data.files.NMEanom <- unique(as.character(all.data$errors$by.Range$data.file[(all.data$errors$by.Range$range == "Inner") & (abs(all.data$errors$by.Range$error.val.pc) > 2) & (all.data$errors$by.Range$correction == "Baseline") & (all.data$errors$by.Range$error.name == "NME")]))
# plot the eror by wind speed for those files
PlotAllErrorsByWSBin_Lines(all.data$errors$by.WS[all.data$errors$by.WS$data.file %in% data.files.NMEanom,],
data.range = "all",
sw.version = sw.version,
sw.version.logic = sw.version.logic,
output.dir,
filename.suffix = "_absGT2pc")
```
The following figures show the error by wind speed for data sets with baseline inner range absolute NME less than or equal to 2%.
```{r Error by wind speed for low-NME data sets, echo=FALSE, fig.width = 6.5, fig.height = 4.5}
# get the name of files with low baseline inner range NME
data.files.NMEusual <- unique(as.character(all.data$errors$by.Range$data.file[(all.data$errors$by.Range$range == "Inner") & (abs(all.data$errors$by.Range$error.val.pc) <= 2) & (all.data$errors$by.Range$correction == "Baseline") & (all.data$errors$by.Range$error.name == "NME")]))
# plot the eror by wind speed for those files
PlotAllErrorsByWSBin_Lines(all.data$errors$by.WS[all.data$errors$by.WS$data.file %in% data.files.NMEusual,],
data.range = "all",
sw.version = sw.version,
sw.version.logic = sw.version.logic,
output.dir,
filename.suffix = "_absLTEq2pc")
```
## Errors Binned by Wind Speed
The following plots show the error in each wind speed bin, grouped by correction method. Plots are provided for all data, the inner range, and the oter range. Data are plotted using a box with whiskers that shows the variation of the error in each wind speed bin. Outliers are shown as black points.
###All Data
```{r plot errors by wind speed bin using box-and-whiskers plot, echo=FALSE, fig.width = 6.5, fig.height = 3.5}
PlotAllErrorsByWSBin_BoxNWhiskers(all.data$errors$by.WS,
data.range = "all",
sw.version = sw.version,
sw.version.logic = sw.version.logic,
output.dir)
```
###Inner Range
```{r plot all inner range errors by wind speed using box and whiskers, echo=FALSE, fig.width = 6.5, fig.height = 3.5}
PlotAllErrorsByWSBin_BoxNWhiskers(all.data$errors$by.WS,
data.range = "Inner",
sw.version = sw.version,
sw.version.logic = sw.version.logic,
output.dir)
```
###Outer Range
```{r plot all outer range errors by wind speed using box and whiskers, echo=FALSE, fig.width = 6.5, fig.height = 3.5}
PlotAllErrorsByWSBin_BoxNWhiskers(all.data$errors$by.WS,
data.range = "Outer",
sw.version = sw.version,
sw.version.logic = sw.version.logic,
output.dir)
```
## Baseline Inner and Outer Range Error Histograms
The following plot compares the normalized mean error (NME) for the inner and outer range for the baseline power curve. The error is the difference between xx and xx.
The inner and outer range are defined in the PCWG's 2013 Proposal (see http://www.pcwg.org/proposals/PCWG-Inner-Outer-Range-Proposal-Dec-2013.pdf) as:
* __The Inner Range__: the range of conditions for which one can expect to achieve an Annual Energy Production (AEP) of 100% (relative to a reference power curve).
* __The Outer Range__: the range of conditions for which one can expect to achieve an AEP of less than 100%. Stated another way the outer range is the range of all possible conditions excluding those in the inner range.
Based on this definition, it would be expected that the error in the inner range would be less than the error in the outer range. This expectation is supported by the following figure.
```{r baseline inner and outer range histogram, echo = FALSE, fig.width = 6.5, fig.height = 3.5}
PlotAllBaselineErrorsByRange(all.data$errors$by.Range,
sw.version = sw.version,
sw.version.logic = sw.version.logic,
output.dir)
```
```{r statistics, echo = TRUE}
StatsByRange(all.data$errors$by.Range,
sw.version = sw.version,
sw.version.logic = sw.version.logic)
```
## The Effect of The Turbulence Correction on the Outer Range Error
The PCWG investigated the effect of turbulence on the power curve uncertainty. The effect of turbulence was to be accounted for using the turbulence correction approach. If this approach were succesful, we would expect to see reduced error with the turbulence correction, compared to the baseline. A comparison of the outer range error with and without turbulence correction is shown in the following figure.
```{r, echo = FALSE, fig.width = 6.5, fig.height = 3.5}
PlotAllOuterErrorsTurbulenceCorrection(all.data$errors$by.Range,
sw.version = sw.version,
sw.version.logic = sw.version.logic,
output.dir)
```
# The Effect of Other Corrections
A similar analysis as in the previous section can be applied to the improvement in both the inner and outer range. The following plot shows the difference in the error in each data set in the inner or outer range, compared to the baseline method. The 1:1 line is shown for comparison; a data point below the line has a lower normalized mean error with the correction, than it had with the baseline method. The number of data sets with each correction is also shown, together with the percentage of the data sets that showed an improvement using this correction.
```{r plot improvement, fig.width = 6.5, fig.height = 4.5, echo=FALSE}
PlotImprovementByRangeAndCorrection(all.data$errors$by.Range,
sw.version = sw.version,
sw.version.logic = sw.version.logic,
output.dir)
```
## Errors Binned by Wind Speed and Ti
The following plot shows how effective each set of corrections is, for each of four combinations of wind speed and turbulence intensity. Data are plotted as $\left|\textrm{error}\right|_\textrm{with corrections} - \left|\textrm{error}\right|_\textrm{baseline}$. Therefore, a positive change indicates an increase in the magnitude of the error, and a negative change indicates a decrease in the magnitude of the error.
```{r, echo=FALSE, fig.width = 6.5, fig.height = 4.5}
# plot the change compared to the baseline in each data set
PlotAllChangeInErrorsBy4CM(df.in = all.data$errors$by.4CM,
show.perfect = FALSE,
sw.version = sw.version,
sw.version.logic = "from",
error.name = "NMAE",
output.dir)
```
It is also possible to define a perfect correction which reduces all error to zero. This is shown in the following plot as a benchmark correction.
```{r, echo=FALSE, fig.width = 6.5, fig.height = 4.5}
# plot the change compared to the baseline in each data set
PlotAllChangeInErrorsBy4CM(df.in = all.data$errors$by.4CM,
show.perfect = TRUE,
sw.version = sw.version,
sw.version.logic = "from",
error.name = "NMAE",
output.dir)
```
```{r, echo=FALSE, fig.width = 6.5, fig.height = 4.5}
PlotImprovementBy4CM(all.data$errors$by.4CM,
sw.version = sw.version,
sw.version.logic = "from",
output.dir)
```