0.2.0:
* Added E-MTAB-386 (FFPE samples with survival data)
* put MaxMean probe mappings in eset@featureData
* put official Bioconductor platform names in eset@annotation
0.2.1:
* In the TCGA dataset, about 40 censored patients had days_to_death and days_to_recurrence labelled as NA instead of the follow-up time.
* TCGA released updated clinical information for six patients
* Levi reviewed Ina and Svitlana's double-checks, which resulted in minor changes to GSE18520 (for all patients changed debulking_status from NA -> optimal, stage IV -> III, and grade 4 -> 3)
0.2.2:
* restored GSE17260, which went missing in 0.2.1
0.2.3:
* Rename "celfile_run_date" and "batch_1" to "batch", and get rid of "batch_2" and "batch_3" so there is one consistent batch variable
* consistently label missing debulking status as "NA" rather than sometimes as "unknown"
* for FULL version, replace the missing probesets (probesets not found in the probeset-gene map were missing in previous versions)
* addition of unit test for sample counts (not actually working yet)
0.2.4:
* TCGA: restored the batch variable, which had disappeared
* GSE9891: corrected days_to_death and days_to_recurrence for three patients where an error occurred in copying uncurated data from supplemental PDF to create source .csv file.
0.3.0:
* Added GSE32062.GPL6480 and GSE32063 (Yoshihara et al. 2012), "Japanese datasets A and B" (issue #6).
* GSE13876: Got rid of recurrence_status, which was just a copy
of vital_status. No recurrence status is available for this study.
* resolved missing ExperimentData for GSE2109 and GSE6822 (issue #9)
* Added subtype information (serous) for PMID17290060, PMID19318476, PMID15897565 (issue #11)
* Turned Markus's dataset-check test into an R CMD check unit test
0.4.0:
* Added GSE30009 (Taqman qPCR dataset).
0.4.1:
* Added ages to GSE26712 (Bonome 08).
* Use getPlatformsBiomaRt for GSE6822, to avoid using HGNC symbols directly from GPL80 [Hu6800]
0.4.2:
* 0.4.1 actually did not include ages for GSE26712, but this version
does.
0.5.0:
* disallow the use of "unknown" in template, use NA instead.
Affected subtype, debulking, primarysite, arrayedsite.
* added unit test which does syntax checking against template.
0.99.1:
* Fixed expO short title, now: expO, IGC 2005
* Changed template to make distinction between T (1-3) of the TNM
staging, and FIGO tumorstage (I-IV), which we still code as 1-4.
This affected the metadata of every dataset which has stage.
* Some datasets specified only early/late stage and high/low grade,
and some of these had been guessed (i.e. 2 for early, 4 for late).
Now, we DO NOT guess unspecified stage or grade, but curate these
under summarystage and summarygrade. Affected GSE12470, GSE13876,
GSE18154, and GSE26712.
* Added tumorstage based on calculation from T+M for GSE2109.
* Remove curation files for GSE3149, GSE7463 which are not in curatedOvarianData.
* Rescued some incorrectly recorded uncurated stage data in GSE6008 (1
and I both tumorstage 1).
* Updated the BLAST reference genome to GRCh37.68 (from 64). This
affects GSE12470_eset, GSE13876_eset, and GSE17260_eset.
Streamlined blast_gene_maps.R so this update can be made simply by
changing the ftp address of the genome map.
* Updated BioMart probeset <--> gene maps (Oct 9, 2012). Previous versions used
BioMart as of June 27, 2012.
* Publication metadata was missing from GSE30009 - no longer require complete cases for
MIAME in createEsets.R.
* Made test_counts.R unit test work for Normalizer and FULL versions.
* When technical replicate samples have a piece of non-agreeing sample
metadata, set the value for this sample to NA. Only affected one
grade 2 or 3 sample in GSE13876, which kept template syntax unit test
from passing.
* added batch variable to PMID17290060 (Dressman et al.)
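
The stage rules described for 0.99.1 can be sketched roughly as follows. This is an illustration in Python, not the package's R curation code: the function name curate_stage and the (tumorstage, summarystage) return shape are hypothetical. It only shows that FIGO I-IV and 1-4 both map to 1-4, and that an early/late label goes to summarystage without a guessed numeric stage. Whether summarystage is also filled in when an exact stage is known is not stated above, so the sketch leaves it unset.

```python
from typing import Optional, Tuple

# FIGO stage written as a Roman numeral -> integer code 1-4.
FIGO_TO_INT = {"I": 1, "II": 2, "III": 3, "IV": 4}


def curate_stage(raw: Optional[str]) -> Tuple[Optional[int], Optional[str]]:
    """Return (tumorstage, summarystage) for one uncurated stage value.

    Hypothetical helper: None stands in for R's NA.
    """
    if raw is None:
        return None, None
    text = raw.strip()
    # Only early/late was reported: record it, do not guess a number.
    if text.lower() in ("early", "late"):
        return None, text.lower()
    # "I" and "1" are both curated as tumorstage 1, and so on up to 4.
    if text.upper() in FIGO_TO_INT:
        return FIGO_TO_INT[text.upper()], None
    if text in ("1", "2", "3", "4"):
        return int(text), None
    # Anything else is treated as missing rather than guessed.
    return None, None
```

For example, curate_stage("I") and curate_stage("1") both give tumorstage 1, while curate_stage("late") gives no tumorstage and summarystage "late".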
0.99.2:
* added tumorstage and summarystage to GSE26712 from additional spreadsheet.
0.99.10:
* Added TCGA-mirna-8x15kv2 and GSE30161.
0.99.20:
* added percent_stromal_cells, percent_tumor_cells, and percent_normal_cells
* added uncurated_author_metadata
* removed T, N, M, chemo_response, inferred_chemo_response
* renamed G to grade, subtype to histological_type
* defined optimal debulking as <10mm for TCGA (was: No Macroscopic Disease)
0.99.21:
* When merging technical replicates, if a metadata field has "///" and
it disagrees between the replicates, try to merge rather than just
setting to NA.
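
A minimal sketch of the replicate-merging behaviour from 0.99.1 and 0.99.21, written in Python for illustration only (the package itself is R). The function name merge_replicate_field is hypothetical, and the merge strategy shown, a union of the "///"-separated parts in first-seen order, is an assumption: the changelog only says that a merge is attempted.

```python
from typing import List, Optional

SEP = "///"  # separator used inside multi-valued metadata fields


def merge_replicate_field(values: List[Optional[str]]) -> Optional[str]:
    """Combine one metadata field across technical replicates.

    None stands in for R's NA.
    """
    # All replicates agree: keep the shared value.
    if len(set(values)) == 1:
        return values[0]
    # Assumption: a missing value next to a differing one counts as
    # disagreement and is not merged.
    if any(v is None for v in values):
        return None
    # Disagreement in a "///" field: try to merge (assumed: union of parts).
    if any(SEP in v for v in values):
        merged: List[str] = []
        for value in values:
            for part in value.split(SEP):
                part = part.strip()
                if part and part not in merged:
                    merged.append(part)
        return SEP.join(merged)
    # Plain disagreement (e.g. grade 2 vs 3): set to NA.
    return None
```

With this sketch, replicates recorded as grade "2" and "3" yield NA, while "a///b" and "b///c" merge to "a///b///c".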
1.0.0:
* per-dataset documentation fully automated, replaces
manually-generated documentation.
* several additions to vignette - ComBat example, Table 1 of
manuscript, and a section for non-R users.
* added grade and summarygrade that were missing for Crijns (GSE13876).
* added summarygrade to Bentink (E.MTAB.386) from paper (all high).
* more metadata put in ExpressionSets: featureData(eset)@other now
contains rich platform information and warnings. This all makes it
into automatic documentation.
* Fixed typo in experimentData(TCGA_eset)@name: "Cancer GenomeAtlas
Network" is now "Cancer Genome Atlas Network"
* Fixed featureData slot of FULLVcuratedOvarianData, which was
previously missing gene symbols.
1.0.1:
* curate newly available grade for GSE17260 (Yoshihara 2010)
1.1.0:
* add p-values for forest plots to the figure captions.
* change curatedOvarianData to NormalizerV, FULLV for the alternative
packages in inst/extdata/*
* For GSE18520 (Mok), change debulking from all "optimal" to all "NA".
* Added GSE8842: all early-stage samples on a custom/commercial
Agilent two-color cDNA array.
* Added citation to paper published in DATABASE, for citation("curatedOvarianData")
1.1.1:
* added options to createEsets.R: remove.retracted and remove.subsets
1.2.0:
* added RNA-seq data for TCGA, updated gene maps.
1.2.1:
* TCGA sample type "11" samples now correctly curated as "healthy"
instead of "adjacentnormal"
1.2.2:
* added section "Non-unique gene symbols" to the vignette.
* updates to createEsetList.R to catch it up to the PMID24700801 version (www.bitbucket.org/lwaldron/ovrc4_sigvalidation) - mainly option to manually remove certain studies or patients
* inst/extdata/patientselection.config by default now removes the same duplicate samples as PMID24700801.
1.3.0:
* added several new datasets (GSE26193, GSE49997, GSE44104, GSE51088)
* added debulking status to Mok (GSE18520)
* curated LMP and borderline tumors as "borderline" and benign tumors
as "benign" in sample_type. Affected GSE2109, GSE30009, GSE49997,
GSE51088, GSE6822, GSE8842, GSE9891.
* improved specificity of platform_accession, e.g. GSE20565_eset is
GPL570 only, not GPL570|GPL2005|GPL6801 (change made in
etc/ovarian_cancer_public_datasets.csv)
* Renamed technology_type column to platform_technology in
etc/ovarian_cancer_public_datasets.csv, so this makes it into
experimentData(eset).
* Upgrade from Ensembl release 68 to release 76 for blast gene maps,
re-create BioMart gene maps on Aug 25 2014
1.3.1:
* added missing grade to GSE26913
* createEsetList.R now does scaling and variance filtering of genes
before removing any samples from the ExpressionSet.
1.3.2:
* Update all gene maps
1.5.2:
* Fix library(logging) bug and remove.samples bug in createEsets.R.