Analysis_Hdf5_tracking_CD_v3_5.Rmd

author: Steven Wink, Gerhard Burger

changelog

version 2_3: tracking data reorganization (unique parents and linking broken tracks complete)
reformat script h5 file CellProfiler
version 3_3: ?
version 3_4: ?
version 3_5: GB: start splitting up code into manageable parts (see also H5CellProfiler.R)
-----------


For a ~1 GB hdf5 file you will need ~1 GB of free RAM; in practice, 8 GB of RAM is recommended to run this script.
When multiple objects exist per parent object, computation takes several times longer (the matching step alone takes around 30 - 60 seconds). This cannot be avoided (although it could perhaps be optimized), since the algorithm has to check in each image whether multiple parents exist for each non-parent object.

install latest version of R
install RStudio

install knitr
set RStudio to use knitr
install packages:
  source("http://bioconductor.org/biocLite.R")
  biocLite(c("knitr", "rhdf5", "bit64", "stringr"))
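
The `biocLite.R` installer used above has since been retired by Bioconductor in favour of the `BiocManager` package. The sketch below assumes a current R installation. The helper name `missing_packages` is hypothetical (it is not part of H5CellProfiler); it only reports which packages are not yet installed and leaves the actual install call commented out.

```r
# Hypothetical helper (not part of H5CellProfiler): report which of the
# requested packages cannot be found in the current library paths.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- c("knitr", "rhdf5", "bit64", "stringr")
to_install <- missing_packages(needed)
print(to_install)

# Uncomment to install whatever is missing (requires internet access):
# if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
# BiocManager::install(to_install)
```

Running the check first avoids reinstalling packages that are already present.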

create csv file: see example f/media/Image Data/POC/2012_10_04/TIFs/output/Bipile "2012-12-06 Pipette sheet cyc1 srxn1 actn1 prpf40a.csv"   - create this file using original spreadsheet layout, copy formulas to other experiment layout files.

set working directory to location of csv file and h5 files


First run the R script (wellFolders2013__05_01.Rmd) to place all ND2-exported TIFFs into well-named folders (A01_1, A01_2, A02_1, A02_2, ...)
Then run CellProfiler (trunk build with hdf5 implementation) using the folder names as metadata

 
format for csv file: any name, as long as it includes the same date identifier as your h5 file (yyyy_mm_dd), so that layouts and data cannot be switched by accident
format for hdf5 file: anythinghere_yyyy_mm_dd.h5
If you split your data in 2 sets make sure to append _set1.h5 and _set2.h5  
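
The naming rule above can be checked before any data is read. This is a minimal sketch, assuming the date identifier always has the form yyyy_mm_dd; the helper names `extract_date_id` and `check_date_match` and the example file names are hypothetical and not part of the pipeline.

```r
# Hypothetical helper: pull the first yyyy_mm_dd identifier out of a file name.
extract_date_id <- function(file_name) {
  m <- regmatches(file_name, regexpr("[0-9]{4}_[0-9]{2}_[0-9]{2}", file_name))
  if (length(m) == 0) NA_character_ else m
}

# Stop early when the layout file and the h5 file carry different dates.
check_date_match <- function(layout_file, h5_file) {
  layout_id <- extract_date_id(layout_file)
  h5_id <- extract_date_id(h5_file)
  if (is.na(layout_id) || is.na(h5_id)) {
    stop("no yyyy_mm_dd identifier found in one of the file names")
  }
  if (layout_id != h5_id) {
    stop("date identifiers differ: ", layout_id, " vs ", h5_id)
  }
  invisible(layout_id)
}

# Illustrative file names only.
check_date_match("layout_2013_05_01.csv", "screen_2013_05_01_set1.h5")
```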


CellProfiler setting requirements
 1) include metadata in LoadImages module and define the location as a single parameter as group
 2) include metadata location as well_# ( from well script ) 
 3) Always include RelateObjects if you have multiple objects: the larger object is the parent, the smaller object inside the parent is the secondary object (its child). Each object must have a parent-child relationship; there is always 1 parent, and there can be multiple children.
 

Install any packages you do not have yet; this is needed only once (library(blah) will fail otherwise)

source("http://bioconductor.org/biocLite.R")
biocLite("packageHere")

Run each block separately

BLOCK  1: extracting Hdf5 data
BLOCK  2: plotting single cell population based data
BLOCK  3: Organizes single cell tracking analysis
BLOCK  4: plotting single cell tracking data + measurement

All data is organized by tracked object, so multiple objects per tracked object are summarized by the mean or by a chosen quantile (the quantile will require a small code change, since it is very slow to compute in certain cases)
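
As an illustration of this per-object summary, the sketch below collapses several child measurements into one value per parent. The toy table and its column names (`parentID`, `intensity`) are assumptions for the example; the pipeline's own implementation may differ.

```r
# Toy child-object table: each row is one child (e.g. a focus) with the ID
# of the parent it belongs to. Column names are illustrative assumptions.
children <- data.frame(
  parentID  = c(1, 1, 1, 2, 2, 3),
  intensity = c(10, 20, NA, 5, 15, 8)
)

# Same shape as multiplePerParentFunction in BLOCK 1.
summary_fun <- function(x) mean(x, na.rm = TRUE)
# Alternative: function(x) quantile(x, 0.8, na.rm = TRUE)

# Collapse the children to one value per parent; na.pass keeps NA rows so
# that summary_fun decides how missing values are handled.
per_parent <- aggregate(intensity ~ parentID, data = children,
                        FUN = summary_fun, na.action = na.pass)
print(per_parent)
```

Swapping `summary_fun` for the quantile variant changes only the statistic, not the shape of the result.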

TODO
1) integrate plateID in script using time course data
  - duplicate plot scripts and modify 1 with plateID (plotting changes)
  - duplicate summary statistic scripts and modify 1 with plateID
2) modify script for single time point data (generate a fake time vector with 1's)

3) NF-kB translocation parameters


# TODO: single time points, also for siRNA screens: different graphs etc. Maybe just add siRNA script into block-1, and run some of those plots and some of block-2 plots based on the single time point image data.
BLOCK 1: Extracting Hdf5 data


# TODO: many matched controls, negative and positive. e.g. compound1+mock is matched control of compound1+siRNA1, or different solvents


```{r generateCSVfiles}
getwd()
#"D:/DILI screen/2013-12-11"
#run fixTrackingFun
#modify myDFo with output
#fix all the plots
#distribute
#analyze Sylvia's data

options(stringsAsFactors = FALSE)
#setwd("G:/Endpoint assay/2014-10-30_R002/Output XBP1")

dir()
#load library with methods for reading xls files
rm(list=ls())
cp.pipeline.location <- 'D:/h5Cellprofiler/H5CellProfiler'
source(file.path(cp.pipeline.location, "mainFunction.R"), chdir = TRUE)
source(file.path(cp.pipeline.location, "fixTrackingFun.R"))
library(rhdf5)
library(stringr)
library(plyr)
library(data.table)
library(doParallel)
library(ggplot2)
library(reshape2)
library(grid)
library(shiny)
library(ggvis)

source(file.path(cp.pipeline.location, "theme_sharp.R"))
source(file.path(cp.pipeline.location, "countCellFun.R"))

#=================user defined variables=================================================================
#========================================================================================================
#========================================================================================================
#========================================================================================================
#========================================================================================================
#========================================================================================================
#========================================================================================================
#global variables:
dir()

# Each hdf5 file contains 1 or multiple plate-based data (i.e. do not divide data from 1 plate in multiple h5 files unless different time points)
hdf5FileNameL <- c("2014_05_23_Hs578T RCM.h5") # mainFunction will loop through this vector, at the end rbinding the individual outputs. 

# each hdf5 gets its own metadata info. Either a path WITH the "Image/" prefix or a manually defined string WITHOUT the "/" character
# If each h5 file has identical metadata 1 entry is sufficient. Else provide entry for each h5 file

# these metadata variables should be defined in the metadata layout file if they vary within the h5 file; if they are not needed there (because the metadata is provided in these variables), put NA's in the metadata file
locationID <- c("Image/Metadata_Loc")  # well/ location metadata
plateID <- c( "20140523Hs578T_RCM") # PlateID must always be provided (either manually or as h5 path), here AND in the layout file, because plateID is used to couple the metadata plate layout file.
imageID <- c("Image/Metadata_xy") # Image/Metadata_  ... image number (obtained from image file name ) 
timeID <- c("Image/Metadata_tp") # timeID, either hdf5 path, or vector of numbers according to hdf5 files (each hdf5 is then a time point) So capture time point in h5 file if needed. Or defined in metadata layout file
replID <- c("") # replicate ID (are the plate replicates of each other? (just easy for plotting options))
exposureDelay <- c("00:00") # hh:mm 
timeBetweenFrames <- c("00:06:00") # hh:mm:ss 
# define the paths of the measurements you are interested in; leave trailing entries empty if you need fewer (full full full empty empty empty ..., NOT full empty full ...)
# this is for measurements: tracking will be handled automatically

# only object related data or image related data( e.g. not implemented yet for Relationship/ Experiment related data )
# only add the object/feature part, for example:  "myObject/Intensity_MeanIntensity_img"

myFeaturePathsA <- list(  #do not define displacement and parent object  - these are automatically included
firstFeature  = "Filteredcell/AreaShape_MedianRadius",  # enter a objectpath inside .h5 file to measurement ( there will be one  /  meaning--> object/measurement)
secondFeature = "Image/Count_Filteredcell",
thirdFeature  = "Filteredcell/TrackObjects_Displacement_20",
fourthFeature = "Filteredcell/AreaShape_Area",
fifthFeature  = "",
sixthFeature  = "",
seventhFeature= "",
eighthFeature = "",
ninethFeature = "",
thenthFeature = "" ) 
  # tab-delimited text file with metadata headers: well, treatment, dose_uM, control, cell_line
#the control is 1 or `1 where `1 is a control (just used for some extra coloring in plots)
plateMDFileName <- "2014_05_23_Hs578T RCM.txt"
dir()
parentObject <- "Filteredcell" # the name in hdf5 file of the parent ( as defined in relate objects module in CellProfiler ) if no parent (only 1 object defined in CP for example: then enter the object here)

childObject1 <- "" # First child of parentObject (if a child object was tracked - define this as your tracked object)
childObject2 <- ""
childObject3 <- ""
childObject4 <- ""
childObject5 <- ""
tertiaryObject <- "" #  child of parentObject and childObject1 object. Defined (in CP) by subtraction of the smaller object from the larger object

# which summary statistic should be used to summarize multiple objects per parent object? This is NOT performed for nuclei, but for child objects like foci it can be useful
multiplePerParentFunction <- function(x) { mean(x, na.rm = TRUE) }  # or  function(x) { quantile(x, 0.8, na.rm = TRUE) }

oscillation <- FALSE  # TRUE / FALSE  - will extract oscillation related parameters of divisionOne (TRUE is not implemented yet)

writeSingleCellDataPerWell <- FALSE # write all single cell data to a separate file per well; takes time
writeAllSingleCellData <- FALSE  # writes all single cell data to a single txt file; only needed if you want the txt file yourself
numberCores <- min(4, detectCores()) 
dir()


#do the same for set2 if exists, then rbind the results with myDF
outputList= list()
# run main function: in parallel over the h5 files when more than one is given, otherwise as a direct call
if(length(hdf5FileNameL) > 1) {
registerDoParallel(min(numberCores, length(hdf5FileNameL)))

outputList<- foreach( h5loop = seq_along(hdf5FileNameL ),
                      .packages = c("rhdf5", "stringr", "data.table", "plyr")) %dopar%
  {

     mainFunction( h5loop=h5loop,
     hdf5FileNameL=hdf5FileNameL,locationID=locationID, timeID=timeID, plateID=plateID,
     imageID=imageID, replID = replID, 
     myFeaturePathsA=myFeaturePathsA, plateMDFileName=plateMDFileName,
     parentObject=parentObject, childObject1=childObject1, childObject2=childObject2, 
     childObject3=childObject3, childObject4=childObject4, childObject5=childObject5, 
     tertiaryObject=tertiaryObject, multiplePerParentFunction=multiplePerParentFunction,
     oscillation=oscillation, 
     writeSingleCellDataPerWell=writeSingleCellDataPerWell, 
     writeAllSingleCellData=writeAllSingleCellData,
     timeBetweenFrames=timeBetweenFrames, exposureDelay=exposureDelay,
     numberCores = numberCores
     )
   }
} else {
  h5loop <- 1
    outputList <- mainFunction( h5loop=h5loop,
     hdf5FileNameL=hdf5FileNameL,locationID=locationID, timeID=timeID, plateID=plateID,
     imageID=imageID, replID = replID, 
     myFeaturePathsA=myFeaturePathsA, plateMDFileName=plateMDFileName,
     parentObject=parentObject, childObject1=childObject1, childObject2=childObject2, 
     childObject3=childObject3, childObject4=childObject4, childObject5=childObject5, 
     tertiaryObject=tertiaryObject, multiplePerParentFunction=multiplePerParentFunction,
     oscillation=oscillation, 
     writeSingleCellDataPerWell=writeSingleCellDataPerWell, 
     writeAllSingleCellData=writeAllSingleCellData,
     timeBetweenFrames=timeBetweenFrames, exposureDelay =exposureDelay,
     numberCores = numberCores
     )
}
  
  
save(outputList, file = 'outputList.Rdata')

#load("outputList.Rdata")

if(length(unlist(lapply( lapply(outputList, names),str_match_all, "myDT") )) > 1) {
  outputListmyDT<- lapply(outputList, "[[", "myDT")
      testColN<- lapply(outputListmyDT, function(x) {(  (names(x)))} )
      all.identical <- function(x) all(mapply(identical, x[1], x[-1]))
      if(!all.identical(testColN))
        {
        myDFo <- do.call('rbind', outputListmyDT)
        
        } else{
        myDFo <- rbindlist(outputListmyDT)  
                  
        }
outputListsumData <- lapply(outputList, "[[", "sumData")

sumData <- rbindlist(outputListsumData)

 kMyVars <- outputList[length(outputList)][[1]]
  kMyVars$myDT <- NULL
} else {
  outputListmyDT <- outputList$myDT
  myDFo <- outputListmyDT
  kMyVars <- outputList[-1]
  sumData<- outputList$sumData

}


  kColNames <- kMyVars$kColNames
  dataFileName <- gsub(".txt", "",kMyVars$plateMDFileName)


myFeatures <- gsub("/", "_", 
                     gsub("^(Measurements/[0-9]{4}(-[0-9]{2}){5}/)", "", kMyVars$myFeaturePathsA)
                     )
numberCores <- kMyVars$numberCores


runApp(file.path(cp.pipeline.location, 'time plots'), launch.browser = TRUE )


```
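
BLOCK 1 takes `exposureDelay` as hh:mm and `timeBetweenFrames` as hh:mm:ss. The sketch below shows one way such strings could be turned into a time-after-exposure axis in hours. The helper `to_hours` is hypothetical, and the convention that frame 1 is acquired exactly at the exposure delay is an assumption; `mainFunction` may compute this differently.

```r
# Hypothetical helper: convert "hh:mm" or "hh:mm:ss" to decimal hours.
to_hours <- function(x) {
  parts <- as.numeric(strsplit(x, ":", fixed = TRUE)[[1]])
  if (length(parts) == 2) parts <- c(parts, 0)  # pad missing seconds
  stopifnot(length(parts) == 3)
  sum(parts * c(1, 1 / 60, 1 / 3600))
}

exposureDelay     <- "00:00"     # hh:mm, as in BLOCK 1
timeBetweenFrames <- "00:06:00"  # hh:mm:ss, as in BLOCK 1
frame_index <- 1:5               # illustrative time point numbers

# Assumed convention: frame 1 is acquired exactly at the exposure delay.
timeAfterExposure <- to_hours(exposureDelay) +
  (frame_index - 1) * to_hours(timeBetweenFrames)
print(timeAfterExposure)
```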


BLOCK 2
summary statistics
=============


# Quick summary data bar plot
```{r }

piDataL <- melt(piData, measure.vars = c("binaryOne_pi_obj_maskedAreaShape_Area.DIV.NucleiAreaShape_Area", "imageCountParentObj"))

piDataL <- piDataL[ order( piDataL[ , "treatment"], piDataL[ , "dose_uM"]),   ]
  counts.d <- by(data = piDataL, INDICES = piDataL[, "treatment"], function(x) d.levels = unique(x[, "dose_uM"]))
if(sum(lapply(counts.d, length ) >1  ) > 0 ) {  # if any compound has more than 1 dose level:
  counts.d.l <- sapply(counts.d, as.list)
  counts.d.l <- melt(counts.d.l,  length)
  old.nrow <- nrow(piDataL)
  piDataL <- merge(piDataL, counts.d.l, by.x = c( "treatment", "dose_uM"), by.y = c( "L1", "value"), sort = FALSE)
    if(nrow(piDataL) != old.nrow){
      stop("setting dose levels for density plots failed")
    }
  piDataL$L2 <- factor(piDataL$L2)
  colnames(piDataL)[ colnames(piDataL) == "L2"] <- "doseLevel"

}
head(piDataL)

toOrder <- piDataL[ piDataL$variable == "binaryOne_pi_obj_maskedAreaShape_Area.DIV.NucleiAreaShape_Area",]

indO <- ddply(toOrder, .(treatment), summarize, meanV =  mean(value))
head(indO)
indOrder <- order(indO$meanV)
compOrder <- indO$treatment[indOrder]
piDataL$treatment <- factor(piDataL$treatment, levels = compOrder, order = TRUE)


p <- ggplot(data = piDataL, aes(x = treatment , y = value, fill = doseLevel, group = plateID, color=plateID))  + geom_bar(stat = "identity", position = "dodge") 
p <- p +  theme( axis.text.x = element_text(angle = 90, hjust = 1, vjust =0.4, size = 12 + round(400/ nrow(piDataL), 0), 
                                           colour = "grey50") ) + theme( strip.text.x = element_text( )) +
  ggtitle("MV_AUC") + 
  theme(plot.title = element_text(lineheight=.8, size = 14 ))

dodge <- position_dodge(width=0.9)

p <- p + facet_wrap( ~variable , ncol=1, scales = "free_y" )  

p
lapply(piDataL, class)
indr <- (piDataL$value > 7000)  # flag values above 7000
piDataL <- piDataL[!indr, ]     # and drop them
```
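
The dose-level step in the chunk above appears to number the doses within each treatment from lowest to highest, so that the same colour scale can be reused across compounds. A compact way to get the same kind of ranking is sketched below with base R's `ave`; the toy table is an assumption, and this is an alternative formulation rather than the chunk's own `by`/`melt`/`merge` approach.

```r
# Toy long-format table; column names follow the layout file (treatment, dose_uM).
d <- data.frame(
  treatment = c("A", "A", "A", "B", "B", "A"),
  dose_uM   = c(10, 1, 100, 5, 50, 1)
)

# Rank each dose within its treatment: lowest dose = level 1.
d$doseLevel <- ave(d$dose_uM, d$treatment,
                   FUN = function(x) match(x, sort(unique(x))))
d$doseLevel <- factor(d$doseLevel)
print(d)
```

Because every row receives a level, a treatment with a single dose simply gets level 1 and no separate fallback branch is needed.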


Run Block1 first. 
BLOCK 2: summary statistics & plotting
```{r summaryStatistics}
rm(list=ls()[ls() != "cp.pipeline.location"])
options(stringsAsFactors = FALSE)
library(rhdf5)
library(ggplot2)
library(reshape2)
library(stringr)
library(plyr)
library(pracma)
library(grid)
library(doParallel)
library(data.table)
library(shiny)
library(ggvis)

source(file.path(cp.pipeline.location, "theme_sharp.R"))
source(file.path(cp.pipeline.location, "countCellFun.R"))

# use as p + theme_sharp()
# setwd() 
dir()

##========================== user defined variables=================


skip.locations <- c( )  # c("plate_1_B02_1", "plate_2_B02_1" )
summaryStatFunction <- function(x) { mean(x, na.rm = TRUE) } # function(x) { mean(x, na.rm = TRUE) }  or function(x) { quantile(x, 0.8, na.rm = TRUE) } (you can choose which quantile - here it is set to 0.8)
errorType <- "sd"   # "sd" or "cl95": cl95 is the two-sided 95% confidence interval; sd is the standard deviation, drawn half above and half below the average
makeQuantilePlots <-TRUE # takes very long, do you need these ?
writePDFs <- TRUE # write plots as pdf? Takes time - choose false to only write RData files (plots are generated with ggplot2)
densityPlots <- TRUE
whichQuantiles <-c(0.05, 0.25, 0.5, 0.75, 0.95) # any quantiles can be chosen; for large datasets (> 1 GB hdf5 file) choose at most 4. Extra plots per feature are created showing these quantiles of the single cell data

##########=========================================##############
## ========================== end user defined variables============
##########=========================================##############
##########=========================================##############

dir()

# make separate functions which are run in this Rmd file depending on what the user wants
# define global variables here (environment in which functions are called)
# check for the file first: load() itself would fail with a less helpful error
if(!file.exists("outputList.Rdata"))
  {
  stop("No outputList.Rdata file found")
  }
load("outputList.Rdata")

outputListmyDT<- lapply(outputList, "[[", "myDT")
      testColN<- lapply(outputListmyDT, function(x) {(  (names(x)))} )
      all.identical <- function(x) all(mapply(identical, x[1], x[-1]))
      if(!all.identical(testColN))
        {
        stop("outputList does not have tables with identical column names/ 
             object names, manually rbind the outputList")
        }
myDFo <- rbindlist(outputListmyDT)

  kMyVars <- outputList[length(outputList)][[1]]
  kMyVars$myDT <- NULL
  kColNames <- kMyVars$kColNames
  dataFileName <- gsub(".txt", "",kMyVars$plateMDFileName)

print(paste("Plot processing ",dataFileName))
Sys.sleep(1)
myFeatures <- gsub("/", "_", 
                     gsub("^(Measurements/[0-9]{4}(-[0-9]{2}){5}/)", "", kMyVars$myFeaturePathsA)
                     )
  numberCores <- kMyVars$numberCores

setkey(myDFo, plateWellID)
  if(!is.null(skip.locations)) 
    {
    myDFo<-myDFo[!skip.locations] # by assignment (memmove in c) is not possible by row in data.table (yet)
    }

  uniqueLocations <- unique(myDFo [ , plateWellID])
  

  # write count parent cell file & return split data.tables for par. comp.
print("Counting parent objects and splitting data for MC processing") 
splitDataL <- countCellFun( kColNames )
 

summaryStatsDir <- paste(dataFileName, "summaryStats", sep = "_")
if ( !file.exists( summaryStatsDir )) 
                    {
                    dir.create( summaryStatsDir )
                    }

#

# time series plot of average of cell population

runApp(file.path(cp.pipeline.location, 'time plots'), launch.browser = TRUE )

# TODO : 11 Aug 2014: build all plotting of block 2 into the shiny application
# make sure it comes with more checks so that the app cannot simply hang.


# old code below: not yet using data.table / memory-efficient parallel processing, and not yet integrated into the app
?dataTableOutput

print(object.size(x=myDFo),units = "Mb")


nrow(myDFo)

?object.size
if (divisionOne[[1]] != FALSE & divisionTwo[[2]] != FALSE & binaryOne == FALSE & binaryTwo == FALSE){  # both division are not false and both binary are false
 
   myDataL <- melt(myDFo, measure.vars=c(myFeatures,"Displacement", "imageCountTracked", 
                                                          paste(divisionOne[[1]], divisionOne[[2]], sep ="_"),
                                                          paste(divisionTwo[[1]], divisionTwo[[2]], sep ="_")))
       myData_ss <- ddply( myDataL, c("treatment", "dose_uM", locationID, "timeAfterExposure", "control", "cell_line", "variable"),
                         summarize, summaryStat = summaryStatFunction(value))
  
  } else if (divisionOne[[1]] != FALSE & divisionTwo[[2]] == FALSE & binaryOne == FALSE )  # only first division is not false and it's binary is false
   {
     myDataL <- melt(myData, measure.vars=c(myFeatures,"Displacement", "imageCountTracked", 
                                                          paste(divisionOne[[1]], divisionOne[[2]], sep ="_")
                                                          ))
     myData_ss <- ddply( myDataL, c("treatment", "dose_uM", locationID , "timeAfterExposure", "control", "cell_line", "variable"),
                         summarize, summaryStat = summaryStatFunction(value))
     
      } else if (divisionOne[[1]] != FALSE & divisionTwo[[1]] == FALSE & binaryOne !=FALSE )  # only first division is not false and it's binary is not FALSE
   {
     
        myDataL <- melt(myData, measure.vars=c(myFeatures,"Displacement", "imageCountTracked", 
                                                          
                                                          paste("binaryOne",divisionOne[[1]], divisionOne[[2]], sep ="_")
                                                          ))
     
         
     myData_ss <- ddply( myDataL, c("treatment", "dose_uM", locationID , "timeAfterExposure", "control", "cell_line", "variable"),
                         summarize, summaryStat = summaryStatFunction(value), sum = sum(value, na.rm= TRUE))
    
      myData_ss[ myData_ss$variable == paste("binaryOne",divisionOne[[1]], divisionOne[[2]], sep ="_"), "summaryStat" ] <-
            
                                     myData_ss[ myData_ss$variable == paste("binaryOne",divisionOne[[1]], divisionOne[[2]], sep ="_"), "sum" ]
        
        myData_ss$sum <- NULL
 
 
 } else if ( divisionOne[[1]] != FALSE & divisionTwo[[1]] != FALSE & binaryOne == FALSE & binaryTwo != FALSE ) # both division are not false and only first binary is false
        {
        myDataL <- melt(myData, measure.vars=c(myFeatures,"Displacement", "imageCountTracked", 
                                                          paste(divisionOne[[1]], divisionOne[[2]], sep ="_"),
                                               paste("binaryTwo",divisionTwo[[1]], divisionTwo[[2]], sep ="_")
                                                      ))
        
              
        myData_ss <- ddply( myDataL, c("treatment", "dose_uM", locationID , "timeAfterExposure", "control", "cell_line", "variable"),
                         summarize, summaryStat = summaryStatFunction(value), sum = sum(value, na.rm= TRUE))
        
        myData_ss[ myData_ss$variable == paste("binaryTwo",divisionTwo[[1]], divisionTwo[[2]], sep ="_"), "summaryStat" ] <-
            
                                     myData_ss[ myData_ss$variable == paste("binaryTwo",divisionTwo[[1]], divisionTwo[[2]], sep ="_"), "sum" ]
        
        myData_ss$sum <- NULL
        
      
        } else if ( divisionOne[[1]] != FALSE & divisionTwo[[2]] != FALSE & binaryOne != FALSE & binaryTwo == FALSE ) # both division are not false and only second binary is false
        {
          myDataL <- melt(myData, measure.vars=c(myFeatures,"Displacement", "imageCountTracked", 
                                                          paste(divisionTwo[[1]], divisionTwo[[2]], sep ="_"),
                                               paste("binaryOne",divisionOne[[1]], divisionOne[[2]], sep ="_")
                                                      ))
        
       
        myData_ss <- ddply( myDataL, c("treatment", "dose_uM", locationID , "timeAfterExposure", "control", "cell_line", "variable"),
                         summarize, summaryStat = summaryStatFunction(value), sum = sum(value, na.rm= TRUE))
        
        myData_ss[ myData_ss$variable == paste("binaryOne",divisionOne[[1]], divisionOne[[2]], sep ="_"), "summaryStat" ] <-
            
                                     myData_ss[ myData_ss$variable == paste("binaryOne",divisionOne[[1]], divisionOne[[2]], sep ="_"), "sum" ]
        
        myData_ss$sum <- NULL
      
       
        } else if ( divisionOne[[1]] == FALSE & divisionTwo[[2]] == FALSE) # no column divisions defined
          {
         myDataL <- melt(myData, measure.vars=c(myFeatures,"Displacement", "imageCountTracked"))
         
                 
         myData_ss <- ddply( myDataL, c("treatment", "dose_uM", locationID , "timeAfterExposure", "control", "cell_line", "variable"),
                         summarize, summaryStat = summaryStatFunction(value))
         
         
         } else if (divisionOne[[1]] != FALSE & divisionTwo[[2]] != FALSE & binaryOne != FALSE & binaryTwo != FALSE) # both division columns used for cell death data
           {
        myDataL <- melt(myData, measure.vars=c(myFeatures,"Displacement", "imageCountTracked",paste("binaryOne",divisionOne[[1]], divisionOne[[2]], sep ="_"),
                                                          paste("binaryTwo",divisionTwo[[1]], divisionTwo[[2]], sep ="_")))
        
                
        myData_ss <- ddply( myDataL, c("treatment", "dose_uM", locationID , "timeAfterExposure", "control", "cell_line", "variable"),
                         summarize, summaryStat = summaryStatFunction(value), sum = sum(value, na.rm= TRUE))
        
        myData_ss[ myData_ss$variable == paste("binaryOne",divisionOne[[1]], divisionOne[[2]], sep ="_") | 
                     myData_ss$variable == paste("binaryTwo",divisionTwo[[1]], divisionTwo[[2]], sep ="_"), "summaryStat" ] <-
            
                                     myData_ss[ myData_ss$variable == paste("binaryOne",divisionOne[[1]], divisionOne[[2]], sep ="_") | 
                     myData_ss$variable == paste("binaryTwo",divisionTwo[[1]], divisionTwo[[2]], sep ="_"), "sum" ]
        
      myData_ss$sum <- NULL
        
      } else {
          stop("Nothing found for melting division columns")
        }
# alphabetically ordered by treatment 

uTreatments <- unique(myData$treatment)
uTreatments <- uTreatments[order(uTreatments)]
rm(myData)


#add normalized binary data

if (divisionOne[1] !=FALSE & binaryOne != FALSE){
  buffer <- myData_ss[ myData_ss$variable == paste("binaryOne",divisionOne[[1]], divisionOne[[2]], sep ="_"), ]
  bufferCountCells <- myData_ss[ myData_ss$variable == "imageCountTracked", ]
  buffer$summaryStat <- buffer$summaryStat / bufferCountCells$summaryStat
  buffer$variable <- "binaryOneFraction"
myData_ss <- rbind(myData_ss, buffer)  
}


if (divisionTwo[1]!=FALSE & binaryTwo != FALSE){
  buffer <- myData_ss[ myData_ss$variable == paste("binaryTwo",divisionTwo[[1]], divisionTwo[[2]], sep ="_"), ]
  bufferCountCells <- myData_ss[ myData_ss$variable == "imageCountTracked", ]
  buffer$summaryStat <- buffer$summaryStat / bufferCountCells$summaryStat
  buffer$variable <- "binaryTwoFraction"
myData_ss <- rbind(myData_ss, buffer)
  }

# if divided by zero; remove inf values:
myData_ss$summaryStat[ is.infinite(myData_ss$summaryStat) ] <- NA

myData_ss$variable <- factor(myData_ss$variable)

# calculate SDs over multiple locations/wells for each condition
myData_ss_w <-myData_ss

myData_ss <- ddply( myData_ss, c("treatment", "dose_uM",  "timeAfterExposure", "control", "cell_line", "variable"),
                         summarize, meanSummaryStat = summaryStatFunction(summaryStat), sd = sd(summaryStat, na.rm = TRUE), 
                    n = length(summaryStat))


# remove first cell speed point as this is not really zero


min.t <- min(as.numeric(as.character(myData_ss$timeAfterExposure)))
myData_ss$meanSummaryStat[myData_ss$variable == "Displacement" & myData_ss$timeAfterExposure == min.t] <- NA


#error <- qt(0.975,df=n-1)*s/sqrt(n)
suppressWarnings(myData_ss$error95 <- qt(0.975, df = myData_ss$n - 1) * myData_ss$sd / sqrt(myData_ss$n))


objetSize <- object.size(myDataL)
if(makeQuantilePlots){
  # increase speed by using parallel processing and aggregate instead of ddply 
    if( objetSize > 800000000 & objetSize %/% 800000000 < 2 & length(whichQuantiles) > 4 ) {
      print("Large dataset: corenumber is set to 3 for calculating quantiles to avoid memory usage > 32GB")
      registerDoParallel(cores=3)
      } else if( objetSize %/% 800000000 >= 2 ) {
        registerDoParallel(cores=1)
        print("setting multicore to 1 for quantile calculations")
      } else {
      registerDoParallel(cores=length(whichQuantiles))
      }

    myData_q <- foreach( qCounter = seq_along(whichQuantiles)) %dopar% {
              aggregate( value ~ treatment + dose_uM + timeAfterExposure + control + cell_line + variable,
              data = myDataL, quantile, whichQuantiles[qCounter], na.rm=TRUE)
    }
 
 for (addQ in seq_along(myData_q)){
    myData_q[[addQ]]$quantile <- whichQuantiles[addQ]
  }
    myData_ssq <- rbind.fill(myData_q)
    rm(myData_q)
    
write.table(file = paste(str_match(dataFileName, '([0-9]{4}[- _]{1}[0-9]{2}[- _]{1}[0-9]{2})')[1], "myData_summarized_quantiles.txt"), 
            myData_ssq[order(myData_ssq$cell_line, myData_ssq$variable, myData_ssq$treatment, myData_ssq$dose_uM, 
                            myData_ssq$quantile, myData_ssq$timeAfterExposure),], sep = "\t", col.names = NA)
}      

p.size <-  6 + round(0.1 *length(unique(paste(myData_ss$treatment,myData_ss$dose_uM))),0)
subplot.char.l <- max(nchar(paste(unique(myData_ss$dose_uM), unique(myData_ss$treatment)) ))
max.t <- max(as.numeric(myData_ss$timeAfterExposure))
cell_lines <- unique(myData_ss$cell_line)
allVars <- as.character(unique(myData_ss$variable))
all.treatments<-as.character(unique(myData_ss$treatment))
#density plot
if(!file.exists(paste(summaryStatsDir, "/densityPlots", sep = ''))){
  dir.create(paste(summaryStatsDir, "/densityPlots", sep = ''))
}
if(!file.exists(paste(summaryStatsDir, "/densityPlots/RDataFiles", sep = '')))
  {
  dir.create(paste(summaryStatsDir, "/densityPlots/RDataFiles", sep = ''))
  }

singleCellMeas <- unique(myDataL$variable)

if(densityPlots){
        densityPlotFun <- function(oneFeatallTrackDF) {
          if(writePDFs) {
              pdf(file = paste(summaryStatsDir, "/densityPlots/densityPlot", (singleCellMeas[ singleCellMeas != "imageCountTracked"])[kk],".pdf", sep =""),
                  height = 6+p.size,width = 10+1.2*p.size)
              }
           for (ii in seq_along(cell_lines)){
            oneFeatallTrackDF_c<- oneFeatallTrackDF[ oneFeatallTrackDF$cell_line == cell_lines[ii], ]
      
          p <- ggplot(oneFeatallTrackDF_c,  aes(value, color = doseLevel )) + geom_density(na.rm=TRUE) + facet_wrap(~treatment) +
          theme( axis.text.x = element_text(angle = 90, hjust = 1, size = 4 + round(150/max.t,0), colour = "grey50") ) + 
          theme( strip.text.x = element_text( size = 10)) +
          ggtitle( paste( "density ",(singleCellMeas[ singleCellMeas != "imageCountTracked"])[kk], "_", cell_lines[ii], sep ="" ) ) + 
          theme(plot.title = element_text(lineheight=.8, size = 12 ))   + theme_sharp()
        save(p, file = paste(summaryStatsDir, "/densityPlots/RDataFiles/densityPlot", (singleCellMeas[ singleCellMeas != "imageCountTracked"])[kk], cell_lines[ii],".RData", sep =""))
  if(writePDFs){
  suppressWarnings(print(p))  
  }
} # ii loop cell_lines
if(writePDFs){  
dev.off()
}
} #densityPlotFun

# create dose levels -> assign color to dose levels
print("Setting dose levels for density plots:")
  
  myDataL <- myDataL[ order( myDataL[ , "treatment"], myDataL[ , "dose_uM"]),   ]
  counts.d <- by(data = myDataL, INDICES = myDataL[, "treatment"], function(x) d.levels = unique(x[, "dose_uM"]))
if(sum(lapply(counts.d, length ) >1  ) > 0 ) {  # if any compound has more than 1 dose level:
  counts.d.l <- sapply(counts.d, as.list)
  counts.d.l <- melt(counts.d.l,  length)
  old.nrow <- nrow(myDataL)
  myDataL <- merge(myDataL, counts.d.l, by.x = c( "treatment", "dose_uM"), by.y = c( "L1", "value"), sort = FALSE)
    if(nrow(myDataL) != old.nrow){
      stop("setting dose levels for density plots failed")
    }
  myDataL$L2 <- factor(myDataL$L2)
  colnames(myDataL)[ colnames(myDataL) == "L2"] <- "doseLevel"
} else {
  myDataL$doseLevel <- 1
  }# end if multiple dose
  cores <- min(c(length(singleCellMeas[ singleCellMeas != "imageCountTracked"]), numberCores))
      registerDoParallel(cores=cores)
      print("printing density plots:") 
      foreach(kk = seq_along(singleCellMeas[ singleCellMeas != "imageCountTracked"]), 
              .packages = c('ggplot2', 'grid', 'plyr'), .export = 'theme_sharp') %dopar% 
      {
        densityPlotFun(subset(myDataL,myDataL$variable == singleCellMeas[ singleCellMeas != "imageCountTracked"][kk]))
      }
} # if densityPlots
checkDoseLevel <- unique(myDataL[, c("dose_uM", "treatment", "doseLevel")])
checkDoseLevel<- checkDoseLevel[ order(checkDoseLevel$treatment), ]
write.table(file= paste(summaryStatsDir, "/checkDoseLevels.txt", sep = ''), sep = "\t", checkDoseLevel)
rm( myDataL,checkDoseLevel )
#need to scale the values for plotting all parameters in 1 plot


# decided to scale by dividing by the average.
# alternative would be e.g. rescaling with (x-min(x)) / (max(x) - min(x))

scaleF <- ddply(myData_ss, .(variable), summarize, mean = mean(meanSummaryStat, na.rm=TRUE))
myData_ss_scaled = myData_ss
for (scaleCount in 1:nrow(scaleF)) {
 
  myData_ss_scaled[myData_ss$variable == scaleF$variable[scaleCount],"meanSummaryStat"]  <- 
    myData_ss[myData_ss$variable == scaleF$variable[scaleCount],"meanSummaryStat"]  /   scaleF$mean[scaleCount] 
 
   }

myData_ss$dose_uM <- factor(myData_ss$dose_uM, order = T)
myData_ss$timeAfterExposure <- factor(myData_ss$timeAfterExposure, order = T)

myData_ss_scaled$dose_uM <- factor(myData_ss$dose_uM, order = T)
myData_ss_scaled$timeAfterExposure <- factor(myData_ss$timeAfterExposure, order = T)
myData_ss_scaled$sd <- NULL
myData_ss_scaled$error95 <- NULL
myData_ss_scaled$n <- NULL

write.table(file = paste(str_match(dataFileName, '([0-9]{4}[- _]{1}[0-9]{2}[- _]{1}[0-9]{2})')[1], "myData_summarized.txt"), 
            myData_ss[order(myData_ss$cell_line, myData_ss$variable, myData_ss$treatment, myData_ss$dose_uM, 
                            myData_ss$timeAfterExposure),], sep = "\t", col.names = NA)

write.table(file = paste(str_match(dataFileName, '([0-9]{4}[- _]{1}[0-9]{2}[- _]{1}[0-9]{2})')[1], "myData_summarized_scaled.txt"), 
            myData_ss_scaled[order(myData_ss_scaled$cell_line, myData_ss_scaled$variable, myData_ss_scaled$treatment, myData_ss_scaled$dose_uM, 
                            myData_ss_scaled$timeAfterExposure),], sep = "\t", col.names = NA)

# bar plots of all conditions, using AUC with function trapz from pracma package

barData <-ddply(myData_ss_w[!is.na(myData_ss_w$summaryStat),], c("treatment", "dose_uM", locationID, "cell_line", "control", "variable"), summarize,
  AUC = trapz(as.numeric(timeAfterExposure), as.numeric(summaryStat )) )


barData <-ddply(barData, c("treatment", "dose_uM",  "cell_line", "control", "variable"), summarize,
  meanAUC = mean(AUC, na.rm = TRUE), sd = sd(AUC, na.rm = TRUE), n =  length(AUC))

suppressWarnings(barData$error95 <- qt(0.975, df = barData$n - 1) * barData$sd / sqrt(barData$n))


#test what happens with trapz if multiple graphs 
# tested and verified it works as intended


#barData$dose_uM <-factor(barData$dose_uM, levels = unique(barData$dose_uM)[order(unique(barData$dose_uM))],order = TRUE)

barData<-barData[order(barData$cell_line, barData$variable, barData$treatment, barData$dose_uM ),]

# create levels within each treatment for dose
counts.d <- ddply(barData, .(treatment, cell_line, variable), summarize,count.d.l = length(dose_uM))
barData$dose.f <- NA
for (i in 1 : nrow(counts.d))
  {

  barData$dose.f[ barData$treatment == counts.d$treatment[i] & 
                      barData$cell_line == counts.d$cell_line[i] ] <- gl(counts.d$count.d.l[i], 1)

  }

barData$dose.f<-factor(barData$dose.f)

#order the treatments based on average of myFeature 

mean.c.l <- ddply(subset(barData, variable == gsub("/", "_",myFeature)), .(treatment), summarize, mean.t = mean(meanAUC))

barData$treatment <- factor((barData$treatment), levels =  unique(mean.c.l$treatment)[order(mean.c.l$mean.t)], order = T  )

write.table(file = paste(str_match(dataFileName, '([0-9]{4}[- _]{1}[0-9]{2}[- _]{1}[0-9]{2})')[1], "myData_AUC.txt"), 
            barData, sep = "\t", col.names = NA)


# make name
#function(x) { quantile(x, 0.3, na.rm = TRUE)}
#function(x) { mean(x, na.rm = TRUE) }
summName <- gsub("function \\(x\\) \\{    ", "",paste(deparse(summaryStatFunction), collapse=""))
summName <- gsub( "\\(x" , "", summName)
summName <- gsub( ", na.rm = TRUE\\)\\}", "", summName)

if (!file.exists(paste(summaryStatsDir, "/", "dose", sep  =""))){
dir.create(paste(summaryStatsDir, "/", "dose", sep  =""))
}


if (errorType == "sd") {
limits <- aes(ymax = meanAUC + 0.5*sd, ymin = meanAUC - 0.5*sd)
  } else if ( errorType == "cl95")
    {
    limits <- aes(ymax = meanAUC + error95, ymin = meanAUC - error95)  # barData stores the interval half-width in column error95
    } else 
      {
        stop("errorType either \"sd\" or \"cl95\"")
      }

p <- ggplot(data = barData, aes(x = treatment , y = meanAUC, fill = dose.f))  + geom_bar(stat = "identity", position = "dodge") 
p <- p +  theme( axis.text.x = element_text(angle = 90, hjust = 1, vjust =0.4, size = 12 + round(400/ nrow(barData), 0), 
                                           colour = "grey50") ) + theme( strip.text.x = element_text( )) +
  ggtitle( paste(  "MV_AUC", gsub( ".h5",""  , hdf5FileName ), summName ) ) + 
  theme(plot.title = element_text(lineheight=.8, size = 14 ))

dodge <- position_dodge(width=0.9)

p <- p + facet_wrap( variable~cell_line ,ncol = 1, scales = "free_y" ) + 
  geom_errorbar(limits, width = 0.05, position = dodge) 

m.height <- (6 + round(length(barData$treatment)/40))
m.width <-  (6 + round(length(barData$treatment)/40))

if (writePDFs){
pdf( file = paste( summaryStatsDir, "/", "MV_barplot", "AUC", gsub( ".h5",""  , hdf5FileName ),".pdf", sep ="" ), height = m.height, width = m.width )
print(p)
dev.off()
}
save(p, file = paste( summaryStatsDir, "/", "MV_barplot", "AUC", gsub( ".h5",""  , hdf5FileName ),".RData", sep ="" ))


# plot time curves for each compound dose combination


allVarCells<- levels(interaction(allVars, cell_lines, sep ="_"))
myData_ss$allVarCells <- paste(myData_ss$variable, myData_ss$cell_line, sep ="_")

  featuresDir <- paste(summaryStatsDir, "time", sep ="/")
  if (!file.exists(featuresDir)){
    dir.create(featuresDir)
  }
  if (!file.exists(paste(featuresDir, "RDataFiles", sep ="/"))){
    dir.create(paste(featuresDir, "RDataFiles", sep ="/"))
  }


myTimeFun <- function(myData_ss_c){
      currVar <- unique(myData_ss_c$variable)
      currVar<-currVar[currVar!="Displacement"]
      currVar<-as.character(currVar)
      myData_ss_c$variable <- factor(myData_ss_c$variable, levels = c(currVar, "Displacement"), order = TRUE)
      
        if (!grepl("^Displacement", allVarCells[i] )){
          scale.intDist <- max(myData_ss_c$meanSummaryStat[myData_ss_c$variable =="Displacement"], na.rm=TRUE)/ 
          max(myData_ss_c$meanSummaryStat[myData_ss_c$variable !="Displacement"], na.rm=TRUE)
          myData_ss_c$meanSummaryStat[myData_ss_c$variable =="Displacement"] <- 
          myData_ss_c$meanSummaryStat[myData_ss_c$variable =="Displacement"]/ scale.intDist
          myData_ss_c$sd[myData_ss_c$variable =="Displacement" & !is.na( myData_ss_c$sd)] <- 
          myData_ss_c$sd[myData_ss_c$variable =="Displacement" & !is.na( myData_ss_c$sd)]/ scale.intDist
        }
  # error bar
  
    if (errorType == "sd") {
    limits <- aes(ymax = meanSummaryStat + 0.5*sd, ymin = meanSummaryStat - 0.5*sd)
      } else if ( errorType == "cl95")
      {
      limits <- aes(ymax = meanSummaryStat + error95, ymin = meanSummaryStat - error95)
      } else 
      {
        stop("errorType either \"sd\" or \"cl95\"")
      }
   
  ###manual
  
#   myData_ss_c<- read.table( file ="H:/DILI screen/2014_02_11/output/2014_02_11 myData_summarized.txt", header = T, sep ="\t")
#   myData_ss_c$X <- NULL
#   head(myData_ss_c)
#   myData_ss_c <- myData_ss_c[ myData_ss_c$variable == "obj_nc_Intensity_MeanIntensity_img_gfp", ]
#   
#   myData_ss_c <- myData_ss_c[ !myData_ss_c$treatment %in%  c("Menadione", "etoposide " ,"Doxorubicin "), ]
  
  ## end manual
    
 p<- ggplot( data = myData_ss_c,  aes( x = timeAfterExposure , y = meanSummaryStat, colour = variable)) + 
   geom_point( size = 2, aes(shape = variable ), na.rm = TRUE ) +
    geom_smooth( aes(group = variable, color = variable), 
                 se = FALSE, size = 1, method = "loess", span = 0.9, na.rm=TRUE, n = max.t) + 
                geom_errorbar(limits,  width = 0.2)

    p <- p + facet_wrap( treatment ~ dose_uM  ) 
    p <- p + theme( axis.text.x = element_text(angle = 90, hjust = 1, size = 4 + round(150/max.t,0), colour = "grey50") ) + 
        theme( strip.text.x = element_text( size = 4 + round( 150/subplot.char.l, 0))) +
        ggtitle( paste(allVarCells[i], "summary_", gsub( ".h5",""  , hdf5FileName ), summName ) ) + 
        theme(plot.title = element_text(lineheight=.8, size = 10 ))   + theme(legend.position = "bottom") +
        theme_sharp()
 
  if(writePDFs) {
    pdf( file = paste(  featuresDir, "/", allVarCells[i], "_summarytest_", gsub( ".h5",""  , hdf5FileName ), ".pdf", sep ="" ), 
     height = p.size, width = p.size +round(0.2*max.t,0) )
     print( p )
     dev.off()
    }
  save(p, file = paste(  featuresDir, "/RDataFiles/", allVarCells[i], "_summary_", gsub( ".h5",""  , hdf5FileName ), ".RData", sep ="" ))
} # end myTimeFun

  cores <- min(c(length(allVarCells), numberCores))
  registerDoParallel(cores=cores)
  
  foreach( i = seq_along(allVarCells), .packages = c("ggplot2", "grid"), .export = "theme_sharp") %dopar% {
  myTimeFun(myData_ss[ myData_ss$allVarCells==allVarCells[i] | 
                         (myData_ss$variable == "Displacement" & 
                            myData_ss$cell_line == unique(myData_ss$cell_line[myData_ss$allVarCells==allVarCells[i]])  ), ])
}


#quantile plots - all quantiles per single feature
if(makeQuantilePlots){
  
    qVars <- unique(myData_ssq$variable)
    qVarsCells <- levels(interaction(qVars,cell_lines, sep ="_"))
    myData_ssq$qVarsCells <- paste(myData_ssq$variable, myData_ssq$cell_line, sep = "_")
    myData_ssq$quantile <- factor(myData_ssq$quantile)
    quantilesDir <- paste(summaryStatsDir, "quantiles", sep ="/")
    if (!file.exists(quantilesDir)){
      dir.create(quantilesDir)
      }
    if(!file.exists(paste(quantilesDir, "RdataFiles", sep ="/"))){
      dir.create(paste(quantilesDir, "RdataFiles", sep ="/"))
    }
  
quantilePlotFun <- function(myData_ss_c) {
   p<- ggplot( data = myData_ss_c,  aes( x = timeAfterExposure , y = meanQ, colour = quantile)) + 
   geom_point( size = 2, aes(shape = quantile ), na.rm= TRUE ) +
   geom_smooth( aes(group = quantile, color = quantile), se = FALSE, 
                size = 1, method = "loess", span = 0.9, na.rm = TRUE) 
  p <- p + facet_wrap( treatment ~ dose_uM ) 
  p <- p +  theme( axis.text.x = element_text(angle = 90, hjust = 1, size = 4 + round(150/max.t,0), colour = "grey50") ) + 
    theme( strip.text.x = element_text( size = 4 + round( 150/subplot.char.l, 0))) +
    ggtitle( paste( qVarsCells[i], "quantiles", gsub( ".h5",""  , hdf5FileName ), summName ) ) + 
    theme(plot.title = element_text(lineheight=.8, size = 10 )) + theme_sharp()  
  if(writePDFs){
    pdf( file = paste(  quantilesDir, "/", qVarsCells[i], "quantiles", gsub( ".h5",""  , hdf5FileName ), ".pdf", sep ="" ), 
        height = p.size, width = p.size +round(0.15*max.t,0) )
    print( p )
    dev.off()
  }
  save(p, file = paste(  quantilesDir, "/RdataFiles/", qVarsCells[i], "quantiles", gsub( ".h5",""  , hdf5FileName ),  ".RData", sep ="" ))
} # quantilePlot fun
colnames(myData_ssq)[ colnames(myData_ssq) == "value" ] <- "meanQ"
myData_ssq$dose_uM <- round(myData_ssq$dose_uM, 3)
cores <- min(c(length(qVarsCells), numberCores))
  registerDoParallel(cores=cores)
  
foreach( i = 1 : length(qVarsCells), .packages = c("ggplot2", "grid"), .export = 'theme_sharp' )  %dopar% {
  suppressWarnings(quantilePlotFun(myData_ssq[ myData_ssq$qVarsCells==qVarsCells[i] , ]))
  }
} # end if statement quantileplots
# now time curves for scaled data co-plotted:

chosenVars <- unique(myData_ss_scaled$variable)
chosenVars<-as.character(chosenVars)

#myData_ss_scaled_orig<- myData_ss_scaled

myData_ss_scaled<-myData_ss_scaled[ myData_ss_scaled$variable %in% chosenVars,]


for (i in 1 : length(cell_lines)) {
 
  myData_ss_c_scaled <- myData_ss_scaled[ myData_ss_scaled$cell_line == cell_lines[ i ], ]
  
  p<- ggplot( data = myData_ss_c_scaled,  aes( x = timeAfterExposure , y = meanSummaryStat,  color = variable, shape = variable)) +
    geom_point( size = 3, na.rm = TRUE ) + 
    scale_shape_manual(values= 1:length(unique(myData_ss_c_scaled$variable)) )  + 
    geom_smooth(aes(group = variable), se = FALSE, size = 1, method = "loess", span = 0.9, na.rm=TRUE)

  p <- p + facet_wrap( treatment ~ dose_uM  ) 
p <- p + theme_sharp() + theme( axis.text.x = element_text(angle = 90, hjust = 1, size = 8 + round(150/max.t,0), colour = "grey50") ) + 
theme( strip.text.x = element_text( size = 10 + round( 150/subplot.char.l, 0))) +
  ggtitle( paste(cell_lines[i], "summary_scaled", gsub( ".h5",""  , hdf5FileName ), summName ) ) + 
  theme(plot.title = element_text(lineheight=.8, size = 24 ))  

if(writePDFs){
pdf( file = paste(  summaryStatsDir, "/", "scaled_coPlotted", "_summary_", gsub( ".h5",""  , hdf5FileName ), "_",cell_lines[i],".pdf", sep ="" ), 
     height = 1.8*p.size, width = 1.8*(p.size +round(0.08*max.t,0)) )

print( p )
dev.off()
}
save(p, file =  paste(  summaryStatsDir, "/", "scaled_coPlotted", "_summary_", gsub( ".h5",""  , hdf5FileName ), "_",cell_lines[i],".RData", sep ="" ))
  }

#dose response curves

barData$dose_uM <- factor( barData$dose_uM)

for (i in 1 : length(allVars)) {
barDataF <- barData[barData$variable == allVars[i], ]
  dose.n<-length(unique(barDataF$dose_uM))

if (errorType == "sd") {
limits <- aes(ymax = meanAUC + 0.5*sd, ymin = meanAUC - 0.5*sd)
  } else if ( errorType == "cl95")
    {
    limits <- aes(ymax = meanAUC + error95, ymin = meanAUC - error95)  # barData stores the interval half-width in column error95
    } else 
      {
        stop("errorType either \"sd\" or \"cl95\"")
      }

p<- ggplot( data = barDataF,  aes( x = as.factor(dose_uM) , y = meanAUC,  shape = cell_line, color = cell_line )) + geom_point( size = 4, na.rm = TRUE ) +
#   geom_smooth(aes(group = variable), se = FALSE, size = 1) + 
  geom_errorbar(limits, width= 0.2, dodge = T )


p <- p + facet_wrap( ~ treatment, scales = "free_x"  ) 
p <- p + theme_sharp() + theme( axis.text.x = element_text(angle = 90, hjust = 1, size = 6 , colour = "grey50") ) + 
theme( strip.text.x = element_text( size = 6  )) +
  ggtitle( paste( allVars[i], "AUC over concentration", gsub( ".h5",""  , hdf5FileName ), summName  ) ) + 
  theme(plot.title = element_text(lineheight=.8, size = 6 )) 

if(writePDFs){
pdf( file = paste( summaryStatsDir, "/", "dose", "/", allVars[i], "_AUC_dose", gsub( ".h5",""  , hdf5FileName ), ".pdf", sep ="" ), height = p.size, width = p.size + dose.n )
print( p )
dev.off()
}
save(p, file = paste( summaryStatsDir, "/", "dose", "/", allVars[i], "_AUC_dose", gsub( ".h5",""  , hdf5FileName ), ".RData", sep ="" ))

}


# now make bar data for scaled data
barData_scaled <-ddply(myData_ss_scaled[ !is.na(myData_ss_scaled$meanSummaryStat), ], .(treatment, dose_uM, cell_line, control, variable), summarize,
  AUC = trapz(as.numeric(timeAfterExposure), as.numeric(meanSummaryStat )) )
barData_scaled <- barData_scaled[ barData_scaled$variable != "in",]
barData_scaled$dose_uM <-factor(barData_scaled$dose_uM)

#order the treatments based on average of all dose/ cell lines
mean.c.l <- ddply(barData_scaled, .(treatment), summarize, mean.t = mean(AUC, na.rm=TRUE))
barData_scaled$treatment <- factor((barData_scaled$treatment), levels =  unique(mean.c.l$treatment)[order(mean.c.l$mean.t)], order = T  )

p<- ggplot( data = barData_scaled,  aes( x = dose_uM , y = AUC,  line = cell_line, color = variable, shape = variable )) +
  geom_point( aes( x=dose_uM, y = AUC, group =  cell_line )) + scale_shape_manual(values= 1:(length(allVars)  ))

p <- p + facet_wrap( ~ treatment, scales = "free_x"  ) 
p <- p + theme_sharp() + theme( axis.text.x = element_text(angle = 90, hjust = 1, size = 8 + round(100/dose.n,0), colour = "grey50") ) + 
theme( strip.text.x = element_text( size = 8 + round(400/ nrow(barData), 0) )) +
  ggtitle( paste( myFeature, "AUC over concentration", gsub( ".h5",""  , hdf5FileName ), summName ) ) + 
  theme(plot.title = element_text(lineheight=.8, size = 10 )) 

if(writePDFs){
pdf( file = paste( summaryStatsDir, "/", "scaled", "_AUC_dose", gsub( ".h5",""  , hdf5FileName ), ".pdf", sep ="" ), height = p.size, width = p.size + 5 )
print( p )
dev.off()
}
save(p, file = paste( summaryStatsDir, "/", "scaled", "_AUC_dose", gsub( ".h5",""  , hdf5FileName ), ".RData", sep ="" ))

head(myDFo)

```
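
The chunk above makes two numerical choices: each variable is scaled by dividing by its mean (with min-max rescaling noted as an alternative), and time courses are collapsed to an area under the curve with `trapz`. The following is a minimal sketch of both on toy data, not part of the pipeline; the data frame `toy` and the helpers `minMax` and `trapzBase` are hypothetical names, and `trapzBase` is only a base-R stand-in for the trapezoid rule that `pracma::trapz` applies.

```r
# Toy summary table: two variables on very different scales, three time points each.
toy <- data.frame(
  variable          = rep(c("gfp", "area"), each = 3),
  timeAfterExposure = rep(c(1, 2, 3), times = 2),
  meanSummaryStat   = c(0.1, 0.2, 0.3, 100, 200, 300)
)

# Scaling by the per-variable mean (the choice made in the chunk above).
varMean <- ave(toy$meanSummaryStat, toy$variable,
               FUN = function(x) mean(x, na.rm = TRUE))
toy$scaledByMean <- toy$meanSummaryStat / varMean

# Alternative mentioned in the comments: min-max rescaling to [0, 1].
minMax <- function(x) {
  (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}
toy$scaledMinMax <- ave(toy$meanSummaryStat, toy$variable, FUN = minMax)

# Trapezoid rule written out in base R (hypothetical stand-in for pracma::trapz).
trapzBase <- function(x, y) sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)

gfp <- toy[toy$variable == "gfp", ]
aucGfp <- trapzBase(gfp$timeAfterExposure, gfp$scaledByMean)
```

After mean scaling both toy variables share the same curve (0.5, 1, 1.5), which is what makes co-plotting them in one panel meaningful; min-max rescaling would instead pin every variable to the range 0 to 1.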


# BLOCK 4: Tracking block. For analyzing migration-related data, run block 1 and then block 4
This block reorganizes the data into a form suitable for tracking analysis, because CellProfiler (CP) still does not relabel tracked objects after splits or merges.
It also includes some options for reconnecting broken tracks.
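
To illustrate what reconnecting tracks by maximum pixel distance over skipped frames can look like, here is a minimal sketch. It is not the code in `fixTrackingFun.R`, which is not shown here: the function `reconnectSketch`, its arguments and the greedy nearest-neighbour rule are all assumptions made for illustration only.

```r
# Hypothetical helper, NOT taken from fixTrackingFun.R.
# ends:     data.frame(trackLabel, frame, x, y), last observation of each broken track
# starts:   data.frame(trackLabel, frame, x, y), first observation of each orphan track
# gap:      frame difference to bridge (1 = consecutive frames, 2 = one frame skipped)
# maxPixel: maximum distance (in pixels) allowed for a link
reconnectSketch <- function(ends, starts, gap, maxPixel) {
  links <- data.frame(from = integer(0), to = integer(0), distance = numeric(0))
  for (e in seq_len(nrow(ends))) {
    # candidates start exactly `gap` frames later and are not linked yet
    cand <- starts[starts$frame == ends$frame[e] + gap &
                     !starts$trackLabel %in% links$to, ]
    if (nrow(cand) == 0) next
    d <- sqrt((cand$x - ends$x[e])^2 + (cand$y - ends$y[e])^2)
    if (min(d) <= maxPixel) {
      links <- rbind(links,
                     data.frame(from = ends$trackLabel[e],
                                to = cand$trackLabel[which.min(d)],
                                distance = min(d)))
    }
  }
  links
}

ends   <- data.frame(trackLabel = c(1L, 2L), frame = c(5L, 5L),
                     x = c(10, 100), y = c(10, 100))
starts <- data.frame(trackLabel = c(3L, 4L), frame = c(6L, 7L),
                     x = c(13, 104), y = c(14, 103))

direct  <- reconnectSketch(ends, starts, gap = 1, maxPixel = 20)  # track 1 -> 3, distance 5
skipOne <- reconnectSketch(ends, starts, gap = 2, maxPixel = 20)  # track 2 -> 4, distance 5
```

The sketch links greedily in row order, so it does not guarantee a globally optimal assignment when several track ends compete for the same start.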


```{r}
rm(list=ls()[ls() != "cp.pipeline.location"])
require(stringr)
source(file.path(cp.pipeline.location, "fixTrackingFun.R"))


dir()
load('outputList.Rdata')


if(length(unlist(lapply( lapply(outputList, names),str_match_all, "myDT") )) > 1) {
  outputListmyDT<- lapply(outputList, "[[", "myDT")
      testColN<- lapply(outputListmyDT, function(x) {(  (names(x)))} )
      all.identical <- function(x) all(mapply(identical, x[1], x[-1]))
      if(!all.identical(testColN))
        {
        stop("outputList does not have tables with identical column names/ 
             object names, manually rbind the outputList")
        }

myDFo <- rbindlist(outputListmyDT)


 kMyVars <- outputList[length(outputList)][[1]]
  kMyVars$myDT <- NULL
} else {
  outputListmyDT <- outputList$myDT
  myDFo <- outputListmyDT
  kMyVars <- outputList[-1]
}

 
  kColNames <- kMyVars$kColNames
  dataFileName <- gsub(".txt", "",kMyVars$plateMDFileName)

myFeatures <- gsub("/", "_", 
                     gsub("^(Measurements/[0-9]{4}(-[0-9]{2}){5}/)", "", kMyVars$myFeaturePathsA)
                     )
myFeatures <- c(myFeatures, "imageCountTracked")
numberCores <- kMyVars$numberCores
writeSingleCellDataPerWell <- TRUE # write all single-cell data to a separate file per well; takes time
writeAllSingleCellData<- TRUE  # write all single-cell data to a single file; takes time, but useful because the plotting/ summary chunk can then load it for later (re)runs


# ===================== User defined variables =====================
# ===================== User defined variables =====================
# ===================== User defined variables =====================


reconnect_tracks <- TRUE # must be TRUE (FALSE is not implemented yet)
max_pixel_reconnect1 <- 20 # if larger than the CP setting, the calculation overhead can become a lot higher
max_pixel_reconnect2 <- 20 # further in time, cells might be further away from their parent
max_pixel_reconnect3 <- 20
reconnect_frames <- 2  # number of frames over which to reconnect: 1, 2 or 3. 1 means no frame is skipped, 2 means 1 frame is skipped, etc. Direct linking is performed first, then linking with 1 skipped frame, then with 2, each within the maximum distance set above
skip.wells <- c( )
minTrackedFrames <- 20 # remove short tracks from data output
parent_resolve_strategy <- "min_distance" # Can be "min_distance" to take closest to parent, or "disconnect_all" (recommended) to fix the duplicate parent without connecting any cell to the parent.

summaryStatFunction <- function(x) { mean(x, na.rm = TRUE) } # function(x) { mean(x, na.rm = TRUE) }  or function(x) { quantile(x, 0.8, na.rm = TRUE) } (you can choose which quantile - here it is set to 0.8)
errorType <- "sd"   # "sd" or "cl95". cl95 is the two-sided 95% confidence interval; sd is the standard deviation, drawn half above and half below the average
writeUniqueParentsNoRec <- TRUE
writeBeforeCombineTracks<- TRUE
writeAfterFirstConnect<- TRUE
writeAfterSecondReconnect<- TRUE
writeAfterThirdReconnect<- TRUE
## ========================== end user defined variables============
## ========================== end user defined variables============
## ========================== end user defined variables============
## ========================== end user defined variables============


if(exists('allTrackDF')){
rm("allTrackDF") # needs to be removed to be able to re-run this block
}


# get all the features needed for plotting:
myFeature <- myFeatures[1]


# if this is enabled, myDFo has to be modified so that no NA values exist for the tracked object. It is assumed that if an NA exists, the entire row for that object is NA

Parent_NON <- kColNames$parentObjectNumberCN # kColNames comes from kMyVars; outputList$kColNames is NULL when outputList holds several data sets
myDT <- myDFo
if (sum(is.na(myDT[,Parent_NON, with = FALSE])) > 0 )
  {
  stop("NA values in dataset. Consider using a different CP pipeline")
  }
# if needed, rows with NA values in specific columns (the measurements for example, or maybe the x-y coordinates) may have to be removed

colnames(myDT)
uniqueWells <- unique(myDT[,locationID])
uniqueWells <- uniqueWells[ !uniqueWells %in% skip.wells]
uniqueWells<- factor(uniqueWells)

# check cell count
#cl<-makeCluster(1)
 # registerDoSNOW(cl)

#split data in numberCore parts, if length(uniqueWells) > numberCores
if( length(uniqueWells) < numberCores) {
  stop("Reduce the number of cores")
}
 
jumpInd <-length(uniqueWells) %/% numberCores
uniqueWellsLevels <- rep(1:numberCores, each = jumpInd)
#add some extra at the end in case levels is shorter:
extraEnd <-  length(uniqueWells) - length(uniqueWellsLevels) 
uniqueWellsLevels<- c(uniqueWellsLevels,  rep(uniqueWellsLevels[length(uniqueWellsLevels)], extraEnd))
if(length(uniqueWellsLevels) != length(uniqueWells) | !all(sort(uniqueWellsLevels) == uniqueWellsLevels)) {
  stop("making uniqueWellLevels failed")
}
uniqueWellGroups  = list()
  for(countergroups in seq_along(unique(uniqueWellsLevels))) {
    uniqueWellGroups[[countergroups]] <- uniqueWells[ uniqueWellsLevels == countergroups]
  }

  registerDoParallel(cores=numberCores)

cellNlist <- foreach ( cellC = seq_len(numberCores), .packages = 'data.table') %dopar% {

    ind <-   myDT[ , locationID] %in% uniqueWellGroups[[cellC]]
    partmyDT <- myDT[ind,]
    out.min <- partmyDT[, min(imageCountTracked), by = locationID]
    out.max <- partmyDT[, max(imageCountTracked), by = locationID]
    setnames(out.min, 'V1', "minimum number of tracked objects") 
    setnames(out.max, 'V1', "maximum number of tracked objects") 
    setkey(out.min,locationID)
    setkey(out.max,locationID)
    out.both <- out.min[out.max]
    out.both
  }
cellNlist <- rbind.fill(cellNlist)
write.table(cellNlist, file = 'trackedObject_counts.txt' ,  row.names = FALSE, sep = "\t")
#rm("out.min", "out.max","out.both", "cellNlist", "partmyDFo")
#selFeatures <- gsub("\\/", "_",  str_match(  unlist(myFeaturePathsA), "([^/]*[\\/][.]*[^/]*)$")[, 1 ] ) 
  
 # selFeatures <- selFeatures[!is.na(selFeatures)]
  
setkey(myDT, locationID)

#TODO: comment this away/ remove this once not needed anymore
# will need to fix some colnames from wrong mainFunction.R output before 9 dec 2014
indC<-grep(  "_TrackObjects_DistanceTraveled_",colnames(myDT))

track_dist <- str_match(colnames(myDT)[indC],"(DistanceTraveled_[0-9]{1,3})$" )[2]
track_dist<- gsub("DistanceTraveled_", "", track_dist)
indEnd_ <- unlist(lapply(kColNames, function(x) {grepl("(_)$", x)  }))
names(indEnd_) <- NULL
kColNames[indEnd_] <- paste(kColNames[indEnd_], track_dist, sep="")

  registerDoParallel(cores=numberCores)


allTrackDF<-foreach(i = seq_along(uniqueWellGroups), 
                    .packages = c("reshape2", "plyr","stringr", "data.table" )) %dopar% {
  myDFstukkie <- myDT[ uniqueWellGroups[ i ]]
  fixTrackingFun(myDFstukkie, myFeatures, i, kColNames, uniqueWellGroups,
                 writeUniqueParentsNoRec, writeBeforeCombineTracks, reconnect_tracks,
                 max_pixel_reconnect1, max_pixel_reconnect2, max_pixel_reconnect3,
                 writeAfterFirstConnect, writeAfterSecondReconnect, writeAfterThirdReconnect,
                 reconnect_frames, minTrackedFrames, writeSingleCellDataPerWell, parent_resolve_strategy)
}


# pull out distinct data sets

allTrackDFreal <- lapply(allTrackDF, '[[', "allTrackDF")
directionality.data <- lapply(allTrackDF, '[[', "directionality.data")
directionality.data <- rbind.fill(directionality.data)
allTrackDF <- rbind.fill(allTrackDFreal)
rm(allTrackDFreal)

directionality.data <- as.data.table(directionality.data)
allTrackDF <- as.data.frame(allTrackDF)

# get this from the RData file

metaData <- kMyVars$metaCSVData
indRM <- which(unlist(lapply(metaData, function(x) any(is.na(x)))))
print(c("removing columns from metadata file:" ,indRM))
if(!length(indRM) < 1) {
metaData[, eval(indRM):= NULL]  
}


#allTrackDF$treatment <- NA
#allTrackDF$dose_uM <- NA
#allTrackDF$control <- NA
#allTrackDF$cell_line <- NA
print("adding metadata:")

head(allTrackDF$location)
allTrackDF <- as.data.table(allTrackDF)
allTrackDF[, mergeLocation:= gsub("(_[1-9]{1})$", "", location)]

allTrackDF[ , mergeLocation:= as.factor(mergeLocation)]
metaData[, locationID:= as.factor(locationID)]
setkey(allTrackDF, "mergeLocation")
setkey(metaData, "locationID")

allTrackDF<- metaData[allTrackDF]


#allTrackDF <- allTrackDF[ !is.na(allTrackDF$value), ] # this does not work because the NA pattern differs per feature, e.g. displacement is NA at the first time point
# maybe build the NA index from one chosen variable and then apply that index to each feature
# not ideal to do this here (memory/ performance), but for now:
zehFeats <- unique(allTrackDF$.id)

# first feature for index

print("use first feature that is not displacement to create index to remove NA values from all features")
ind <- !is.na(allTrackDF[ allTrackDF$.id == myFeatures[!myFeatures %in% c("displacement" ) ][1], value ])
head(allTrackDF)

bufferList = list()
for( i in seq_along(zehFeats)){
  
bufferList[[i]] <- allTrackDF[  allTrackDF$.id == zehFeats[i], ][ind,]
}
bufferDF <- rbind.fill(bufferList)

write.table(bufferDF, file = "reorderedTrackData.txt", sep ="\t", row.names = FALSE)
write.table(directionality.data, file = "directionality.txt", sep ="\t", row.names= FALSE)


#
#TODO: adapt this block, based on myDFo, for GUI plotting

myDFo <- bufferDF
```
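
Block 4 splits the wells into `numberCores` groups of equal size and appends any remainder to the last group, so that group can be noticeably larger than the others. The sketch below reproduces that grouping on hypothetical well names and compares it with a more balanced split from `parallel::splitIndices`; the balanced variant is an assumption about a possible improvement, not something the script currently does.

```r
wells <- sprintf("W%02d", 1:11)  # hypothetical well identifiers
numberCores <- 4

# Grouping as in block 4: equal-sized runs, remainder appended to the last group.
jumpInd <- length(wells) %/% numberCores
lev <- rep(1:numberCores, each = jumpInd)
lev <- c(lev, rep(numberCores, length(wells) - length(lev)))
groupsAsInBlock <- split(wells, lev)
sizesAsInBlock <- unname(lengths(groupsAsInBlock))  # 2 2 2 5

# Possible alternative: near-equal groups via the parallel package shipped with R.
groupsBalanced <- lapply(parallel::splitIndices(length(wells), numberCores),
                         function(idx) wells[idx])
sizesBalanced <- lengths(groupsBalanced)
```

With 11 wells and 4 cores the original scheme gives one worker more than twice the load of the others, which matters because the slowest worker determines the total run time of the `foreach` loop.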

TODO: rainbow 0.5 to x of YFP/CFP

First run block 4. In a new session where block 4 was already run before, load reorderedTrackData.txt instead, but still run the first part of block 4.
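
A minimal sketch of that reload step, assuming the file was written by block 4 with `write.table(..., sep = "\t", row.names = FALSE)`. A temporary file with made-up rows stands in for `reorderedTrackData.txt` so the example is self-contained; in a real session the path would simply be `"reorderedTrackData.txt"` in the working directory.

```r
# Stand-in for the file written by block 4 (made-up rows, illustration only).
tmp <- tempfile(fileext = ".txt")
demo <- data.frame(.id = c("displacement", "displacement"),
                   trackLabel = c(1, 2),
                   value = c(0.5, 1.5))
write.table(demo, file = tmp, sep = "\t", row.names = FALSE)

# Reload with the same separator and a header row.
allTrackDF <- read.table(file = tmp, sep = "\t", header = TRUE,
                         stringsAsFactors = FALSE)
unique(allTrackDF$.id)  # check which features are available
```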
# BLOCK 5: plotting track ordered data # TODO make this work for myDFo....
#TODO single cell plots with displacement & chosen variable
#calculate single cell directionality (traveled distance / linear distance)
#hourglass plot Michiel
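
The directionality item in the list above is defined as traveled distance divided by linear distance. A minimal sketch of that ratio for a single track follows; `directionalitySketch` and its inputs are hypothetical and are not taken from the pipeline, which produces its own `directionality.data` in block 4.

```r
# x, y: coordinates of one track in frame order (hypothetical inputs)
directionalitySketch <- function(x, y) {
  n <- length(x)
  traveled <- sum(sqrt(diff(x)^2 + diff(y)^2))        # path length along the track
  linear   <- sqrt((x[n] - x[1])^2 + (y[n] - y[1])^2)  # straight start-to-end distance
  traveled / linear
}

bent     <- directionalitySketch(c(0, 3, 3), c(0, 0, 4))  # 7 / 5 = 1.4
straight <- directionalitySketch(c(0, 1, 2), c(0, 0, 0))  # 2 / 2 = 1
```

With this definition a perfectly straight track scores 1 and more tortuous tracks score higher; a track that returns to its starting point has a linear distance of 0 and would need separate handling.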
```{r }
options(stringsAsFactors = FALSE)
#setwd("H:/michiel CP test data/RCM may 2014")
source(file.path(cp.pipeline.location,"theme_sharp.R"))
require(plyr)
require(grid)
require(GGally)
require(beanplot)
require(ggplot2)
library(doParallel)

getwd()

##== user defined variables
#====================================================================================#
#====================================================================================#
#====================================================================================#

timeBetweenFrames <-"00:30" #hh:mm
exposureDelay <- "00:10" #hh:mm
imagePixel.x <- 512
imagePixel.y <- 512
errorType <- "sd" # cl95 or sd
numberCores <- 6
directionalityPlot <- 4 # do you want the directionality plotted?, then choose how many intervals. Else set to FALSE
plot.by <- "treatment" # "location" or "treatment"; for now only "treatment" is implemented

dir()
#allTrackDFt <- read.table(file ="reorderedTrackData.txt" , sep ="\t", header =T)
#allTrackDF<-bufferDF
head(allTrackDF)
unique(allTrackDF$.id)

# choose features of interest for plotting (use unique(allTrackDF$.id) in the line above to check which ones are available)
singleCellMeas <- c("obj_nuclei_Intensity_MeanIntensity_image_gfp",
                    "obj_nuclei_Intensity_MeanIntensity_image_hoechst",
                    "obj_nuclei_AreaShape_Area",
                    "obj_pID_pi_AreaShape_Area",
                    "obj_piID_annexin_AreaShape_Area",
                    "imageCountTracked",
                    "displacement") 
unique(allTrackDF$location)
plotTracks<-c("B02_2","B04_1", "D03_2", "D04_1" ,"E03_2","F02_1", "G02_1") # choose wells to plot tracks of, or set to FALSE
nCellTrack <- 20 # how many cells to plot in the track plots; to plot all cells choose a very high number

DisplvsFirsFeature_linePlot <- TRUE
singleCell.plot.number <- 12 # for line plot

MV_pairPlots <- TRUE # takes long to make if many many cells in dataset

beanPlot <- TRUE

densityPlots <- TRUE

writePDFs <- TRUE
# ========= end of user defined variables
#====================================================================================#
#====================================================================================#
#====================================================================================#
# can load your reorderedTrackData.txt file here if needed


colnames(allTrackDF)[colnames(allTrackDF)==".id"] <- "feature" 
colnames(allTrackDF)[colnames(allTrackDF)=="variable"] <- "timePoint" 

timeBetweenFrames <- round(as.integer(strftime(strptime(timeBetweenFrames, format = "%H:%M"), "%H")) + 
                               1/60 * as.integer(strftime(strptime(timeBetweenFrames, 
                                                                   format = "%H:%M"), "%M")), digit =1 )

    exposureDelay <- round(as.integer(strftime(strptime(exposureDelay, format = "%H:%M"), "%H")) + 
                               1/60 * as.integer(strftime(strptime(exposureDelay, 
                                                                   format = "%H:%M"), "%M")), digit =1 )


allTrackDF$Metadata_tp <- gsub("TP_", "", allTrackDF$timePoint)

allTrackDF$timeAfterExposure <- as.integer(allTrackDF$Metadata_tp) * timeBetweenFrames + exposureDelay - timeBetweenFrames
allTrackDF$Metadata_tp<- NULL


size.data <- object.size(allTrackDF)
if(  size.data > 1518342696  ) {
numberCores <- 4
  print( paste("number of cores reduced to 4 because data size is : ", round(as.numeric(size.data)/1024^2), "MB"))
}
  

all.locations <- unique(allTrackDF$location)
all.treatments <- unique(allTrackDF$treatment)
singleCellMeas<-c(singleCellMeas, "imageCountTracked", "displacement")
singleCellMeas <- unique(singleCellMeas)

test1 <- allTrackDF[ allTrackDF$feature == singleCellMeas[1], c("location", "trackLabel") ]
test2 <- allTrackDF[ allTrackDF$feature == singleCellMeas[2], c("location", "trackLabel") ]

if(!all(test1== test2))
  {
  stop("correlation cannot be calculated, script error")
  }
rm(test1,test2)
corrMatrix = list()
print("Calculating single cell correlations for features: ")
cat(paste(singleCellMeas, collapse = "\n"))
Sys.sleep(0.1)

registerDoParallel(cores=numberCores)

jumpInd <-length(all.locations) %/% numberCores
all.location.Levels <- rep(1:numberCores, each = jumpInd)
#add some extra at the end in case levels is shorter:
extraEnd <- length(all.locations) - length(all.location.Levels)
all.location.Levels<- c(all.location.Levels, rep(all.location.Levels[length(all.location.Levels)], extraEnd))
if(length(all.location.Levels) != length(all.locations) | !all(sort(all.location.Levels) == all.location.Levels)) {
  stop("making all.location.Levels failed")
}
uniqueLocationsGroups  = list()
  for(countergroups in seq_along(unique(all.location.Levels))) {
    uniqueLocationsGroups[[countergroups]] <- all.locations[ all.location.Levels == countergroups]
  }


corrFun <- function(pieceAlltrackDF) {
  current.all.locations <- unique(pieceAlltrackDF$location)
  corrMatrix.list = list()
  for( innercount in seq_along(current.all.locations)) {
    
    buffer <- pieceAlltrackDF[ pieceAlltrackDF$location %in%  current.all.locations[innercount], ]
    buffer$feature <- factor(buffer$feature)
    buffer2 <-split(buffer, buffer$feature)
    buffer3 <- data.frame(buffer2[[1]][2])
    colnames(buffer3)<- names(buffer2)[1]
        for (k in 2:length(buffer2)){
          buffer3 <- cbind(buffer3, buffer2[[k]][2])
          colnames(buffer3)[k] <- names(buffer2)[k]
       }
  
  corrMatrix.list[[innercount]] <-as.data.frame(cor(buffer3, use = "complete"))
  corrMatrix.list[[innercount]]$location <- as.character(current.all.locations[ innercount ])
  }
corrMatrix <- rbind.fill(corrMatrix.list)
  return(corrMatrix)
} # end corrFun


# could lead to memory problems, might have to build in an option without parallel processing
print("calculating single cell-based feature correlations (memory intensive operation)")
corrMatrix <- foreach(i = seq_along(uniqueLocationsGroups), .packages = "plyr") %dopar% {
  pieceAlltrackDF <- allTrackDF[ allTrackDF$location %in% uniqueLocationsGroups[[i]] & 
                                  allTrackDF$feature %in% singleCellMeas, c('feature','value', 'location')]  
  corrFun( pieceAlltrackDF )
  }

corrMatrixDF<- rbind.fill(corrMatrix)

write.table(corrMatrixDF, file ="correlationTable.txt", col.names = NA, sep ="\t")

rm(corrMatrixDF)

if(!file.exists("trackOrderedPlots")){
  dir.create("trackOrderedPlots")
}
if(!file.exists("trackOrderedPlots/trackPlots")){
  dir.create("trackOrderedPlots/trackPlots")
}
if(!file.exists("trackOrderedPlots/trackPlots/RDataFiles")){
  dir.create("trackOrderedPlots/trackPlots/RDataFiles")
}

#plot tracks
# quantile breaks at 10%, 30%, 50%, 70% and 90% of timeAfterExposure
myHueBreaks <- as.numeric(quantile(allTrackDF$timeAfterExposure, seq(0.1, 0.9, by = 0.2)))

if(plotTracks[1] != FALSE){
  
  if( sum( plotTracks %in% unique(allTrackDF$location)) != length(plotTracks)){
    stop(paste("could not find location defined in plotTracks: ", plotTracks[ !plotTracks %in% unique(allTrackDF$location)]))
  }
  
  for (i in seq_along(plotTracks)){

  
  selallTrackDF <- allTrackDF[ allTrackDF$location == plotTracks[i],]
forGraph <- data.frame(xCoord = selallTrackDF$value[ selallTrackDF$feature ==kColNames$trackingxCoordCN], 
                       yCoord = selallTrackDF$value[ selallTrackDF$feature ==kColNames$trackingyCoordCN],
                       timeAfterExposure = selallTrackDF$timeAfterExposure[selallTrackDF$feature ==kColNames$trackingxCoordCN],
                       trackLabel = selallTrackDF$trackLabel[selallTrackDF$feature ==kColNames$trackingxCoordCN]
                      
                       )

forGraph$trackLabel <- factor(forGraph$trackLabel)
nTracks <- length(unique(forGraph$trackLabel))
if(nTracks > nCellTrack){
  theTracks<-unique(forGraph$trackLabel)
  selTracks <- sample(theTracks, nCellTrack, replace = FALSE )
  selTracks<-factor(selTracks)
forGraph <- forGraph[ forGraph$trackLabel %in% selTracks, ]
nTracks <- length(selTracks)

}
# recycle the 50 shape codes as often as needed, so more than 100 tracks do not produce NA shapes
shape_tr <- rep(c(1 : 25, 25:1), length.out = max(nTracks, 50))
p <- ggplot(data = forGraph , aes( x = xCoord, y = yCoord  )  ) + ylim(imagePixel.y, 0) + xlim(0,imagePixel.x)   +
              geom_point( size = 3, aes(alpha = timeAfterExposure, shape =  trackLabel, colour = trackLabel ),na.rm= TRUE  ) +
  geom_path(  aes(alpha = timeAfterExposure, x=xCoord, y = yCoord, group = trackLabel, colour = trackLabel), na.rm=TRUE) +
     scale_shape_manual( values = shape_tr )  +
   scale_alpha(range = c(0.05, 1), breaks = myHueBreaks ) +
  ggtitle( paste("tracks", plotTracks[i], sep ="_")) 
  
p <- p + theme(legend.direction = "horizontal", legend.position = "bottom", legend.box = "vertical")
 
# at least one legend row, also when fewer than 5 tracks are plotted
p <- p + guides(col = guide_legend(nrow = max(1, round(nTracks/10, 0)))) + guides( shape = guide_legend(nrow= max(1, round(nTracks/10, 0))))

if(writePDFs){
pdf( file = paste( "trackOrderedPlots/trackPlots/", plotTracks[i]," tracks.pdf", sep ="" ), height = 8 + round(nTracks / 20, 0), width = 8)
print(p)
dev.off()
}
save(p, file = paste( "trackOrderedPlots/trackPlots/RDataFiles/", plotTracks[i]," tracks", ".RData", sep ="" ))
}

}


#split dataframe in ncores dataframes and place them in list
if(length(all.locations) < numberCores){
  warning("fewer locations than cores, errors very likely")
}
npieces <-  length(all.locations)%/%numberCores
locationLevels <- rep(1:numberCores, each = npieces)
addL <- length(all.locations) - length(locationLevels)
locationLevels <- c(locationLevels, rep(locationLevels[length(locationLevels)], addL))
if(length(locationLevels) != length(all.locations)) {
  stop("failed creating levels for multicore processing")
}

allTrackDF$locationLevels <-NA
for( counterGr in seq_along(uniqueLocationsGroups)) {
  allTrackDF$locationLevels[ allTrackDF$location %in% uniqueLocationsGroups[[counterGr]]] <- counterGr
  }
  allTrackDF$locationLevels<- factor(allTrackDF$locationLevels)
  allTrackList_ncores <- split(allTrackDF, allTrackDF$locationLevels)
  allTrackList_ncores<-lapply(allTrackList_ncores, function(x) { x[, 'locationLevels'] <- NULL; return(x)}  )                     
  allTrackDF$locationLevels<-NULL
registerDoParallel(cores=numberCores)
# use this list to create summaries
print("aggregating data, if memory problems occur reduce number of cores")
 # iterate over the pieces that actually exist rather than over the number of cores
 myData_ss_loc <- foreach(i = seq_along(allTrackList_ncores)  ) %dopar% {
  aggregate(
value ~ feature +location + timePoint + treatment + dose_uM + 
control + cell_line + timeAfterExposure, data = allTrackList_ncores[[i]], mean, na.rm = TRUE) 
    
} # 11 seconds = the winner!
rm(allTrackList_ncores)
myData_ss_loc <- rbind.fill(myData_ss_loc)


#alternative ways for faster summary calculations (might be good idea if RAM becomes an issue)
# myFun <- function(input) {
# inputSplit <- split(input, input$location)
# mylist2 <- lapply(seq_along(inputSplit), function(x) aggregate(
# value ~ feature +location + timePoint + treatment + dose_uM + 
# control + cell_line + timeAfterExposure, data = inputSplit[[x]], mean, na.rm = TRUE))
# 
# myDataT <- do.call("rbind", mylist2)
# return(myDataT)
# }
# 
# system.time(testmyFun <- myFun(allTrackDF)) # 26.85 sec elapsed

# # slowest method
# system.time(myData_ss_loc <- ddply(allTrackDF, .(feature,location,timePoint,treatment,dose_uM,control,cell_line,timeAfterExposure), summarize, 
#                    meanValue = mean(value, na.rm=TRUE)) #57 sec elapsed 

# R-base method
# 
# system.time(testing<- aggregate(
#   value ~ feature +location + timePoint + treatment + dose_uM + 
#     control + cell_line + timeAfterExposure, data = allTrackDF, mean, na.rm = TRUE)) #41 sec elapsed

colnames(myData_ss_loc)[colnames(myData_ss_loc) == "value"] <- "meanValue"
write.table(myData_ss_loc, file = "reorderedTrackData_summarized_wells.txt", sep = "\t", col.names=NA)


myData_ss_agg <- aggregate(data = myData_ss_loc, meanValue ~ feature + timePoint + treatment + dose_uM + 
control + cell_line + timeAfterExposure, mean, na.rm = TRUE)  

  myData_ss_agg_sd <- aggregate(data = myData_ss_loc, meanValue ~ feature + timePoint + treatment + dose_uM + 
    control + cell_line + timeAfterExposure, sd, na.rm = TRUE)  
    colnames(myData_ss_agg_sd)[colnames(myData_ss_agg_sd) == "meanValue"] <- "sd"


myData_ss <- merge(myData_ss_agg, myData_ss_agg_sd, by = c("feature",  "timePoint", "treatment", "dose_uM", 
"control" , "cell_line", "timeAfterExposure") )
rm("myData_ss_agg", "myData_ss_agg_sd")
colnames(myData_ss)[colnames(myData_ss) == "meanValue"] <- "meanValueT"
write.table(myData_ss, file = "reorderedTrackData_summarized_treatment.txt", sep = "\t", col.names=NA)


# plot all time curves of all treatment dose combinations
p.size <-  6 + round(0.2 *length(unique(paste(myData_ss$treatment,myData_ss$dose_uM))),0)
subplot.char.l <- max(nchar(paste(unique(myData_ss$dose_uM), unique(myData_ss$treatment)) ))
n.tp <- length(unique(myData_ss$timePoint))
cell_lines <- unique(myData_ss$cell_line)
allVars <- as.character(unique(myData_ss$feature))
allVars<- allVars[ !allVars %in% c("xCoord", "yCoord")]


if (!file.exists("trackOrderedPlots/timePlots")){
  dir.create("trackOrderedPlots/timePlots")
}


# small test

# plotTimeFun <- function(input) {
#     p<-ggplot(data = input, aes(x=colA)) + geom_density() + facet_wrap( ~colC)
# 
#     pdf( file = paste( feats[i] ,"test.pdf"), height = 5, width = 5 )  
# print(p)
# dev.off()
# }

# testDF <- data.frame(colA = sample(c(LETTERS[1:24],letters[1:24]), 50000, replace = TRUE), colB = 5*rnorm(50000))
# testDF$colC <- sample(c("big", "middle", "small"),50000, replace = TRUE)
# 
# head(testDF)
# 
# plotTimeFun(testDF)
# feats <- unique(testDF$colA)
# 
# foreach(i = seq_along(feats), .packages = 'ggplot2') %dopar% {
#   plotTimeFun(testDF[ testDF$colA == feats[i],])
#   }
# 

# the argument is named outfile; "" sends worker output to the master console
cl<-makeCluster(numberCores, outfile="")
registerDoParallel(cl)


if(!file.exists("trackOrderedPlots/timePlots/RDataFiles")){
  dir.create("trackOrderedPlots/timePlots/RDataFiles")
}

plotTimeFun <- function(myData_ss) {

   if(plot.by == "treatment") {
    myData_ss_f <- myData_ss[myData_ss$feature==allVars[j] | myData_ss$feature == "displacement", ]

} else if (plot.by == "location") {
  myData_ss_f <- myData_ss
  myData_ss_f$sd <- NA
  myData_ss_f$error95 <- NA
} else{
  stop("please define what to plot by: plot.by = \"treatment\" or \"location\"")
}
#first plot all variables separately


if (allVars[j] != "displacement"){
   max.displ <- max(myData_ss_f$meanValueT[myData_ss_f$feature =="displacement"], na.rm=TRUE)
   max.feat <- max(myData_ss_f$meanValueT[myData_ss_f$feature !="displacement"], na.rm=TRUE)
   scale.intDist <-max.displ/max.feat 
   
    myData_ss_f$meanValueT[myData_ss_f$feature =="displacement"] <- 
    myData_ss_f$meanValueT[myData_ss_f$feature =="displacement"]/ scale.intDist
    myData_ss_f$sd[myData_ss_f$feature =="displacement" & !is.na( myData_ss_f$sd)] <- 
    myData_ss_f$sd[myData_ss_f$feature =="displacement" & !is.na( myData_ss_f$sd)]/ scale.intDist

   }
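# treat a missing sd as 0 so that NA values do not turn the axis limit into NA
sd.noNA <- ifelse(is.na(myData_ss_f$sd), 0, myData_ss_f$sd)
max.yLim <- max(myData_ss_f$meanValueT + 0.5*sd.noNA, na.rm = TRUE) + 0.01

  # open the PDF device only when writePDFs is set, matching the guarded dev.off() below
  if(writePDFs){
  pdf( file = paste(  "trackOrderedPlots/timePlots/", allVars[j], "_summary_",  ".pdf", sep ="" ), 
     height = 6+p.size, width = 8 + 1.1* p.size )
  }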
  for (i in 1 : length(cell_lines)) {

  myData_ss_c <- myData_ss_f[ myData_ss_f$cell_line == cell_lines[ i ], ]

# error bar
  
  if (errorType == "sd") {
limits <- aes(ymax = meanValueT + 0.5*sd, ymin = meanValueT - 0.5*sd)
  } else if ( errorType == "cl95")
    {
    limits <- aes(ymax = meanValueT + error975, ymin = meanValueT - error975)
    } else 
      {
        stop("errorType either \"sd\" or \"cl95\"")
      }
 
 p<- ggplot( data = myData_ss_c,  aes( x = timeAfterExposure , y = meanValueT, colour = feature)) + 
   geom_point( size = 2, aes(shape = feature ), na.rm=TRUE ) +
    geom_smooth( aes(group = feature, color = feature), se = FALSE, size = 1, method = "loess", na.rm=TRUE)  + geom_errorbar(limits,  width = 0.2)

p <- p + facet_wrap( treatment ~ dose_uM  ) 

p <- p +  theme( axis.text.x = element_text(angle = 90, hjust = 1, size = 4 + round(150/n.tp,0), colour = "grey50") ) + 
theme( strip.text.x = element_text( size = 4 + round( 150/subplot.char.l, 0))) +
  ggtitle( paste(cell_lines[i], allVars[j], "summary_" ) ) + 
  theme(plot.title = element_text(lineheight=.8, size = 12 ))   +ylim(0, max.yLim ) +  theme(legend.position="bottom") + theme_sharp()

save(p, file = paste(  "trackOrderedPlots/timePlots/RDataFiles/", allVars[j], "_summary_", "_", cell_lines[i], ".RData", sep ="" ))  

if(writePDFs){

  print( p )
  }
  }
if(writePDFs){
dev.off()
    }
 } # end plotTimeFun 

print("Printing cell-mean time plots for selected features")

  foreach( j = seq_along(allVars), .packages = c('ggplot2', 'grid'), .export = 'theme_sharp', .verbose = TRUE) %dopar% {
  plotTimeFun(myData_ss[myData_ss$feature==allVars[j] | myData_ss$feature == "displacement", ])
}  

# pair plots

if (MV_pairPlots){

if(!file.exists("trackOrderedPlots/multivariatePlots")){
  dir.create("trackOrderedPlots/multivariatePlots")
}
if(!file.exists("trackOrderedPlots/multivariatePlots/scaled")){
  dir.create("trackOrderedPlots/multivariatePlots/scaled")
}
if(!file.exists("trackOrderedPlots/multivariatePlots/origValues")){
  dir.create("trackOrderedPlots/multivariatePlots/origValues")
}
if(!file.exists("trackOrderedPlots/multivariatePlots/origValues/RDataFiles")){
  dir.create("trackOrderedPlots/multivariatePlots/origValues/RDataFiles")
}
if(!file.exists("trackOrderedPlots/multivariatePlots/scaled/RDataFiles")){
  dir.create("trackOrderedPlots/multivariatePlots/scaled/RDataFiles")
}

# make treatment groups

jumpInd <-length(all.treatments) %/% numberCores
all.treatment.Levels <- rep(1:numberCores, each = jumpInd)
#add some extra at the end in case levels is shorter:
extraEnd <- length(all.treatments) - length(all.treatment.Levels)
all.treatment.Levels<- c(all.treatment.Levels, rep(all.treatment.Levels[length(all.treatment.Levels)], extraEnd))
if(length(all.treatment.Levels) != length(all.treatments) | !all(sort(all.treatment.Levels) == all.treatment.Levels)) {
  stop("making all.treatment.Levels failed")
}
uniqueTreatmentGroups  = list()
  for(countergroups in seq_along(unique(all.treatment.Levels))) {
    uniqueTreatmentGroups[[countergroups]] <- all.treatments[ all.treatment.Levels == countergroups]
  }

f.size <- 5*length(singleCellMeas)

mvpairplotFun <- function(inputData)  {
  currentTreats<- unique(inputData$treatment)
  
  for( innerPairCounter in seq_along(currentTreats)) {
    scPlotData_loc<-  inputData[ inputData$treatment == currentTreats[innerPairCounter], ]
  print(paste("printing innerPairCounter", innerPairCounter , "and i: ", i))
  for(j in seq_along(cell_lines)) {
  
    norm.data <- ddply(subset(scPlotData_loc, scPlotData_loc$cell_line == cell_lines[j])
                       , .(feature), transform, sValue = (value - mean(value))  / sd(value))

      p <- ggpairs(data=norm.data,
           #columns=c('feature', 'sValue'), 
           upper = list(continous = "box"), legends = T,
           lower = list(continous = "density"),
           #diag = list(continous = "points"),
           title=paste(cell_lines[j], currentTreats[innerPairCounter],"scaled single cell") ) 
        if(writePDFs){
          pdf( file = paste(  "trackOrderedPlots/multivariatePlots/scaled/", "scaled_multivar", "_", currentTreats[innerPairCounter], 
                        cell_lines[j],".pdf", sep ="" ), 
                        height =1*f.size, width = 1.5*f.size  )
          print( p )
          dev.off()
          }
  save(p, file = paste(  "trackOrderedPlots/multivariatePlots/scaled/RDataFiles/", "scaled_multivar", "_", currentTreats[innerPairCounter], cell_lines[j], ".RData", sep ="" ))
          scPlotData_loc_wide<-reshape(data=subset(scPlotData_loc, scPlotData_loc$cell_line == cell_lines[j]),
                                                   direction = "wide", 
                                                   idvar = c("trackLabel","location" , "timePoint", "treatment", 
                                                             "dose_uM", "control", "cell_line", "timeAfterExposure"),
                                                  timevar = "feature", v.names = "value")
  
  ind<- apply(scPlotData_loc_wide,MARGIN = 1, function(x) sum(is.na(x)))
            ind <- ind == 0
            scPlotData_loc_wide<- scPlotData_loc_wide[ind,]
        
    timeIntervals <- quantile(unique(scPlotData_loc_wide$timeAfterExposure), seq(0,1, length.out = 5))
    scPlotData_loc_wide$timeLevels <- NA
    for( timeCounter in 1: 4) {
        scPlotData_loc_wide$timeLevels[ scPlotData_loc_wide$timeAfterExposure <= timeIntervals[timeCounter+1] &
                                      scPlotData_loc_wide$timeAfterExposure >= timeIntervals[timeCounter]] <- timeCounter
    }
  scPlotData_loc_wide$timeLevels<- factor(scPlotData_loc_wide$timeLevels)

  #nasty little trick to make the density function work (sd is not allowed to be zero.)
  # test for the column itself: exists() would look for a variable with that literal name and never find the column
  if("value.imageCountTracked" %in% colnames(scPlotData_loc_wide)){
  scPlotData_loc_wide$value.imageCountTracked <- scPlotData_loc_wide$value.imageCountTracked + 
  0.1*rnorm(length(scPlotData_loc_wide$value.imageCountTracked))
}
    pp<-ggpairs(data=scPlotData_loc_wide,
#              columns= colnames(scPlotData_loc_wide)[!colnames(scPlotData_loc_wide) %in% 
#                c("location", "trackLabel", "timePoint", "treatment", "dose_uM", "control", 
#                  "cell_line", "timeAfterExposure", "timeLevels", "value.imageCountTracked")], 
             upper = list(continuous = "density"), legends = TRUE,params=list(labelSize=2),
             #lower = list(combo = "facetdensity"),
             #diag = list(continous = "points"),
             title=paste(cell_lines[j], currentTreats[innerPairCounter],"orig Values single cells"),
             color = "timeLevels"
            ) 

  if(writePDFs){
     pdf( file = paste(  "trackOrderedPlots/multivariatePlots/origValues/", "multivar_OrigValues", "_", currentTreats[innerPairCounter],
                         cell_lines[j],".pdf", sep ="" ), 
                        height = 1.2*f.size+6  , width = 1.5*f.size +8 )
    print( pp )
    dev.off()
    }
  save(pp, file =  paste(  "trackOrderedPlots/multivariatePlots/origValues/RDataFiles/", "multivar_OrigValues", "_", 
                          currentTreats[innerPairCounter], cell_lines[j], "_pp.RData", sep ="" ))
  } # j-loop cell lines

} # innerTreatCounter loop
} # end mvpairplotFunction

  print("Printing multivariate pairplots:")
  scPlotData <- allTrackDF[ allTrackDF$feature %in% singleCellMeas,  ]
  scPlotData <- scPlotData[ !is.na(scPlotData$value), ]

    foreach( i = seq_along(uniqueTreatmentGroups), .packages = c('ggplot2', 'plyr', 'GGally'), .verbose = TRUE) %dopar% {
      mvpairplotFun( scPlotData[ scPlotData$treatment %in% uniqueTreatmentGroups[[ i ]],] )
    }
rm("scPlotData")


} #end MV_pairPlots
stopCluster(cl)


if(DisplvsFirsFeature_linePlot){

if(!file.exists("trackOrderedPlots/singleCell_linePlots")){
  dir.create("trackOrderedPlots/singleCell_linePlots")
}
if(!file.exists("trackOrderedPlots/singleCell_linePlots/RdataFiles")){
  dir.create("trackOrderedPlots/singleCell_linePlots/RdataFiles")
}


myLinePlotFun <- function(featSubWell_g) {
  
  featSubWell_g$featTrackLabel <- paste(featSubWell_g$feature, featSubWell_g$trackLabel, sep ="_")
  currentLocations <- unique(featSubWell_g$location)

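   # open the PDF device only when writePDFs is set, matching the guarded dev.off() below
   if(writePDFs){
   pdf( file = paste(  "trackOrderedPlots/singleCell_linePlots/","_",
                     "LocsCore_", i, "_",singleCellMeas[1],
                    "_disp-feature-singleCell", "_",  ".pdf", sep ="" ), 
     height = 10, width = p.size +round(0.2*n.tp,0) )
   }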
  
      for( innerLineCounter in seq_along(currentLocations)) {
        featSubWell <- featSubWell_g[ featSubWell_g$location == currentLocations[innerLineCounter], ]
    
        if (singleCell.plot.number < max(featSubWell$trackLabel)) {
          n.tracks <- max(featSubWell$trackLabel)
          chosen.tracks <- sample( 1 : n.tracks, singleCell.plot.number, replace = FALSE)
          featSubWell <- featSubWell[ featSubWell$trackLabel %in% chosen.tracks, ]
        } 
      lp <- ggplot( data = featSubWell, aes( x = timeAfterExposure, y = value, shape = feature) ) + 
        geom_point(aes(color = as.factor(trackLabel)),na.rm=TRUE)
      lp <- lp + geom_line(aes(group=featTrackLabel, colour = as.factor(trackLabel), linetype = as.factor(trackLabel)),
                           na.rm=TRUE)  +
            ylab(expression( frac(bold(Pixel),bold(Frame)) ~~italic("and")~~ (frac(mean(italic(bold(positionshift))), 
                                                                                   mean(italic(bold(myFeature)))))%*%italic(Feature1)  ))
      lp <- lp + ggtitle( paste( currentLocations[innerLineCounter], unique(featSubWell$treatment), unique(featSubWell$dose_uM),sep =" " ) )  
      lp <- lp + theme( axis.text.x = element_text(angle = 90, hjust = 1, vjust=0.5, size = (1 +  round(( 2000/n.tp) / 12, 0) ),
        colour = "grey50" ) ) + theme(legend.position="bottom") + theme_sharp()
# add control data
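      # name the file after the location being plotted, so locations in one group do not overwrite each other
      save(lp, file =  paste(  "trackOrderedPlots/singleCell_linePlots/RdataFiles/",
                         singleCellMeas[1],"disp-feature-singleCell", "_", currentLocations[innerLineCounter], "_lp.RData", sep ="" ))
  if(writePDFs){
  print(lp)
  }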
  } # end innerLocationLoop
if(writePDFs){
  
  dev.off()
  }
 }  # linePlotFunction

featSub <- allTrackDF[ allTrackDF$feature == singleCellMeas[1] | allTrackDF$feature == "displacement" , ]
featSub$feature <- as.character(featSub$feature)
# scale speed:
sc.f <-  max(featSub$value[ featSub$feature==singleCellMeas[1]], na.rm = TRUE) /max(featSub$value[featSub$feature=="displacement"], na.rm = TRUE)
    featSub$value [ featSub$feature == "displacement"] <-featSub$value [ featSub$feature == "displacement" ] * sc.f
    featSub<- featSub[ !is.na(featSub$value), ]
    featSub$value[featSub$feature =="displacement"] <- smooth(featSub$value[featSub$feature == "displacement"])

  featSub$locationLevels <-NA
  for( counterGr in seq_along(uniqueLocationsGroups)) {
    featSub$locationLevels[ featSub$location %in% uniqueLocationsGroups[[counterGr]]] <- counterGr
  }
    featSub$locationLevels<- factor(featSub$locationLevels)
    featSub_ncores <- split(featSub, featSub$locationLevels)
    featSub_ncores<-lapply(featSub_ncores, function(x) { x[, 'locationLevels'] <- NULL; return(x)}  )                     
    featSub$locationLevels<-NULL

registerDoParallel(cores=numberCores)

      foreach(i = seq_along(uniqueLocationsGroups), .verbose = TRUE, .packages = c('ggplot2', 'grid'), .export = 'theme_sharp') %dopar% {
          myLinePlotFun(featSub_ncores[[i]])
      }

} #IF DisplvsFirsFeature_linePlot


if(MV_pairPlots){
# also plot a correlation of all single cell objects for density vs speed
myData_ss_loc_sub <- myData_ss_loc[ myData_ss_loc$feature == "imageCountTracked", ]
colnames(myData_ss_loc_sub)[colnames(myData_ss_loc_sub) %in% "meanValue"] <- "imageCountTracked"

myData_ss_loc_sub_disp <-  myData_ss_loc[myData_ss_loc$feature == "displacement",]
myData_ss_loc_sub_disp <- myData_ss_loc_sub_disp[ , !colnames(myData_ss_loc_sub_disp) %in% "feature"]
colnames(myData_ss_loc_sub_disp)[ncol(myData_ss_loc_sub_disp)] <- "displacement"
myData_ss_loc_sub<- merge(myData_ss_loc_sub, myData_ss_loc_sub_disp, by = c("location", "timePoint", "treatment", "dose_uM",
                                                                            "control","cell_line", "timeAfterExposure"))

t.l <- length(unique(myData_ss_loc_sub$timeAfterExposure))
timeRangen <- t.l %/% 4
timeRange <- rep(1:4, each = timeRangen )
timeRange<- c(timeRange, rep(timeRange[length(timeRange)], t.l - length(timeRange)))
timeDF <- data.frame(timeAfterExposure = sort(unique(myData_ss_loc_sub$timeAfterExposure)), timeRange = timeRange
                               )

myData_ss_loc_sub<- merge(myData_ss_loc_sub, timeDF, by = "timeAfterExposure")
myData_ss_loc_sub$timeRange<- as.numeric(myData_ss_loc_sub$timeRange)
myData_ss_loc_sub$timeRange[myData_ss_loc_sub$timeRange == 1]  <- "first_quarter_TPs"
myData_ss_loc_sub$timeRange[myData_ss_loc_sub$timeRange == 2]  <- "second_quarter_TPs"
myData_ss_loc_sub$timeRange[myData_ss_loc_sub$timeRange == 3]  <- "third_quarter_TPs"
myData_ss_loc_sub$timeRange[myData_ss_loc_sub$timeRange == 4]  <- "fourth_quarter_TPs"
myData_ss_loc_sub$timeRange<- factor(myData_ss_loc_sub$timeRange, levels = c(
  "first_quarter_TPs","second_quarter_TPs","third_quarter_TPs","fourth_quarter_TPs"), ordered = TRUE)

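# alpha is a fixed setting, so it goes outside aes() (inside aes() it is mapped as data and gets a legend)
p <- ggplot( myData_ss_loc_sub, aes( x=imageCountTracked, y = displacement )) +
  geom_point(aes(color = control, shape = timeRange), alpha = 0.2, na.rm=TRUE, size = 2 ) + geom_smooth(method = "lm", na.rm = TRUE)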

if(writePDFs){
  pdf( file = paste(  "trackOrderedPlots/multivariatePlots/", "cellCount_vs_displacement", ".pdf", sep ="" ), 
     height = 14, width = 16 )
  print(p)
dev.off()
}

save(p, file =  paste(  "trackOrderedPlots/multivariatePlots/", "cellCount_vs_displacement", ".RData", sep ="" ))

}


if(!file.exists("trackOrderedPlots/other")){
  dir.create("trackOrderedPlots/other")
}
# plot calc.directionality 
  # for each track, per domain: directionality = net displacement divided by cumulative distance travelled (cummDistTraveled), calculated from x & y coord and displacement
if( directionalityPlot != FALSE)
  {
  
  directionalityData <- read.table("directionality.txt", sep="\t", header = TRUE)
  directionalityData$titles <- paste(directionalityData$location, directionalityData$treatment, sep = " ")
  trackLengthRange <- quantile(directionalityData$trackLength, seq(0 ,1, length.out = directionalityPlot+1))
  directionalityData$trackLengthRange<- NA
  for (directRange in 1: directionalityPlot){
  directionalityData$trackLengthRange[ directionalityData$trackLength <= trackLengthRange[directRange+1] & 
                                         directionalityData$trackLength >= trackLengthRange[directRange]] <- directRange
  }
  directionalityData$trackLengthRange<-factor(directionalityData$trackLengthRange)
  
  p <- ggplot(directionalityData, aes(x=trackLength, y= directionality, color = trackLengthRange)) + geom_point(alpha = 0.5) + facet_wrap(~treatment) +
    ggtitle("Directionality: net. displ. divided by cummDistTraveled") + geom_smooth(aes(group = treatment), method = "loess" )
  
  
  if(writePDFs){
     
     pdf( file = paste(  "trackOrderedPlots/other/", "directionality", ".pdf", sep ="" ), 
     height = 4+ 2*p.size, width = 6+2*p.size )
     print(p)
     dev.off()
     }
  
  save(p, file =  paste(  "trackOrderedPlots/other/", "directionality", ".RData", sep ="" ))
  
  }


#density plot
if(!file.exists("trackOrderedPlots/densityPlots")){
  dir.create("trackOrderedPlots/densityPlots")
}
if(!file.exists("trackOrderedPlots/densityPlots/RdataFiles"))
  {
  dir.create("trackOrderedPlots/densityPlots/RdataFiles")
  }


if(densityPlots){
  singleCellMeas_noI <- singleCellMeas[singleCellMeas != "imageCountTracked"]
      densityPlotFun <- function(oneFeatallTrackDF) {
          if(writePDFs) {
              pdf(file = paste("trackOrderedPlots/densityPlots/densityPlot", singleCellMeas_noI[kk],".pdf", sep =""),
                  height = 6+p.size,width = 10+1.2*p.size)
              }
           for (ii in seq_along(cell_lines)){
            oneFeatallTrackDF_c<- oneFeatallTrackDF[ oneFeatallTrackDF$cell_line == cell_lines[ii], ]
      
          p <- ggplot(oneFeatallTrackDF_c,  aes(value, colour = control)) + geom_density(na.rm=TRUE) + facet_wrap(~treatment) +
          theme( axis.text.x = element_text(angle = 90, hjust = 1, size = 4 + round(150/n.tp,0), colour = "grey50") ) + 
          theme( strip.text.x = element_text( size = 10)) +
          ggtitle( paste( "density ",singleCellMeas_noI[kk], "_", cell_lines[ii], sep ="" ) ) + 
          theme(plot.title = element_text(lineheight=.8, size = 12 ))   + theme_sharp()

save(p, file = paste("trackOrderedPlots/densityPlots/RdataFiles/densityPlot", singleCellMeas_noI[kk], cell_lines[ii],".RData", sep =""))

  if(writePDFs){
  suppressWarnings(print(p))  
  }
} # ii loop cell_lines
if(writePDFs){  
dev.off()
}
} #densityPlotFun

registerDoParallel(cores=numberCores)


foreach(kk = seq_along(singleCellMeas_noI), .packages = c('ggplot2', 'grid', 'plyr'), .export = 'theme_sharp', .verbose = TRUE) %dopar% {
  densityPlotFun(subset(allTrackDF,allTrackDF$feature ==singleCellMeas_noI[kk]))
  }

} # end if densityplot

if(beanPlot){
registerDoParallel(cores=numberCores)
myBeanFunction <- function(my.dat) {  
my.dats <- split(my.dat$value, my.dat$treatment )

pdf(file = paste("trackOrderedPlots/other/beanplot_", all.feat_cell[mm], ".pdf", sep =""), 
      height = 6+ 0.5*p.size, width = 8 + 2*p.size)
    suppressWarnings(print(beanplot(my.dats, main=all.feat_cell[mm], log="", bw = "nrd0", beanlines = "quantiles", las = 3 )))
  dev.off()
} #beanFUn

par(las=3)

  allTrackDF$feat_cell <- paste(allTrackDF$feature, allTrackDF$cell_line, sep ="_")
  all.feat_cell <- suppressWarnings(levels(interaction(singleCellMeas[singleCellMeas != "imageCountTracked"], cell_lines, sep = "_")))

foreach(mm = seq_along(all.feat_cell), .packages = "beanplot") %dopar% {
  myBeanFunction(
    allTrackDF[ allTrackDF$feat_cell== all.feat_cell[ mm ],]
                )
  }
 par(las=1)
} # if beanplot


# custom user function


```
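
The chunk above repeatedly divides locations (and later treatments) into one group per core: integer division sets the group size, and any remainder is appended to the last group. Below is a minimal sketch of that scheme, assuming a hypothetical helper `splitIntoGroups` and made-up well names; it illustrates the idea and is not part of the pipeline.

```r
# Hypothetical helper (not in the original script): divide `items` into
# `nGroups` consecutive groups; leftover items are added to the last group.
splitIntoGroups <- function(items, nGroups) {
  if (length(items) < nGroups) {
    stop("fewer items than groups")
  }
  perGroup <- length(items) %/% nGroups
  groupLevels <- rep(seq_len(nGroups), each = perGroup)
  extra <- length(items) - length(groupLevels)
  groupLevels <- c(groupLevels, rep(nGroups, extra))
  unname(split(items, groupLevels))
}

# Made-up well names, for illustration only
wells <- c("A01_1", "A01_2", "A02_1", "A02_2", "A03_1", "A03_2", "A04_1")
groups <- splitIntoGroups(wells, 3)
print(groups)
```

With 7 wells and 3 groups, the first two groups hold two wells each and the last holds three, so the last worker carries the extra load whenever the remainder is large.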


# BLOCK 6: temporary oscillation solution: reformat BLOCK 4 output data into a format suitable for Dizi's Graph measure script

```{r}
setwd("D:/oscillation") # somewhere not in outputPath, to store connector file (connector file is name conversion of wells to numbered names needed for tool dizi)

outputPath<-"D:/oscillation/test2" # where do you want DILI-formatted files stored?
trackOrderedDataPath <- "D:/michiel CP test data/migration test set/trackOrderedData/xCoord"


##== end user input ====================


all.files <- dir(trackOrderedDataPath)
unformattedList = list()
for (i in seq_along(all.files)){
  unformattedList[[i]] <- read.table(paste(trackOrderedDataPath,"/", all.files[i], sep =""), sep = "\t", row.names = 1, header = T)
}

  unformattedList<-lapply(unformattedList, function(x)   x[!is.na(x[,2]),])
  unformattedList<- lapply(unformattedList, t)
  nframes <- nrow(unformattedList[[1]])
  dedata <- data.frame(Frame = 0:(nframes-1 ))
  unformattedList <- lapply(unformattedList, function(x) cbind(dedata , x))
  # keep frames in which at most (ncol - 2) entries are missing
  unformattedList<-lapply(unformattedList, function(x)   x[rowSums(is.na(x)) <= (ncol(x)-2),])
  unformattedList<- lapply(unformattedList, function(x)  { x[is.na(x)] <- -1; x })

for(i in seq_along(unformattedList)){
  write.table(unformattedList[[i]], file = paste(outputPath,"/testing_", i, ".xls", sep =""), sep = "\t", row.names = F)
}
connector <- cbind(seq_along(unformattedList), all.files)
write.table(connector, sep ="\t", file = "connector.txt",row.names = F)


```
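
The reshaping in BLOCK 6 can be followed on a small example. The sketch below uses a made-up table with one row per track and one column per frame (the layout is assumed from the code above). It drops tracks without a value in the second frame, transposes, adds a zero-based `Frame` column, removes frames that are almost entirely missing, and replaces the remaining gaps with -1.

```r
# Made-up input: one row per track, one column per frame (NA = not tracked)
tracks <- data.frame(
  t1 = c(10.0, NA, 30.0),
  t2 = c(11.5, 21.0, NA),
  t3 = c(NA, 22.5, 31.0),
  t4 = c(NA, NA, NA),
  row.names = c("track_1", "track_2", "track_3")
)

# Keep only tracks that have a value in the second frame
tracks <- tracks[!is.na(tracks[, 2]), ]

# Transpose to one row per frame and prepend a zero-based frame counter
frames <- data.frame(Frame = 0:(ncol(tracks) - 1), t(tracks))

# Keep frames in which at most (ncol - 2) entries are missing
frames <- frames[rowSums(is.na(frames)) <= (ncol(frames) - 2), ]

# Encode the remaining missing values as -1
frames[is.na(frames)] <- -1
print(frames)
```

Here `track_3` is dropped by the first filter and the all-missing fourth frame by the second, leaving three frames for two tracks.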

# BLOCK 7: combine replicates
Combines several replicate yyyy_mm_dd_[ cell line ]_summarized.txt files.
```{r }
require(plyr)
require(ggplot2)
require(doParallel)
require(grid)
options(stringsAsFactors = FALSE)


############ user defined variables ##########================
############ user defined variables ##########================
############ user defined variables ##########================
# you only need to set the data paths (does not matter how many):
rm(list=ls()[ls() != "cp.pipeline.location"])
setwd("H:/unilever/replicate plots")

yMax <- FALSE # to zoom in on y axis choose max y-value. Else set to FALSE for default

source("H:/R_HOME/R_WORK/R scipts/theme_sharp.R")

file.paths <- c("J:/BAC reporter dataset LU/20131108 TP53BP1/2013_11_08 myData_summarized.txt",
                "J:/BAC reporter dataset LU/20131122 TP53BP1/2013_11_22 myData_summarized.txt",
                "J:/BAC reporter dataset LU/20131125 TP53BP1/2013_11_25 myData_summarized.txt"
                )
numberCores <- 8

############ end user defined variables ##########================
############ end user defined variables ##########================
############ end user defined variables ##########================

input.list =list()
  for (i in seq_along(file.paths)){
    input.list[[i]] <- read.table(file = file.paths[i], sep ='\t', header = TRUE, row.names = 1)
    input.list[[i]]$replicate <- paste("Replicate", i)
    }
  my.data <- rbind.fill(input.list)

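# number the replicates from the paths themselves instead of relying on the leftover loop counter
repliData<- data.frame(file.paths = file.paths, replicate = seq_along(file.paths))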
write.table(repliData, file = "replicate dates.txt", sep = "\t")


#calculate means and sd

# if you have the exact same timepoints, run the aggregate functions below:
# myData_ss <- aggregate( data = my.data,  meanSummaryStat ~ treatment + dose_uM + timeAfterExposure + control + cell_line +
#                                                       variable + replicate, mean, na.rm= TRUE)
#     colnames(myData_ss)[ncol(myData_ss)] <- "replicate_mean"
# myData_ss_sd <- aggregate( data = my.data,  meanSummaryStat ~ treatment + dose_uM + timeAfterExposure + control + cell_line +
#                                                       variable + replicate, sd, na.rm= TRUE)
#     colnames(myData_ss_sd)[ncol(myData_ss_sd)] <- "replicate_SD"
# myData_ss <- merge(myData_ss, myData_ss_sd, by = c("treatment", "dose_uM", "timeAfterExposure",
#                                                    "control","cell_line", "variable", "replicate"))


myData_ss <- my.data

p.size <-  6 + round(0.1 *length(unique(paste(myData_ss$treatment,myData_ss$dose_uM))),0)
subplot.char.l <- max(nchar(paste(unique(myData_ss$dose_uM), unique(myData_ss$treatment)) ))
max.t <- max(as.numeric(myData_ss$timeAfterExposure))

myData_ss$cell_line<- factor(myData_ss$cell_line)

cell_lines <- unique(myData_ss$cell_line)
allVars <- as.character(unique(myData_ss$variable))

featuresDir <- "replicatePlot"
if (!file.exists(featuresDir)){
  dir.create(featuresDir)
}
if(!file.exists("replicatePlot/RDataFiles")){
  dir.create("replicatePlot/RDataFiles")
}

timePlotFun <- function( myData_ss_f, yMax ) {

#first plot all variables separately
for (i in 1 : length(cell_lines)) {

  myData_ss_c <- myData_ss_f[ myData_ss_f$cell_line == cell_lines[ i ], ]
  
 
 p<- ggplot( data = myData_ss_c,  aes( x = timeAfterExposure , y = meanSummaryStat, colour= replicate )) + 
   geom_point( size = 1, aes(shape = variable, color = replicate ), na.rm = TRUE ) +
    geom_smooth( se = FALSE, size = 1, method = "loess", na.rm=TRUE, n = max.t) 
#+ geom_errorbar(limits,  width = 0.2, span = 0.9)

if(yMax == FALSE){
  yMax <- max(myData_ss_c$meanSummaryStat, na.rm=TRUE)
}

p <- p + facet_wrap( treatment ~ dose_uM  ) + coord_cartesian(ylim=c(0,yMax))
p <- p + theme_sharp() + theme( axis.text.x = element_text(angle = 90, hjust = 1, size = 4 + round(150/max.t,0), colour = "grey50") ) + 
theme( strip.text.x = element_text( size = 4 + round( 150/subplot.char.l, 0))) +
  ggtitle( paste(cell_lines[i], allVars[j], "summary_", "replicateplot" ) ) + 
  theme(plot.title = element_text(lineheight=.8, size = 10 ))   + theme_sharp()


pdf( file = paste(  featuresDir, "/", allVars[j], "_summary_","_", cell_lines[i], ".pdf", sep ="" ), 
     height = 1.5*p.size, width = 1.5*p.size +round(0.2*max.t,0) )

print( p )
dev.off()

save(p, file = paste(  featuresDir, "/RDataFiles/", allVars[j], "_summary_", "_", cell_lines[i], ".RData", sep ="" ))
}

}


registerDoParallel(cores=numberCores)

foreach( j = 1 : length(allVars), .packages = c("ggplot2", "grid" ), .export = 'theme_sharp') %dopar% {
  timePlotFun(myData_ss_f =myData_ss[myData_ss$variable==allVars[j], ], yMax= yMax)
  }

```
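
The commented-out `aggregate()` calls in the chunk above compute a mean and an SD per replicate and merge the two tables. A minimal sketch of that step on toy data, assuming every group has at least two observations; the data frame `toy` and its values are hypothetical, and only a subset of the grouping columns is used to keep it short:

```r
# toy data (hypothetical values): two observations per replicate and time point
toy <- data.frame(
  treatment         = "DMSO",
  dose_uM           = 1,
  timeAfterExposure = rep(c(1, 2), each = 4),
  replicate         = rep(c("Replicate 1", "Replicate 2"), each = 2, times = 2),
  meanSummaryStat   = c(1, 3, 2, 4, 5, 7, 6, 10)
)

grouping <- c("treatment", "dose_uM", "timeAfterExposure", "replicate")

# mean per group; the last column holds the aggregated value
toy_mean <- aggregate(meanSummaryStat ~ treatment + dose_uM + timeAfterExposure + replicate,
                      data = toy, FUN = mean)
names(toy_mean)[ncol(toy_mean)] <- "replicate_mean"

# standard deviation per group
toy_sd <- aggregate(meanSummaryStat ~ treatment + dose_uM + timeAfterExposure + replicate,
                    data = toy, FUN = sd)
names(toy_sd)[ncol(toy_sd)] <- "replicate_SD"

# one row per group with both summary columns
toy_ss <- merge(toy_mean, toy_sd, by = grouping)
```

This only gives meaningful results when the replicates share the same time points, because `timeAfterExposure` is part of the grouping.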


# for BiP
#myData_ss$cell_line[ myData_ss$cell_line == "BiP #12 "]<- "BiP" 
# myData_ss$variable[ myData_ss$variable == "Cytoplasm_Intensity_MeanIntensity_gfp_img "]<- "Cytoplasm_Intensity_MeanIntensity_img_gfp" 
# myData_ss$variable[ myData_ss$variable == "obj_cyto_only_Intensity_MeanIntensity_img_cyto "]<- "Cytoplasm_Intensity_MeanIntensity_img_gfp" 
# myData_ss$variable[ myData_ss$variable == "Cytoplasm_Intensity_MeanIntensity_gfp_img"]<- "Cytoplasm_Intensity_MeanIntensity_img_gfp" 
# myData_ss$variable[ myData_ss$variable == "obj_cyto_only_Intensity_MeanIntensity_img_cyto"]<- "Cytoplasm_Intensity_MeanIntensity_img_gfp" 
# 
# myData_ss$variable[ myData_ss$variable == "obj_nc_AreaShape_Area"]<- "Nuclei_AreaShape_Area" 
# myData_ss$variable[ myData_ss$variable == "obj_nc_AreaShape_Area"]<- "Nuclei_AreaShape_Area" 
# 
# myData_ss$variable[ myData_ss$variable == "Cytoplasm_Intensity_IntegratedIntensity_gfp_img"]<- "Cytoplasm_Intensity_IntegratedIntensity_img_gfp" 
# myData_ss$variable[ myData_ss$variable == "obj_cyto_only_Intensity_IntegratedIntensity_img_cyto"]<- "Cytoplasm_Intensity_IntegratedIntensity_img_gfp" 
# 
# myData_ss$variable[ myData_ss$variable == "obj_nc_Intensity_MeanIntensity_nc_image"]<- "Nuclei_Intensity_MeanIntensity_img_hoechst" 
# myData_ss$variable[ myData_ss$variable == "obj_nc_Intensity_MeanIntensity_img_nc"]<- "Nuclei_Intensity_MeanIntensity_img_hoechst" 


```{r}

# the input file for read.table() is not specified here; myDFo is assumed to be
# available from the preceding chunks
# myDFo <- read.table()

# name of the column that holds the oscillating signal
oscColName <- paste(divisionOne[[1]], divisionOne[[2]], sep = "_")

head(myDFo)

# calculate oscillation parameters
if (oscillation) {
  uniqueLocs <- unique(myDFo[, locationID])

  # Structure: one entry per location. Each entry is itself a list: the first
  # element is the data.frame of that location, further elements hold the
  # derived oscillation data: myList[[location]]$data, $smoothed, ...
  myList <- vector("list", length(uniqueLocs))
  names(myList) <- as.character(uniqueLocs)

  for (i in seq_along(uniqueLocs)) {

    locData <- myDFo[myDFo[, locationID] == uniqueLocs[i], ]

    # one smoothing spline per track (parent object x track label);
    # dlply() is used because the fits are model objects, not data.frame columns
    smoothed <- dlply(locData, .(imageParentObjInd, trackObjectsLabel),
                      function(x) {
                        # smooth.spline() needs at least four unique x values
                        if (length(unique(x[, timeID])) < 4) return(NULL)
                        smooth.spline(x[, timeID], x[, oscColName])
                      })

    myList[[i]] <- list(data = locData, smoothed = smoothed)

  } # end for loop per location

  # number of distinct time points per track
  test <- ddply(myDFo, .(imageParentObjInd, trackObjectsLabel), transform,
                myl = length(unique(Metadata_tp)))
  head(test)
  sum(test$myl > 1)
  dim(test)
  dim(myDFo)

  # toy example: smooth.spline() versus the running-median smoother smooth()
  nn <- c(1, 2, 3, 5, 3, 7, 8, 9, 5, 9, 11, 12, 11, 15)
  jj <- smooth.spline(nn)
  plot(seq_along(nn), nn)
  plot(jj$x, jj$y)
  plot(seq_along(nn), smooth(nn))
}


```
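
The per-track smoothing in the chunk above can also be tried in isolation with base R only. A minimal sketch, assuming one row per track and time point; the data frame `toy` and its columns `track`, `time` and `signal` are hypothetical stand-ins for the tracking columns used in this script:

```r
set.seed(1)

# toy data (hypothetical): two tracks, 20 time points each, noisy oscillation
toy <- data.frame(
  track  = rep(c("a", "b"), each = 20),
  time   = rep(1:20, times = 2),
  signal = c(sin(1:20 / 3), cos(1:20 / 3)) + rnorm(40, sd = 0.1)
)

# fit one smoothing spline per track; tracks with fewer than four
# unique time points are skipped because smooth.spline() cannot fit them
fits <- lapply(split(toy, toy$track), function(x) {
  if (length(unique(x$time)) < 4) return(NULL)
  smooth.spline(x$time, x$signal)
})

# evaluate each fitted spline at the observed time points
smoothed <- lapply(fits, function(fit) predict(fit, x = 1:20)$y)
```

Keeping the fits in a named list (one entry per track) makes it possible to call `predict()` with `deriv = 1` later, for example to look for peaks in the smoothed signal.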