Commit 1a61cf6

Merge pull request #2 from nccaiteam/master
readme
2 parents 29b9ef4 + df5044b commit 1a61cf6

File tree

2 files changed (+64 −37 lines)


README.md

Lines changed: 1 addition & 32 deletions
````diff
@@ -1,29 +1,13 @@
 
-# mmg-nia Repository for NIA project
-
 ## How to use it
 1. `$ git clone https://github.com/lunit-io/mmg-nia`
 2. `$ pip install -r requirements.txt`
 3. `$ cd data_preprocessing` and do data-preprocessing [here](https://github.com/lunit-io/mmg-model-nia/tree/master/data_preprocessing)
 4. To train and test 5-fold cross validation, use `$ sh test.sh $GPU_ID $PICKLE_PATH $DATA_ROOT` \
 e.g. `$ sh test.sh 0 data_preprocessing/db/shuffled_db.pkl /data/mmg/mg_nia`
+- If you want to use many GPUs, input multiple numbers: `sh test.sh 0,1,2,3, ...`
 5. `$ cat resnet34-5fold-result`
 ```
-compressed
-threshold : 0.1
-calculated accuracy is 0.8150886790885683
-calculated specificity is 0.8227048930437848
-calculated sensitivity is 0.7889675985264972
-threshold : 0.15
-calculated accuracy is 0.8451890694648247
-calculated specificity is 0.878533964063642
-calculated sensitivity is 0.7308764166673534
-threshold : 0.2
-calculated accuracy is 0.8608703452476536
-calculated specificity is 0.9128091136592129
-calculated sensitivity is 0.6827785378337696
-calculated auc is 0.8896701982655968
-uncompressed
 threshold : 0.1
 calculated accuracy is 0.8144577092389047
 calculated specificity is 0.8112627121478205
@@ -40,21 +24,6 @@ uncompressed
 ```
 6. `$ cat densenet121-5fold-result`
 ```
-compressed
-threshold : 0.1
-calculated accuracy is 0.8405651319250256
-calculated specificity is 0.8526687249469763
-calculated sensitivity is 0.7984347172444817
-threshold : 0.15
-calculated accuracy is 0.8595072953293281
-calculated specificity is 0.888602910574434
-calculated sensitivity is 0.7593899157536472
-threshold : 0.2
-calculated accuracy is 0.8708727262659541
-calculated specificity is 0.9096908783863396
-calculated sensitivity is 0.7374108776754932
-calculated auc is 0.9031778432556026
-uncompressed
 threshold : 0.1
 calculated accuracy is 0.8379339405852875
 calculated specificity is 0.8391845548624011
````
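The `*-5fold-result` files above report accuracy, specificity, and sensitivity at each probability threshold. As a minimal sketch of how such per-threshold metrics are derived from per-case malignancy scores (the `scores`/`labels` names and toy values here are hypothetical illustrations, not taken from the repository):

```python
def threshold_metrics(scores, labels, threshold):
    """Compute accuracy, specificity, and sensitivity at a score threshold.

    scores: predicted malignancy probabilities in [0, 1]
    labels: ground-truth values, 1 = positive (cancer), 0 = negative
    """
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    tn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 0)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    accuracy = (tp + tn) / len(labels)
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true-negative rate
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # true-positive rate
    return accuracy, specificity, sensitivity


if __name__ == "__main__":
    # Toy predictions, purely for illustration.
    scores = [0.05, 0.2, 0.8, 0.4, 0.9, 0.1]
    labels = [0, 0, 1, 0, 1, 1]
    for threshold in (0.1, 0.15, 0.2):
        acc, spec, sens = threshold_metrics(scores, labels, threshold)
        print(f"threshold : {threshold}")
        print(f"calculated accuracy is {acc}")
        print(f"calculated specificity is {spec}")
        print(f"calculated sensitivity is {sens}")
```

Sweeping the threshold this way is what produces the `threshold : 0.1 / 0.15 / 0.2` blocks shown in the result files; the AUC line summarizes the same trade-off over all thresholds.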

data_preprocessing/README.md

Lines changed: 63 additions & 5 deletions
````diff
@@ -1,19 +1,77 @@
 # Prepare dataset
 
-## Download dataset
+## Library installation: GDCM
+```
+apt-get update
+
+apt-get install -y --no-install-recommends curl build-essential cmake swig
+
+curl -L -O https://sourceforge.net/projects/gdcm/files/gdcm%202.x/GDCM%202.8.9/gdcm-2.8.9.tar.gz
+
+tar -xzf gdcm-2.8.9.tar.gz
+
+mv gdcm-2.8.9 gdcm-src
+
+mkdir gdcm-build
+
+cd gdcm-build
+
+cmake -DGDCM_WRAP_PYTHON=ON \
+      -DGDCM_BUILD_SHARED_LIBS=ON \
+      -DCMAKE_INSTALL_PREFIX=/usr/local \
+      ../gdcm-src
+
+make -j$(nproc) && make install
+
+cd /usr/local/lib && cp _gdcmswig.so gdcmswig.py gdcm.py $(python -c "from distutils.sysconfig import get_python_lib; print(get_python_lib())")
+
+cd -
+
+rm -rf gdcm-src gdcm-build gdcm-2.8.9.tar.gz
+```
+
+## Dataset
 
 We assume the data is downloaded in $DATA_ROOT
 
-Rearrange data by moving Compressed & Uncompressed data by following command,
+Store data in $DATA_ROOT/dcm with the following command:
 
 ```
 cd $DATA_ROOT
 mkdir dcm
-mv Uncompressed/ dcm/.
-mv Compressed/ dcm/.
 ```
 
-## Data preprocess
+Data structure:
+
+```
+$DATA_ROOT
+└── dcm
+    ├── 1
+    │   ├── LCC.dcm
+    │   ├── LMLO.dcm
+    │   ├── RCC.dcm
+    │   └── RMLO.dcm
+    ├── 2
+    │   ├── LCC.dcm
+    │   ├── LMLO.dcm
+    │   ├── RCC.dcm
+    │   └── RMLO.dcm
+    ...
+
+$DATA_ROOT
+└── annotation_result_1st
+    ├── Various time stamps... (e.g., 20191015082002_KST)
+    ├── __results
+    │   ├── dst_json
+    │   │   ├── Various time stamps...
+    │   │   ├── 20190930172251_KST
+    │   │   │   ├── Success
+    │   │   │   │   ├── Type_id.json (e.g. Cancer_00367.json)
+    ...
+```
+
+## Data preprocessing
 To train our model, we first have to convert DICOM files to image files.
 Then, the information about the data (such as the path to each image file, or its annotations) is gathered.
 
````