159 changes: 159 additions & 0 deletions recognition/s4627234_3710project/README.MD
@@ -0,0 +1,159 @@

# Using a U-Net to segment the ISIC dataset


## Author
Name: Jingming Dai

Student number: s4627234 / 46272346

This project was completed for COMP3710



## Description
This project uses an Improved UNet to segment the ISIC dataset, with the goal of a minimum Dice similarity coefficient of 0.8 for all labels on the test set.

In image segmentation tasks, especially medical image segmentation, U-Net is one of the most successful methods. U-Net connects an encoder (down-sampling) path to a decoder (up-sampling) path. This project trains the model with the Dice loss. Compared with the standard U-Net, this model segments better and reaches a higher Dice similarity coefficient.


## Data set description:
The ISIC package includes the Training, Test and Validation image folders and their corresponding Ground Truth folders. After downloading the files, put them into a newly created data folder. The directory needs to be arranged as follows.

* data
  * data_ISIC
    * ISIC-2017_Training_Data
    * ISIC-2017_Training_Part1_GroundTruth
    * ISIC-2017_Test_v2_Data
    * ISIC-2017_Test_v2_Part1_GroundTruth
    * ISIC-2017_Validation_Data
    * ISIC-2017_Validation_Part1_GroundTruth
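
Before training, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the project code: `missing_isic_folders` and `EXPECTED_FOLDERS` are hypothetical names introduced here, and the folder names are copied from the list above.

```python
import os

# Folder names copied from the directory layout above.
EXPECTED_FOLDERS = [
    "ISIC-2017_Training_Data",
    "ISIC-2017_Training_Part1_GroundTruth",
    "ISIC-2017_Test_v2_Data",
    "ISIC-2017_Test_v2_Part1_GroundTruth",
    "ISIC-2017_Validation_Data",
    "ISIC-2017_Validation_Part1_GroundTruth",
]


def missing_isic_folders(path_data):
    """Return the expected folders that are absent under <path_data>/data_ISIC.

    Hypothetical helper: path_data plays the same role as path_data in dataset.py.
    """
    root = os.path.join(path_data, "data_ISIC")
    return [
        name for name in EXPECTED_FOLDERS
        if not os.path.isdir(os.path.join(root, name))
    ]
```

An empty list means every folder was found; otherwise the returned names show what still has to be downloaded or moved.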

![image](./images/data_image_example.png)


## How it works:

### UNet:
The structure of U-Net is shown in the following figure. The left part can be regarded as an encoder (down), and the right part as a decoder (up).

![image](./images/UNet.png)

The encoder has four sub-modules. Each sub-module contains two convolution layers and is followed by a down-sampling layer implemented with MaxPool2D.

The decoder also consists of four sub-modules. The resolution is increased step by step through up-sampling operations until it matches the resolution of the input image. The decoder uses two 3x3 un-padded convolutions in each sub-module. After the four up-sampling steps, a final 1x1 convolution with a sigmoid activation function is applied.
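
The effect of the four down-sampling and four up-sampling steps on the image resolution can be traced with a little arithmetic. This is an illustrative sketch only: it assumes a 256x256 input (the size used for the reshaped data) and assumes that only pooling and up-sampling change the spatial size, i.e. that the convolutions are padded. With truly un-padded 3x3 convolutions each convolution would additionally trim two pixels per dimension. The function names are hypothetical.

```python
def encoder_resolutions(input_size=256, levels=4):
    """Spatial size at the input and after each 2x2 max-pool (assumes padded convolutions)."""
    sizes = [input_size]
    for _ in range(levels):
        sizes.append(sizes[-1] // 2)  # each pooling step halves height and width
    return sizes


def decoder_resolutions(bottleneck_size, levels=4):
    """Spatial size at the bottleneck and after each 2x up-sampling step."""
    sizes = [bottleneck_size]
    for _ in range(levels):
        sizes.append(sizes[-1] * 2)  # each up-sampling step doubles height and width
    return sizes


down = encoder_resolutions(256)     # [256, 128, 64, 32, 16]
up = decoder_resolutions(down[-1])  # [16, 32, 64, 128, 256]
```

The decoder ends at the same size the encoder started with, which is what allows the final 1x1 convolution to produce a mask aligned with the input image.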


### Improved_UNet :
The algorithm is a modified version of UNet created by Isensee and colleagues; the improved UNet is shown in the figure below. The UNet() and UNet_imp() functions in modules.py build the UNet model before and after the improvement, respectively. The model follows the architecture below, except that the final "softmax" is replaced with "sigmoid" so that the output is a single-channel mask.

![image](./images/model_imp.png)

#### Important parts of improved UNet:

The convolution, context module, and element-wise sum in the encoding path are integrated into the up_imp() function.

Part of the upsampling, concatenation, and localization in the decoding path is integrated into the down_imp() function.

__Context module:__
Consists of 2 convolution layers (all except the first one with stride 2) with a dropout layer (rate 0.3) in between.

__Upsampling module:__
It consists of an upsampling layer and a convolution layer.

__Localization module:__
Completed by a 3x3 convolution and a 1x1 convolution.

Instance normalization and Leaky ReLU are used throughout the architecture. The model is compiled with a Dice coefficient loss and with the Dice coefficient as the metric.
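
To make the two operations concrete, here is a small NumPy sketch of instance normalization and Leaky ReLU. It is an illustration rather than the code in modules.py: the function names, the channels-last layout `(batch, height, width, channels)`, `eps`, and the slope `alpha=0.01` are assumptions, and the learnable scale and shift that normalization layers usually carry are left out.

```python
import numpy as np


def instance_norm(x, eps=1e-5):
    """Normalize every (sample, channel) pair over its own spatial axes.

    x is assumed to have shape (batch, height, width, channels).
    """
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def leaky_relu(x, alpha=0.01):
    """Pass positive values through and scale negative values by alpha."""
    return np.where(x > 0, x, alpha * x)
```

Unlike batch normalization, the statistics here are computed per image, so the result does not depend on the other samples in the batch.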


### Dice Similarity Coefficient (DSC):
The DSC is a set similarity measure, usually used to calculate the similarity of two samples. In this model it is used as the loss function and to evaluate the predicted segmentation masks.
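
For two masks A and B, the coefficient is DSC = 2|A ∩ B| / (|A| + |B|), and a loss can be formed as 1 - DSC. The NumPy sketch below shows one common way to compute both. It is not taken from the project code: the function names and the `smooth` term (added to avoid division by zero on empty masks) are assumptions.

```python
import numpy as np


def dice_coefficient(y_true, y_pred, smooth=1.0):
    """Dice similarity coefficient between two masks with values in [0, 1]."""
    y_true = np.asarray(y_true, dtype=np.float64).ravel()
    y_pred = np.asarray(y_pred, dtype=np.float64).ravel()
    intersection = np.sum(y_true * y_pred)
    return (2.0 * intersection + smooth) / (y_true.sum() + y_pred.sum() + smooth)


def dice_loss(y_true, y_pred, smooth=1.0):
    """Loss that reaches 0 when the prediction matches the ground truth."""
    return 1.0 - dice_coefficient(y_true, y_pred, smooth)
```

A perfect prediction gives a coefficient of 1 and a loss of 0; with `smooth=0.0`, two masks that do not overlap at all give a coefficient of 0.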


## Install:
```
git clone https://github.com/shakes76/PatternFlow.git
```


## Using:
!!! Before running, you must set path_data in dataset.py to the path of the data folder.

When using the data for the first time, call dataset.load_dataset(data_reshape = True). This resizes all images, which come in different sizes, to 256*256 and saves them in a new folder "data_Reshaped" inside the "data" folder. The resulting layout is as follows:

* data_Reshaped
  * Train
  * Train_GT
  * Test
  * Test_GT
  * Val
  * Val_GT

!!! Always remember to use (data_reshape = False) if reshaped data already exists
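
One way to avoid setting the flag by hand is to derive it from the folders on disk. The sketch below is a hypothetical helper, not part of dataset.py: `needs_reshape` and `RESHAPED_SUBFOLDERS` are names introduced here, and the sub-folder names are taken from the layout listed above.

```python
import os

# Sub-folders expected inside data_Reshaped, as listed above.
RESHAPED_SUBFOLDERS = ["Train", "Train_GT", "Test", "Test_GT", "Val", "Val_GT"]


def needs_reshape(path_data):
    """Return True when the reshaped data is missing under <path_data>/data_Reshaped."""
    root = os.path.join(path_data, "data_Reshaped")
    return not all(
        os.path.isdir(os.path.join(root, name)) for name in RESHAPED_SUBFOLDERS
    )
```

Under these assumptions the result could be passed straight to the loader, for example `dataset.load_dataset(data_reshape=needs_reshape(dataset.path_data))`. The check only looks at folder names, so it does not detect a run that was interrupted partway through.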

__Run train.py to train and save the model (the model is saved in the current folder):__
(The exact command may need to be changed to suit your environment)
```
python3.9 train.py
```

__Run predict.py to load the model and predict masks for the validation data:__
(The exact command may need to be changed to suit your environment)
```
python3.9 predict.py
```

## Testing and Conclusion:
The train and test data are used for training and testing the model.

Running for 10 epochs gives the results below (this is a very low number of epochs; increase it in train.py for a better result):

### model DSC:

![image](./images/DSC.png)


### model loss:

![image](./images/loss.png)


Evaluating the model on the val dataset gives the following DSC:
tf.Tensor(0.83824617, shape=(), dtype=float32)

The images folder contains comparisons of the original image, predicted mask, and ground truth for the first 50 images. You can change which images are output by changing number_list in predict.py.

### Some example images:
![image](./images/output0.png)
![image](./images/output5.png)
![image](./images/output10.png)
![image](./images/output13.png)
![image](./images/output37.png)



## Packages:
os, cv2, skimage, tensorflow-macos_version_2.9.2

SimpleITK_version_2.2.0
-- SimpleITK is a simplified interface to the Insight Toolkit (ITK) for image registration and segmentation
(http://simpleitk.org/)

numpy_version_1.23.1
-- NumPy is the fundamental package for array computing with Python.
(https://www.numpy.org)

pandas_version_1.4.4
-- Powerful data structures for data analysis, time series, and statistics
(https://pandas.pydata.org)

matplotlib_version_3.5.2
-- Python plotting package
(https://matplotlib.org)


## Reference

Isensee, F., Kickingereder, P., Wick, W., Bendszus, M., & Maier-Hein, K. H. (2018, February 28). Brain tumor segmentation and radiomics survival prediction: Contribution to the BRATS 2017 challenge. arXiv.org. Retrieved October 19, 2022, from https://arxiv.org/abs/1802.10508v1
168 changes: 168 additions & 0 deletions recognition/s4627234_3710project/dataset.py
@@ -0,0 +1,168 @@

import os
import cv2 as cv
import SimpleITK as sitk
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt

import skimage
from skimage import io

# ISIC data format using 2017 ISIC data

# The original ISIC images come in several different sizes, so they are
# resized to a common size that is better suited to ML.
# If data_reshape is True, the program creates a new data folder under
# the given directory and resizes all the ISIC images to the given size
# (the .csv files stay in their original location and are not changed)

"""
Change the path to the path of the ISIC folder,
e.g. if the folder path is ./data/data_ISIC, use path_data = "./data"
"""
path_data = "/Users/davedai/Desktop/MySolution/data"

def create_data(data_from, data_images, data_to, img_size=256):
    """ Resize the given images and save them as PNG files in the target directory.

    Args:
        data_from (String): source directory
        data_images (list): list of image file names
        data_to (String): target directory
        img_size (int, optional): width and height of the resized images. Defaults to 256.
    """
    for name in data_images:
        img = sitk.ReadImage(os.path.join(data_from, name))
        img_array = sitk.GetArrayFromImage(img)
        new_array = cv.resize(img_array, (img_size, img_size))
        data_name = os.path.splitext(name)[0]  # drop the extension (.jpg/.png/...)

        io.imsave(os.path.join(data_to, data_name + '.png'), new_array)


def reshape_data():
    """
    Resize the images to the given size and save them in the new folder
    """
    # training image path
    train_path = path_data + '/data_ISIC/ISIC-2017_Training_Data/'
    train = [fn for fn in os.listdir(train_path) if fn.endswith('jpg')]
    train.sort()

    # training ground truth path
    train_path_gt = path_data + '/data_ISIC/ISIC-2017_Training_Part1_GroundTruth/'
    train_gt = [fn for fn in os.listdir(train_path_gt) if fn.endswith('png')]
    train_gt.sort()

    # test image path
    test_path = path_data + '/data_ISIC/ISIC-2017_Test_v2_Data'
    test = [fn for fn in os.listdir(test_path) if fn.endswith('jpg')]
    test.sort()

    # test ground truth path
    test_path_gt = path_data + '/data_ISIC/ISIC-2017_Test_v2_Part1_GroundTruth'
    test_gt = [fn for fn in os.listdir(test_path_gt) if fn.endswith('png')]
    test_gt.sort()

    # validation image path
    val_path = path_data + '/data_ISIC/ISIC-2017_Validation_Data'
    val = [fn for fn in os.listdir(val_path) if fn.endswith('jpg')]
    val.sort()

    # validation ground truth path
    val_path_gt = path_data + '/data_ISIC/ISIC-2017_Validation_Part1_GroundTruth'
    val_gt = [fn for fn in os.listdir(val_path_gt) if fn.endswith('png')]
    val_gt.sort()

    # create the output folders only if they do not exist yet
    if not os.path.exists(path_data + '/data_Reshaped'):
        os.mkdir(path_data + '/data_Reshaped/')
        os.mkdir(path_data + '/data_Reshaped/Train')
        os.mkdir(path_data + '/data_Reshaped/Train_GT')
        os.mkdir(path_data + '/data_Reshaped/Test')
        os.mkdir(path_data + '/data_Reshaped/Test_GT')
        os.mkdir(path_data + '/data_Reshaped/Val')
        os.mkdir(path_data + '/data_Reshaped/Val_GT')

    create_data(train_path, train, (path_data + '/data_Reshaped/Train/'))
    create_data(train_path_gt, train_gt, (path_data + '/data_Reshaped/Train_GT/'))
    create_data(test_path, test, (path_data + '/data_Reshaped/Test/'))
    create_data(test_path_gt, test_gt, (path_data + '/data_Reshaped/Test_GT/'))
    create_data(val_path, val, (path_data + '/data_Reshaped/Val/'))
    create_data(val_path_gt, val_gt, (path_data + '/data_Reshaped/Val_GT/'))


def load_data(csv, path_image, path_image_gt):
    """ Load the images and their masks from the given paths in the order of the csv file.
    The csv file is only used to get the names of the images.

    Args:
        csv (pandas.core.frame.DataFrame): csv file read with pandas
        path_image (String): image path
        path_image_gt (String): image ground truth path

    Returns:
        numpy.ndarray, numpy.ndarray: the images and their masks
    """
    x, y = [], []
    for _, row in csv.iterrows():
        image_id = row.iloc[0]  # the first column holds the image name

        # read the image and scale pixel values to [0, 1]
        image = sitk.ReadImage(path_image + image_id + '.png')
        image_array_ = sitk.GetArrayFromImage(image)
        image_array = image_array_ / 255.0
        x.append(image_array)

        # read the matching mask and scale it to [0, 1]
        mask_ = cv.imread(path_image_gt + image_id + '_segmentation.png')
        mask = mask_ / 255.0
        y.append(mask)

    return np.array(x), np.array(y)


def load_dataset(data_reshape=False):
    """ Load the dataset. If the data needs reshaping (data_reshape = True), reshape it first.

    Args:
        data_reshape (bool, optional): reshape the data if True. Defaults to False.

    Returns:
        numpy.ndarray, numpy.ndarray, numpy.ndarray, numpy.ndarray:
            the images and their masks for all training and testing data
    """
    if data_reshape:
        reshape_data()

    train_csv = pd.read_csv(path_data + '/data_ISIC/ISIC-2017_Training_Data/ISIC-2017_Training_Data_metadata.csv')
    test_csv = pd.read_csv(path_data + '/data_ISIC/ISIC-2017_Test_v2_Data/ISIC-2017_Test_v2_Data_metadata.csv')

    path_train = path_data + '/data_Reshaped/Train/'
    path_train_gt = path_data + '/data_Reshaped/Train_GT/'

    path_test = path_data + '/data_Reshaped/Test/'
    path_test_gt = path_data + '/data_Reshaped/Test_GT/'

    train_x, train_y = load_data(train_csv, path_train, path_train_gt)
    test_x, test_y = load_data(test_csv, path_test, path_test_gt)

    return train_x, train_y, test_x, test_y




def load_val():
    """ Load the validation dataset

    Returns:
        numpy.ndarray, numpy.ndarray: the images and their masks for all val data
    """
    val_csv = pd.read_csv(path_data + '/data_ISIC/ISIC-2017_Validation_Data/ISIC-2017_Validation_Data_metadata.csv')

    path_val = path_data + '/data_Reshaped/Val/'
    path_val_gt = path_data + '/data_Reshaped/Val_GT/'

    val_x, val_y = load_data(val_csv, path_val, path_val_gt)

    return val_x, val_y

Binary file added recognition/s4627234_3710project/images/DSC.png
Binary file added recognition/s4627234_3710project/images/UNet.png
Binary file added recognition/s4627234_3710project/images/loss.png