diff --git a/recognition/45316207_VQ-VAE/.gitignore b/recognition/45316207_VQ-VAE/.gitignore
new file mode 100644
index 0000000000..8d756d8377
--- /dev/null
+++ b/recognition/45316207_VQ-VAE/.gitignore
@@ -0,0 +1,7 @@
+# Ignore the dataset folder
+keras_png_slices_data/
+vqvae_saved_model/
+pixelcnn_saved_model/
+codebook_indices.csv
+.DS_Store
+*/.DS_Store
\ No newline at end of file
diff --git a/recognition/45316207_VQ-VAE/README.md b/recognition/45316207_VQ-VAE/README.md
new file mode 100644
index 0000000000..40bb59a8c7
--- /dev/null
+++ b/recognition/45316207_VQ-VAE/README.md
@@ -0,0 +1,96 @@
+# Synthetic Brain MRI Image Generation with VQ-VAE (COMP3710)
+
+by Alex Nicholson, 45316207
+
+---
+
+## Project Overview
+
+### The Algorithm and the Problem
+
+The algorithm implemented in this project is a [VQ-VAE](https://arxiv.org/abs/1711.00937) (Vector Quantised Variational Auto-Encoder), an architecture that encodes data into a compressed format (embedding higher-dimensional data into a lower-dimensional subspace) and then decodes this compressed format to recreate the original image as closely as possible. For this project, the model is trained on the OASIS brain MRI dataset so that it can be used to generate novel, realistic synthetic brain MRI images.
+
+### How it Works
+
+The model transforms each image into a grid of encoding vectors using a CNN (convolutional neural network) encoder, and each encoding vector is then quantised to its nearest codebook vector. These quantised encodings are passed to the decoder network, which is made up of transposed convolution (deconvolution) layers and generates a reconstruction that is very similar to the original input image. The model is trained until the VQ-VAE can encode images into this condensed format while preserving the information they contain.
+
+In addition to reconstructing images, we may also want to generate novel brain images. To do this, we can train a separate CNN with the PixelCNN architecture that generates new grids of codebook indices, which the VQ-VAE decoder then turns into brain images.
+
+### Goals
+
+The performance goals for this project are, generally, for the model to produce a “reasonably clear image” and, more concretely, for the model to achieve an average structural similarity index (SSIM) of over 0.6.
+
+---
+
+## Usage Guide
+
+### Installation
+
+1. Install Anaconda
+2. Create a clean conda environment and activate it
+3. Install all of the required packages (see the dependency list below)
+4. Download the OASIS dataset from [this link](https://cloudstor.aarnet.edu.au/plus/s/tByzSZzvvVh0hZA/download)
+
+### Usage
+
+* Run `python train.py` to train the model
+* Run `python predict.py` to test out the trained model
+
+### Dependencies
+
+The following dependencies were used in the project:
+
+* tensorflow (version 2.9.2)
+* tensorflow_probability (version 0.17.0)
+* numpy (version 1.23.3)
+* matplotlib (version 3.5.1)
+* PIL / pillow (version 9.1.0)
+* imageio (version 2.22.1)
+* skimage (version 0.19.3)
+
+---
+
+## Methods
+
+The training, validation and testing splits of the data were used as provided in the original dataset, with these partitions making up approximately 85%, 5%, and 10% respectively of the 11,328 images in the dataset, in line with standard practice for dataset partitioning. The pixel values of the images were normalised to the range -0.5 to 0.5 by dividing by 255 and subtracting 0.5, giving zero-centred inputs and avoiding scale biases in the data.
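+As a rough illustration of the preprocessing described above, the sketch below shows how a single OASIS slice could be loaded and scaled (a minimal, simplified version of what `dataset.py` does; the helper name `preprocess_slice` is purely illustrative):
+
+```python
+import numpy as np
+import PIL.Image
+
+def preprocess_slice(png_path):
+    """Load one OASIS PNG slice and scale its pixel values into the range [-0.5, 0.5]."""
+    img = np.asarray(PIL.Image.open(png_path))  # greyscale slice, pixel values 0-255
+    img = np.expand_dims(img, -1)               # add a channel axis -> (256, 256, 1)
+    return (img / 255.0) - 0.5                  # centre the pixel values about zero
+```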
+
+---
+
+## Results
+
+### Example Generations
+
+Below are some examples of the generations made by the VQ-VAE model after 20 epochs of training over the full OASIS training dataset. These were produced by feeding real MRI images from the test set into the model and taking the reconstructed outputs.
+
+![alt text](./out/original_vs_reconstructed_0000.png)
+![alt text](./out/original_vs_reconstructed_0001.png)
+![alt text](./out/original_vs_reconstructed_0002.png)
+![alt text](./out/original_vs_reconstructed_0003.png)
+![alt text](./out/original_vs_reconstructed_0004.png)
+![alt text](./out/original_vs_reconstructed_0005.png)
+
+### Generation Quality Over Time
+
+Below is an animation of the progression of the quality of the model's generations over the course of training.
+![alt text](./out/vqvae_training_progression.gif)
+
+### Training Metrics
+
+The various loss metrics of the model were recorded throughout training to track its performance over time. These include:
+
+* Total Loss: the overall training objective, equal to the reconstruction loss plus the VQ-VAE loss
+* Reconstruction Loss: the mean squared error between the original input images and their reconstructions, divided by the variance of the training data
+* VQ VAE Loss: the vector-quantisation loss from the codebook layer, made up of the codebook loss and the beta-weighted commitment loss, which pull the encoder outputs and their assigned codebook vectors towards each other
+
+These losses are plotted over the course of the model's training in both standard and log scales below:
+![alt text](./out/training_loss_curves.png)
+
+Model Log Loss Progress Throughout Training:
+![alt text](./out/training_logloss_curves.png)
+
+In addition to these statistical losses, a more tangible metric for tracking the quality of the reconstructions over time is the similarity between the reconstructed output images and the original input images they were created from. This similarity can be measured by the SSIM (Structural Similarity Index). At the end of each epoch, the SSIM was computed for 10 randomly selected images from the validation set and the average was recorded, as sketched below.
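+For reference, the sketch below shows roughly how this per-epoch average SSIM can be computed with `skimage` (a simplified version of the helper in `utils.py`; it assumes the reconstructions have already been produced by the model, and the helper name `average_ssim` is illustrative):
+
+```python
+import numpy as np
+from skimage.metrics import structural_similarity as ssim
+
+def average_ssim(originals, reconstructions, sample_size=10):
+    """Average SSIM over a random sample of (original, reconstruction) image pairs."""
+    idx = np.random.choice(len(originals), sample_size)
+    scores = []
+    for i in idx:
+        original = originals[i, :, :, 0]
+        reconstructed = reconstructions[i, :, :, 0]
+        scores.append(ssim(original, reconstructed,
+                           data_range=original.max() - original.min()))
+    return float(np.mean(scores))
+```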
+This average SSIM is plotted over time below:
+![alt text](./out/training_ssim_curve.png)
+
+---
+
+Made with ❤️
diff --git a/recognition/45316207_VQ-VAE/dataset.py b/recognition/45316207_VQ-VAE/dataset.py
new file mode 100644
index 0000000000..663ef0b7aa
--- /dev/null
+++ b/recognition/45316207_VQ-VAE/dataset.py
@@ -0,0 +1,108 @@
+"""
+dataset.py
+
+Alex Nicholson (45316207)
+11/10/2022
+
+Contains the data loader for loading and preprocessing the OASIS data
+
+"""
+
+
+import glob
+import numpy as np
+import PIL.Image
+import os
+
+import matplotlib.pyplot as plt
+
+
+def load_dataset(max_images=None, verbose=False):
+    """
+    Loads the OASIS dataset of brain MRI images
+
+    Parameters:
+        (optional) max_images (int): The maximum number of images of each dataset split to be used (default=None)
+        (optional) verbose (bool): Whether a description of the dataset should be printed after it has loaded
+
+    Returns:
+        train_data_scaled (ndarray): Numpy array of scaled image data for training (9,664 images max)
+        validate_data_scaled (ndarray): Numpy array of scaled image data for validation (544 images max)
+        test_data_scaled (ndarray): Numpy array of scaled image data for testing (1,120 images max)
+        data_variance (float): Variance of the (0-1 normalised) training dataset
+    """
+
+    print("Loading dataset...")
+
+    # File paths
+    images_path = "keras_png_slices_data/"
+    test_path = images_path + "keras_png_slices_test/"
+    train_path = images_path + "keras_png_slices_train/"
+    validate_path = images_path + "keras_png_slices_validate/"
+    dataset_paths = [test_path, train_path, validate_path]
+
+    # Set up the lists we will load our data into
+    test_data = []
+    train_data = []
+    validate_data = []
+    datasets = [test_data, train_data, validate_data]
+
+    # Load all the images into numpy arrays
+    for i in range(0, len(dataset_paths)):
+        # Get all the png files in this dataset_path directory
+        images_list = glob.glob(os.path.join(dataset_paths[i], "*.png"))
+
+        images_collected = 0
+        for img_filename in images_list:
+            # Stop loading in images if we hit our max image limit
+            if max_images and images_collected >= max_images:
+                break
+
+            # Open the image
+            img = PIL.Image.open(img_filename)
+            # Convert image to numpy array
+            data = np.asarray(img)
+            datasets[i].append(data)
+
+            # Close the image (not strictly necessary)
+            del img
+            images_collected = images_collected + 1
+
+    # Convert the datasets into numpy arrays
+    train_data = np.array(train_data)
+    test_data = np.array(test_data)
+    validate_data = np.array(validate_data)
+
+    # Preprocess the data
+    train_data = np.expand_dims(train_data, -1)
+    test_data = np.expand_dims(test_data, -1)
+    validate_data = np.expand_dims(validate_data, -1)
+    # Scale the data into values between -0.5 and 0.5 (range of 1 centred about 0)
+    train_data_scaled = (train_data / 255.0) - 0.5
+    test_data_scaled = (test_data / 255.0) - 0.5
+    validate_data_scaled = (validate_data / 255.0) - 0.5
+
+    # Get the dataset variance
+    data_variance = np.var(train_data / 255.0)
+
+    if verbose == True:
+        # Debug dataset loading
+        print(f"###train_data ({type(train_data)}): {np.shape(train_data)}###")
+        print(f"###test_data ({type(test_data)}): {np.shape(test_data)}###")
+        print(f"###train_data_scaled ({type(train_data_scaled)}): {np.shape(train_data_scaled)}###")
+        print(f"###test_data_scaled ({type(test_data_scaled)}): {np.shape(test_data_scaled)}###")
+        print(f"###data_variance ({type(data_variance)}): {data_variance}###")
+        print('')
+
+        print(f"###validate_data ({type(validate_data)}): {np.shape(validate_data)}###")
+        print(f"###validate_data_scaled ({type(validate_data_scaled)}): {np.shape(validate_data_scaled)}###")
+
+        print('')
+        print('')
+
+    return (train_data_scaled, validate_data_scaled, test_data_scaled, data_variance)
+
+
+if __name__ == "__main__":
+    # Run a test
+    load_dataset(max_images=1000)
\ No newline at end of file
diff --git a/recognition/45316207_VQ-VAE/modules.py b/recognition/45316207_VQ-VAE/modules.py
new file mode 100644
index 0000000000..105ec6c31a
--- /dev/null
+++ b/recognition/45316207_VQ-VAE/modules.py
@@ -0,0 +1,245 @@
+"""
+modules.py
+
+Alex Nicholson (45316207)
+11/10/2022
+
+Contains all the source code of the components of the VQ-VAE model and PixelCNN model, as well as functions to build and return the full models. The models and implementation are based on the Keras VQ-VAE tutorial and the original VQ-VAE paper ('Neural Discrete Representation Learning').
+
+"""
+
+import numpy as np
+import tensorflow as tf
+from tensorflow import keras
+import tensorflow_probability as tfp
+
+
+class CustomVectorQuantizer(keras.layers.Layer):
+    """
+    A Custom Vector Quantising Layer
+
+    Attributes:
+        num_embeddings (int): The number of codebook vectors in the embedding space
+        embedding_dim (int): The dimensionality of each codebook vector
+        beta (float): The commitment loss weighting, best kept between 0.25 and 2 as per the paper (default=0.25)
+        embeddings (tf.Variable): The trainable codebook of embedding vectors
+
+    Methods:
+        call(x): Quantises the input tensor x to its nearest codebook vectors and adds the VQ losses to the layer
+        get_code_indices(flattened_inputs): Gets the indices of the nearest codebook vectors for the flattened input vectors
+    """
+
+    def __init__(self, num_embeddings, embedding_dim, beta=0.25, **kwargs):
+        super().__init__(**kwargs)
+        self.embedding_dim = embedding_dim
+        self.num_embeddings = num_embeddings
+
+        # The `beta` parameter is best kept between [0.25, 2] as per the paper.
+        self.beta = beta
+
+        # Initialize the embeddings which we will quantize.
+        w_init = tf.random_uniform_initializer()
+        self.embeddings = tf.Variable(
+            initial_value=w_init(
+                shape=(self.embedding_dim, self.num_embeddings), dtype="float32"
+            ),
+            trainable=True,
+            name="embeddings_vqvae",
+        )
+
+    def call(self, x):
+        # Calculate the input shape of the inputs and
+        # then flatten the inputs keeping `embedding_dim` intact.
+        input_shape = tf.shape(x)
+        flattened = tf.reshape(x, [-1, self.embedding_dim])
+
+        # Quantization.
+        encoding_indices = self.get_code_indices(flattened)
+        encodings = tf.one_hot(encoding_indices, self.num_embeddings)
+        quantized = tf.matmul(encodings, self.embeddings, transpose_b=True)
+
+        # Reshape the quantized values back to the original input shape
+        quantized = tf.reshape(quantized, input_shape)
+
+        # Calculate vector quantization loss and add that to the layer. You can learn more
+        # about adding losses to different layers here:
+        # https://keras.io/guides/making_new_layers_and_models_via_subclassing/. Check
+        # the original paper to get a handle on the formulation of the loss function.
+        commitment_loss = tf.reduce_mean((tf.stop_gradient(quantized) - x) ** 2)
+        codebook_loss = tf.reduce_mean((quantized - tf.stop_gradient(x)) ** 2)
+        self.add_loss(self.beta * commitment_loss + codebook_loss)
+
+        # Straight-through estimator.
+        quantized = x + tf.stop_gradient(quantized - x)
+        return quantized
+
+    def get_code_indices(self, flattened_inputs):
+        # Calculate the squared L2 distance between the inputs and the codes.
+        similarity = tf.matmul(flattened_inputs, self.embeddings)
+        distances = (
+            tf.reduce_sum(flattened_inputs ** 2, axis=1, keepdims=True)
+            + tf.reduce_sum(self.embeddings ** 2, axis=0)
+            - 2 * similarity
+        )
+
+        # Derive the indices for minimum distances.
+ encoding_indices = tf.argmin(distances, axis=1) + return encoding_indices + + +def get_encoder(latent_dim=16): + """ + Encoder Module + + Parameters: + (optional) latent_dim (int): The number of latent dimensions the images are compressed down to (default=16) + + Returns: + encoder (Keras Model): The encoder module for the VQ-VAE + """ + + encoder_inputs = keras.Input(shape=(256, 256, 1)) + x = keras.layers.Conv2D(32, 3, activation="relu", strides=2, padding="same")( + encoder_inputs + ) + x = keras.layers.Conv2D(64, 3, activation="relu", strides=2, padding="same")(x) + encoder_outputs = keras.layers.Conv2D(latent_dim, 1, padding="same")(x) + return keras.Model(encoder_inputs, encoder_outputs, name="encoder") + + +def get_decoder(latent_dim=16): + """ + Decoder Module + + Parameters: + (optional) latent_dim (int): The number of latent dimensions the images are compressed down to (default=16) + + Returns: + decoder (Keras Model): The decoder module for the VQ-VAE + """ + + latent_inputs = keras.Input(shape=get_encoder(latent_dim).output.shape[1:]) + x = keras.layers.Conv2DTranspose(64, 3, activation="relu", strides=2, padding="same")( + latent_inputs + ) + x = keras.layers.Conv2DTranspose(32, 3, activation="relu", strides=2, padding="same")(x) + decoder_outputs = keras.layers.Conv2DTranspose(1, 3, padding="same")(x) + return keras.Model(latent_inputs, decoder_outputs, name="decoder") + + +def get_vqvae(latent_dim=16, num_embeddings=64): + """ + Builds the complete VQ-VAE Model out of its component modules + + Parameters: + (optional) latent_dim (int): The number of latent dimensions the images are compressed down to (default=16) + (optional) num_embeddings (int): The number of codebook vectors in the embedding space (default=64) + + Returns: + vq_vae_model (Keras Model): A complete VQ-VAE model + """ + + print("Building model...") + + vq_layer = CustomVectorQuantizer(num_embeddings, latent_dim, name="vector_quantizer") + encoder = get_encoder(latent_dim) + decoder = get_decoder(latent_dim) + inputs = keras.Input(shape=(256, 256, 1)) + encoder_outputs = encoder(inputs) + quantized_latents = vq_layer(encoder_outputs) + reconstructions = decoder(quantized_latents) + return keras.Model(inputs, reconstructions, name="vq_vae") + + + + +# TODO: Document these classes and functions for pixelcnn + +# The first layer is the PixelCNN layer. This layer simply +# builds on the 2D convolutional layer, but includes masking. +class PixelConvLayer(keras.layers.Layer): + def __init__(self, mask_type, **kwargs): + super(PixelConvLayer, self).__init__() + self.mask_type = mask_type + self.conv = keras.layers.Conv2D(**kwargs) + + def build(self, input_shape): + # Build the conv2d layer to initialize kernel variables + self.conv.build(input_shape) + # Use the initialized kernel to create the mask + kernel_shape = self.conv.kernel.get_shape() + self.mask = np.zeros(shape=kernel_shape) + self.mask[: kernel_shape[0] // 2, ...] = 1.0 + self.mask[kernel_shape[0] // 2, : kernel_shape[1] // 2, ...] = 1.0 + if self.mask_type == "B": + self.mask[kernel_shape[0] // 2, kernel_shape[1] // 2, ...] = 1.0 + + def call(self, inputs): + self.conv.kernel.assign(self.conv.kernel * self.mask) + return self.conv(inputs) + + +# Next, we build our residual block layer. +# This is just a normal residual block, but based on the PixelConvLayer. 
+class ResidualBlock(keras.layers.Layer): + def __init__(self, filters, **kwargs): + super(ResidualBlock, self).__init__(**kwargs) + self.conv1 = keras.layers.Conv2D( + filters=filters, kernel_size=1, activation="relu" + ) + self.pixel_conv = PixelConvLayer( + mask_type="B", + filters=filters // 2, + kernel_size=3, + activation="relu", + padding="same", + ) + self.conv2 = keras.layers.Conv2D( + filters=filters, kernel_size=1, activation="relu" + ) + + def call(self, inputs): + x = self.conv1(inputs) + x = self.pixel_conv(x) + x = self.conv2(x) + return keras.layers.add([inputs, x]) + + +def get_pixel_cnn(vqvae_model, pixelcnn_input_shape, num_embeddings, num_residual_blocks, num_pixelcnn_layers): + print(f"Input shape of the PixelCNN: {pixelcnn_input_shape}") + + pixelcnn_inputs = keras.Input(shape=pixelcnn_input_shape, dtype=tf.int32) + ohe = tf.one_hot(pixelcnn_inputs, num_embeddings) + x = PixelConvLayer( + mask_type="A", filters=128, kernel_size=7, activation="relu", padding="same" + )(ohe) + + for _ in range(num_residual_blocks): + x = ResidualBlock(filters=128)(x) + + for _ in range(num_pixelcnn_layers): + x = PixelConvLayer( + mask_type="B", + filters=128, + kernel_size=1, + strides=1, + activation="relu", + padding="valid", + )(x) + + out = keras.layers.Conv2D( + filters=num_embeddings, kernel_size=1, strides=1, padding="valid" + )(x) + + pixel_cnn = keras.Model(pixelcnn_inputs, out, name="pixel_cnn") + pixel_cnn.summary() + + return pixel_cnn + + + +if __name__ == "__main__": + vqvae_model = get_vqvae() + vqvae_model.summary() + + # pixel_cnn_model = get_pixel_cnn() + # pixel_cnn_model.summary() \ No newline at end of file diff --git a/recognition/45316207_VQ-VAE/out/.DS_Store b/recognition/45316207_VQ-VAE/out/.DS_Store new file mode 100644 index 0000000000..5008ddfcf5 Binary files /dev/null and b/recognition/45316207_VQ-VAE/out/.DS_Store differ diff --git a/recognition/45316207_VQ-VAE/out/code_0000.png b/recognition/45316207_VQ-VAE/out/code_0000.png new file mode 100644 index 0000000000..1238a1ed46 Binary files /dev/null and b/recognition/45316207_VQ-VAE/out/code_0000.png differ diff --git a/recognition/45316207_VQ-VAE/out/code_0001.png b/recognition/45316207_VQ-VAE/out/code_0001.png new file mode 100644 index 0000000000..c3017c8292 Binary files /dev/null and b/recognition/45316207_VQ-VAE/out/code_0001.png differ diff --git a/recognition/45316207_VQ-VAE/out/code_0002.png b/recognition/45316207_VQ-VAE/out/code_0002.png new file mode 100644 index 0000000000..e13172820b Binary files /dev/null and b/recognition/45316207_VQ-VAE/out/code_0002.png differ diff --git a/recognition/45316207_VQ-VAE/out/code_0003.png b/recognition/45316207_VQ-VAE/out/code_0003.png new file mode 100644 index 0000000000..984abd7cf9 Binary files /dev/null and b/recognition/45316207_VQ-VAE/out/code_0003.png differ diff --git a/recognition/45316207_VQ-VAE/out/code_0004.png b/recognition/45316207_VQ-VAE/out/code_0004.png new file mode 100644 index 0000000000..e92614343e Binary files /dev/null and b/recognition/45316207_VQ-VAE/out/code_0004.png differ diff --git a/recognition/45316207_VQ-VAE/out/code_0005.png b/recognition/45316207_VQ-VAE/out/code_0005.png new file mode 100644 index 0000000000..5fdb842adc Binary files /dev/null and b/recognition/45316207_VQ-VAE/out/code_0005.png differ diff --git a/recognition/45316207_VQ-VAE/out/code_0006.png b/recognition/45316207_VQ-VAE/out/code_0006.png new file mode 100644 index 0000000000..c62ba5eadb Binary files /dev/null and b/recognition/45316207_VQ-VAE/out/code_0006.png 
differ diff --git a/recognition/45316207_VQ-VAE/out/code_0007.png b/recognition/45316207_VQ-VAE/out/code_0007.png new file mode 100644 index 0000000000..ed9692077f Binary files /dev/null and b/recognition/45316207_VQ-VAE/out/code_0007.png differ diff --git a/recognition/45316207_VQ-VAE/out/code_0008.png b/recognition/45316207_VQ-VAE/out/code_0008.png new file mode 100644 index 0000000000..5f97fca1fd Binary files /dev/null and b/recognition/45316207_VQ-VAE/out/code_0008.png differ diff --git a/recognition/45316207_VQ-VAE/out/code_0009.png b/recognition/45316207_VQ-VAE/out/code_0009.png new file mode 100644 index 0000000000..c55d2ff7dd Binary files /dev/null and b/recognition/45316207_VQ-VAE/out/code_0009.png differ diff --git a/recognition/45316207_VQ-VAE/out/image_at_epoch_0001.png b/recognition/45316207_VQ-VAE/out/image_at_epoch_0001.png new file mode 100644 index 0000000000..18a65e4d94 Binary files /dev/null and b/recognition/45316207_VQ-VAE/out/image_at_epoch_0001.png differ diff --git a/recognition/45316207_VQ-VAE/out/image_at_epoch_0002.png b/recognition/45316207_VQ-VAE/out/image_at_epoch_0002.png new file mode 100644 index 0000000000..277c839a42 Binary files /dev/null and b/recognition/45316207_VQ-VAE/out/image_at_epoch_0002.png differ diff --git a/recognition/45316207_VQ-VAE/out/image_at_epoch_0003.png b/recognition/45316207_VQ-VAE/out/image_at_epoch_0003.png new file mode 100644 index 0000000000..1a94a83d2f Binary files /dev/null and b/recognition/45316207_VQ-VAE/out/image_at_epoch_0003.png differ diff --git a/recognition/45316207_VQ-VAE/out/image_at_epoch_0004.png b/recognition/45316207_VQ-VAE/out/image_at_epoch_0004.png new file mode 100644 index 0000000000..4128f28035 Binary files /dev/null and b/recognition/45316207_VQ-VAE/out/image_at_epoch_0004.png differ diff --git a/recognition/45316207_VQ-VAE/out/image_at_epoch_0005.png b/recognition/45316207_VQ-VAE/out/image_at_epoch_0005.png new file mode 100644 index 0000000000..a00628ead9 Binary files /dev/null and b/recognition/45316207_VQ-VAE/out/image_at_epoch_0005.png differ diff --git a/recognition/45316207_VQ-VAE/out/image_at_epoch_0006.png b/recognition/45316207_VQ-VAE/out/image_at_epoch_0006.png new file mode 100644 index 0000000000..fba25cd213 Binary files /dev/null and b/recognition/45316207_VQ-VAE/out/image_at_epoch_0006.png differ diff --git a/recognition/45316207_VQ-VAE/out/image_at_epoch_0007.png b/recognition/45316207_VQ-VAE/out/image_at_epoch_0007.png new file mode 100644 index 0000000000..3facf1722a Binary files /dev/null and b/recognition/45316207_VQ-VAE/out/image_at_epoch_0007.png differ diff --git a/recognition/45316207_VQ-VAE/out/image_at_epoch_0008.png b/recognition/45316207_VQ-VAE/out/image_at_epoch_0008.png new file mode 100644 index 0000000000..c222a61dee Binary files /dev/null and b/recognition/45316207_VQ-VAE/out/image_at_epoch_0008.png differ diff --git a/recognition/45316207_VQ-VAE/out/image_at_epoch_0009.png b/recognition/45316207_VQ-VAE/out/image_at_epoch_0009.png new file mode 100644 index 0000000000..2d42dbf4b8 Binary files /dev/null and b/recognition/45316207_VQ-VAE/out/image_at_epoch_0009.png differ diff --git a/recognition/45316207_VQ-VAE/out/image_at_epoch_0010.png b/recognition/45316207_VQ-VAE/out/image_at_epoch_0010.png new file mode 100644 index 0000000000..c6971bba31 Binary files /dev/null and b/recognition/45316207_VQ-VAE/out/image_at_epoch_0010.png differ diff --git a/recognition/45316207_VQ-VAE/out/image_at_epoch_0011.png b/recognition/45316207_VQ-VAE/out/image_at_epoch_0011.png new file mode 
100644 index 0000000000..98d1583ac6 Binary files /dev/null and b/recognition/45316207_VQ-VAE/out/image_at_epoch_0011.png differ diff --git a/recognition/45316207_VQ-VAE/out/image_at_epoch_0012.png b/recognition/45316207_VQ-VAE/out/image_at_epoch_0012.png new file mode 100644 index 0000000000..4ad31b3c37 Binary files /dev/null and b/recognition/45316207_VQ-VAE/out/image_at_epoch_0012.png differ diff --git a/recognition/45316207_VQ-VAE/out/image_at_epoch_0013.png b/recognition/45316207_VQ-VAE/out/image_at_epoch_0013.png new file mode 100644 index 0000000000..4457310b1d Binary files /dev/null and b/recognition/45316207_VQ-VAE/out/image_at_epoch_0013.png differ diff --git a/recognition/45316207_VQ-VAE/out/image_at_epoch_0014.png b/recognition/45316207_VQ-VAE/out/image_at_epoch_0014.png new file mode 100644 index 0000000000..b478107a18 Binary files /dev/null and b/recognition/45316207_VQ-VAE/out/image_at_epoch_0014.png differ diff --git a/recognition/45316207_VQ-VAE/out/image_at_epoch_0015.png b/recognition/45316207_VQ-VAE/out/image_at_epoch_0015.png new file mode 100644 index 0000000000..4a89f89529 Binary files /dev/null and b/recognition/45316207_VQ-VAE/out/image_at_epoch_0015.png differ diff --git a/recognition/45316207_VQ-VAE/out/image_at_epoch_0016.png b/recognition/45316207_VQ-VAE/out/image_at_epoch_0016.png new file mode 100644 index 0000000000..d7eac5a056 Binary files /dev/null and b/recognition/45316207_VQ-VAE/out/image_at_epoch_0016.png differ diff --git a/recognition/45316207_VQ-VAE/out/image_at_epoch_0017.png b/recognition/45316207_VQ-VAE/out/image_at_epoch_0017.png new file mode 100644 index 0000000000..46b056b7fa Binary files /dev/null and b/recognition/45316207_VQ-VAE/out/image_at_epoch_0017.png differ diff --git a/recognition/45316207_VQ-VAE/out/image_at_epoch_0018.png b/recognition/45316207_VQ-VAE/out/image_at_epoch_0018.png new file mode 100644 index 0000000000..b7d76907b7 Binary files /dev/null and b/recognition/45316207_VQ-VAE/out/image_at_epoch_0018.png differ diff --git a/recognition/45316207_VQ-VAE/out/image_at_epoch_0019.png b/recognition/45316207_VQ-VAE/out/image_at_epoch_0019.png new file mode 100644 index 0000000000..272e61b809 Binary files /dev/null and b/recognition/45316207_VQ-VAE/out/image_at_epoch_0019.png differ diff --git a/recognition/45316207_VQ-VAE/out/image_at_epoch_0020.png b/recognition/45316207_VQ-VAE/out/image_at_epoch_0020.png new file mode 100644 index 0000000000..2ac2cb3ace Binary files /dev/null and b/recognition/45316207_VQ-VAE/out/image_at_epoch_0020.png differ diff --git a/recognition/45316207_VQ-VAE/out/novel_generation_0000.png b/recognition/45316207_VQ-VAE/out/novel_generation_0000.png new file mode 100644 index 0000000000..a5140f7ff9 Binary files /dev/null and b/recognition/45316207_VQ-VAE/out/novel_generation_0000.png differ diff --git a/recognition/45316207_VQ-VAE/out/novel_generation_0001.png b/recognition/45316207_VQ-VAE/out/novel_generation_0001.png new file mode 100644 index 0000000000..a5140f7ff9 Binary files /dev/null and b/recognition/45316207_VQ-VAE/out/novel_generation_0001.png differ diff --git a/recognition/45316207_VQ-VAE/out/novel_generation_0002.png b/recognition/45316207_VQ-VAE/out/novel_generation_0002.png new file mode 100644 index 0000000000..a5140f7ff9 Binary files /dev/null and b/recognition/45316207_VQ-VAE/out/novel_generation_0002.png differ diff --git a/recognition/45316207_VQ-VAE/out/novel_generation_0003.png b/recognition/45316207_VQ-VAE/out/novel_generation_0003.png new file mode 100644 index 0000000000..a5140f7ff9 
Binary files /dev/null and b/recognition/45316207_VQ-VAE/out/novel_generation_0003.png differ diff --git a/recognition/45316207_VQ-VAE/out/novel_generation_0004.png b/recognition/45316207_VQ-VAE/out/novel_generation_0004.png new file mode 100644 index 0000000000..a5140f7ff9 Binary files /dev/null and b/recognition/45316207_VQ-VAE/out/novel_generation_0004.png differ diff --git a/recognition/45316207_VQ-VAE/out/novel_generation_0005.png b/recognition/45316207_VQ-VAE/out/novel_generation_0005.png new file mode 100644 index 0000000000..a5140f7ff9 Binary files /dev/null and b/recognition/45316207_VQ-VAE/out/novel_generation_0005.png differ diff --git a/recognition/45316207_VQ-VAE/out/novel_generation_0006.png b/recognition/45316207_VQ-VAE/out/novel_generation_0006.png new file mode 100644 index 0000000000..a5140f7ff9 Binary files /dev/null and b/recognition/45316207_VQ-VAE/out/novel_generation_0006.png differ diff --git a/recognition/45316207_VQ-VAE/out/novel_generation_0007.png b/recognition/45316207_VQ-VAE/out/novel_generation_0007.png new file mode 100644 index 0000000000..a5140f7ff9 Binary files /dev/null and b/recognition/45316207_VQ-VAE/out/novel_generation_0007.png differ diff --git a/recognition/45316207_VQ-VAE/out/novel_generation_0008.png b/recognition/45316207_VQ-VAE/out/novel_generation_0008.png new file mode 100644 index 0000000000..a5140f7ff9 Binary files /dev/null and b/recognition/45316207_VQ-VAE/out/novel_generation_0008.png differ diff --git a/recognition/45316207_VQ-VAE/out/novel_generation_0009.png b/recognition/45316207_VQ-VAE/out/novel_generation_0009.png new file mode 100644 index 0000000000..a5140f7ff9 Binary files /dev/null and b/recognition/45316207_VQ-VAE/out/novel_generation_0009.png differ diff --git a/recognition/45316207_VQ-VAE/out/original_vs_reconstructed_0000.png b/recognition/45316207_VQ-VAE/out/original_vs_reconstructed_0000.png new file mode 100644 index 0000000000..563f37f545 Binary files /dev/null and b/recognition/45316207_VQ-VAE/out/original_vs_reconstructed_0000.png differ diff --git a/recognition/45316207_VQ-VAE/out/original_vs_reconstructed_0001.png b/recognition/45316207_VQ-VAE/out/original_vs_reconstructed_0001.png new file mode 100644 index 0000000000..563f37f545 Binary files /dev/null and b/recognition/45316207_VQ-VAE/out/original_vs_reconstructed_0001.png differ diff --git a/recognition/45316207_VQ-VAE/out/original_vs_reconstructed_0002.png b/recognition/45316207_VQ-VAE/out/original_vs_reconstructed_0002.png new file mode 100644 index 0000000000..bc30d319cf Binary files /dev/null and b/recognition/45316207_VQ-VAE/out/original_vs_reconstructed_0002.png differ diff --git a/recognition/45316207_VQ-VAE/out/original_vs_reconstructed_0003.png b/recognition/45316207_VQ-VAE/out/original_vs_reconstructed_0003.png new file mode 100644 index 0000000000..2b9e7e2ce1 Binary files /dev/null and b/recognition/45316207_VQ-VAE/out/original_vs_reconstructed_0003.png differ diff --git a/recognition/45316207_VQ-VAE/out/original_vs_reconstructed_0004.png b/recognition/45316207_VQ-VAE/out/original_vs_reconstructed_0004.png new file mode 100644 index 0000000000..b38e4908d6 Binary files /dev/null and b/recognition/45316207_VQ-VAE/out/original_vs_reconstructed_0004.png differ diff --git a/recognition/45316207_VQ-VAE/out/original_vs_reconstructed_0005.png b/recognition/45316207_VQ-VAE/out/original_vs_reconstructed_0005.png new file mode 100644 index 0000000000..53258f482b Binary files /dev/null and b/recognition/45316207_VQ-VAE/out/original_vs_reconstructed_0005.png 
differ diff --git a/recognition/45316207_VQ-VAE/out/original_vs_reconstructed_0006.png b/recognition/45316207_VQ-VAE/out/original_vs_reconstructed_0006.png new file mode 100644 index 0000000000..d21e2b5c08 Binary files /dev/null and b/recognition/45316207_VQ-VAE/out/original_vs_reconstructed_0006.png differ diff --git a/recognition/45316207_VQ-VAE/out/original_vs_reconstructed_0007.png b/recognition/45316207_VQ-VAE/out/original_vs_reconstructed_0007.png new file mode 100644 index 0000000000..2720c60359 Binary files /dev/null and b/recognition/45316207_VQ-VAE/out/original_vs_reconstructed_0007.png differ diff --git a/recognition/45316207_VQ-VAE/out/original_vs_reconstructed_0008.png b/recognition/45316207_VQ-VAE/out/original_vs_reconstructed_0008.png new file mode 100644 index 0000000000..1693d42b9d Binary files /dev/null and b/recognition/45316207_VQ-VAE/out/original_vs_reconstructed_0008.png differ diff --git a/recognition/45316207_VQ-VAE/out/original_vs_reconstructed_0009.png b/recognition/45316207_VQ-VAE/out/original_vs_reconstructed_0009.png new file mode 100644 index 0000000000..08775a0831 Binary files /dev/null and b/recognition/45316207_VQ-VAE/out/original_vs_reconstructed_0009.png differ diff --git a/recognition/45316207_VQ-VAE/out/training_logloss_curves.png b/recognition/45316207_VQ-VAE/out/training_logloss_curves.png new file mode 100644 index 0000000000..1e4d85ff74 Binary files /dev/null and b/recognition/45316207_VQ-VAE/out/training_logloss_curves.png differ diff --git a/recognition/45316207_VQ-VAE/out/training_loss_curves.png b/recognition/45316207_VQ-VAE/out/training_loss_curves.png new file mode 100644 index 0000000000..c293ee2b1d Binary files /dev/null and b/recognition/45316207_VQ-VAE/out/training_loss_curves.png differ diff --git a/recognition/45316207_VQ-VAE/out/training_ssim_curve.png b/recognition/45316207_VQ-VAE/out/training_ssim_curve.png new file mode 100644 index 0000000000..9f118d2660 Binary files /dev/null and b/recognition/45316207_VQ-VAE/out/training_ssim_curve.png differ diff --git a/recognition/45316207_VQ-VAE/out/vqvae_training_progression.gif b/recognition/45316207_VQ-VAE/out/vqvae_training_progression.gif new file mode 100644 index 0000000000..2127bf102a Binary files /dev/null and b/recognition/45316207_VQ-VAE/out/vqvae_training_progression.gif differ diff --git a/recognition/45316207_VQ-VAE/predict.py b/recognition/45316207_VQ-VAE/predict.py new file mode 100644 index 0000000000..a28ee4c03f --- /dev/null +++ b/recognition/45316207_VQ-VAE/predict.py @@ -0,0 +1,55 @@ +""" +predict.py + +Alex Nicholson (45316207) +11/10/2022 + +Shows example usage of the trained model with visualisations of it's output results + +""" + + +import dataset +import utils +import modules +from tensorflow import keras + + +if __name__ == "__main__": + # ---------------------------------------------------------------------------- # + # LOAD DATA # + # ---------------------------------------------------------------------------- # + # Import data loader from dataset.py + (train_data, validate_data, test_data, data_variance) = dataset.load_dataset(max_images=None, verbose=True) + + + # ---------------------------------------------------------------------------- # + # IMPORT TRAINED VQVAE MODEL # + # ---------------------------------------------------------------------------- # + # Import trained and saved vqvae model from file + trained_vqvae_model = keras.models.load_model("./vqvae_saved_model", custom_objects={'VectorQuantizer': modules.CustomVectorQuantizer}) + + + # 
---------------------------------------------------------------------------- #
+    # IMPORT TRAINED PIXELCNN MODEL                                               #
+    # ---------------------------------------------------------------------------- #
+    # Import trained and saved pixelcnn model from file
+    trained_pixelcnn_model = keras.models.load_model("./pixelcnn_saved_model")
+
+
+    # ---------------------------------------------------------------------------- #
+    # FINAL RESULTS                                                                #
+    # ---------------------------------------------------------------------------- #
+
+
+    examples_to_show = 10
+
+    # Visualise the final results and calculate the structural similarity index (SSIM)
+    utils.show_reconstruction_examples(trained_vqvae_model, test_data, examples_to_show)
+
+    # Visualise the discrete codes
+    utils.visualise_codes(trained_vqvae_model, test_data, examples_to_show)
+
+    # Visualise novel generations from codes
+    num_embeddings = 128
+    utils.visualise_codebook_sampling(trained_vqvae_model, trained_pixelcnn_model, train_data, num_embeddings, examples_to_show)
\ No newline at end of file
diff --git a/recognition/45316207_VQ-VAE/train.py b/recognition/45316207_VQ-VAE/train.py
new file mode 100644
index 0000000000..c5df446beb
--- /dev/null
+++ b/recognition/45316207_VQ-VAE/train.py
@@ -0,0 +1,345 @@
+"""
+train.py
+
+Alex Nicholson (45316207)
+11/10/2022
+
+Contains the source code for training, validating, testing and saving the VQ-VAE and PixelCNN models. The models are imported from “modules.py” and the data loader is imported from “dataset.py”. Losses and metrics are plotted throughout training.
+
+"""
+
+
+import dataset
+import modules
+import utils
+from tensorflow import keras
+import tensorflow as tf
+import numpy as np
+import matplotlib.pyplot as plt
+import glob
+import imageio.v2 as imageio
+import os
+
+
+class VQVAETrainer(keras.models.Model):
+    """
+    A Custom Training Loop for the VQ-VAE Model
+
+    Attributes:
+        train_variance (float): The variance of the (0-1 normalised) training data, used to scale the reconstruction loss
+        latent_dim (int): The number of latent dimensions the images are compressed down to (default=32)
+        num_embeddings (int): The number of codebook vectors in the embedding space (default=128)
+        vqvae (Keras Model): The custom VQ-VAE model
+        total_loss_tracker (Keras Metric): A tracker for the total loss of the model during training
+        reconstruction_loss_tracker (Keras Metric): A tracker for the reconstruction loss of the model during training
+        vqvae_loss_tracker (Keras Metric): A tracker for the VQ loss of the model during training
+
+    Methods:
+        metrics(): Returns a list of metrics for the total_loss, reconstruction_loss, and vqvae_loss of the model
+        train_step(x): Runs a single training step on the batch x and returns the updated loss metrics
+    """
+
+    def __init__(self, train_variance, latent_dim=32, num_embeddings=128, **kwargs):
+        super(VQVAETrainer, self).__init__(**kwargs)
+        self.train_variance = train_variance
+        self.latent_dim = latent_dim
+        self.num_embeddings = num_embeddings
+
+        self.vqvae = modules.get_vqvae(self.latent_dim, self.num_embeddings)
+
+        self.total_loss_tracker = keras.metrics.Mean(name="total_loss")
+        self.reconstruction_loss_tracker = keras.metrics.Mean(
+            name="reconstruction_loss"
+        )
+        self.vqvae_loss_tracker = keras.metrics.Mean(name="vqvae_loss")
+        self.ssim_history = []
+
+    @property
+    def metrics(self):
+        """
+        Gets a list of metrics for the current total_loss, reconstruction_loss, and vqvae_loss of the model
+
+        Returns:
+            A list of metrics for total_loss, reconstruction_loss, and vqvae_loss
+        """
+
+        return [
+            self.total_loss_tracker,
+            self.reconstruction_loss_tracker,
+            self.vqvae_loss_tracker,
+        ]
+
+    def train_step(self, x):
+        """
+        Runs a single training step (forward pass, loss calculation and backpropagation) on one batch of training images
+
+        Parameters:
+            x (Tensorflow Tensor): A batch of training images of shape (batch_size, 256, 256, 1)
+
+        Returns:
+            A dictionary of the model's training metrics with keys: "loss", "reconstruction_loss", and "vqvae_loss"
+        """
+
+        with tf.GradientTape() as tape:
+            # Outputs from the VQ-VAE.
+            reconstructions = self.vqvae(x)
+
+            # Calculate the losses.
+            reconstruction_loss = (
+                tf.reduce_mean((x - reconstructions) ** 2) / self.train_variance
+            )
+            total_loss = reconstruction_loss + sum(self.vqvae.losses)
+
+        # Backpropagation.
+        grads = tape.gradient(total_loss, self.vqvae.trainable_variables)
+        self.optimizer.apply_gradients(zip(grads, self.vqvae.trainable_variables))
+
+        # Loss tracking.
+        self.total_loss_tracker.update_state(total_loss)
+        self.reconstruction_loss_tracker.update_state(reconstruction_loss)
+        self.vqvae_loss_tracker.update_state(sum(self.vqvae.losses))
+
+        # Log results.
+ return { + "loss": self.total_loss_tracker.result(), + "reconstruction_loss": self.reconstruction_loss_tracker.result(), + "vqvae_loss": self.vqvae_loss_tracker.result(), + } + + +class ProgressImagesCallback(keras.callbacks.Callback): + """ + A custom callback for saving training progeress images + """ + + def __init__(self, train_data, validate_data): + self.train_data = train_data + self.validate_data = validate_data + + def save_progress_image(self, epoch): + """ + Saves progress images as we go throughout training + + Parameters: + epoch (int): The current training epoch + """ + + num_examples_to_generate = 16 + idx = np.random.choice(len(self.train_data), num_examples_to_generate) + test_images = self.train_data[idx] + reconstructions_test = self.model.vqvae.predict(test_images) + + fig = plt.figure(figsize=(16, 16)) + for i in range(reconstructions_test.shape[0]): + plt.subplot(4, 4, i + 1) + plt.imshow(reconstructions_test[i, :, :, 0], cmap='gray') + plt.axis('off') + + plt.savefig('out/image_at_epoch_{:04d}.png'.format(epoch+1)) + plt.close() + + def create_gif(self): + """ + Show an animated gif of the progress throughout training + """ + + anim_file = 'out/vqvae_training_progression.gif' + + with imageio.get_writer(anim_file, mode='I') as writer: + filenames = glob.glob('out/image*.png') + filenames = sorted(filenames) + for filename in filenames: + image = imageio.imread(filename) + writer.append_data(image) + image = imageio.imread(filename) + writer.append_data(image) + + def on_epoch_end(self, epoch, logs=None): + self.save_progress_image(epoch) + + similarity = utils.get_model_ssim(self.model.vqvae, self.validate_data) + self.model.ssim_history.append(similarity) + print(f"ssim: {similarity}") + + def on_train_end(self, logs=None): + self.create_gif() + + + +def train_vqvae(): + # ---------------------------------------------------------------------------- # + # HYPERPARAMETERS # + # ---------------------------------------------------------------------------- # + NUM_TRAINING_EXAMPLES = None + + TRAINING_EPOCHS = 20 + BATCH_SIZE = 128 + + NUM_LATENT_DIMS = 16 + NUM_EMBEDDINGS = 128 + + EXAMPLES_TO_SHOW = 10 + + + # ---------------------------------------------------------------------------- # + # LOAD DATA # + # ---------------------------------------------------------------------------- # + # Import data loader from dataset.py + (train_data, validate_data, test_data, data_variance) = dataset.load_dataset(max_images=NUM_TRAINING_EXAMPLES, verbose=True) + + + # ---------------------------------------------------------------------------- # + # BUILD MODEL # + # ---------------------------------------------------------------------------- # + # Create the model (wrapped in the training class to handle performance metrics logging) + vqvae_trainer = VQVAETrainer(data_variance, latent_dim=NUM_LATENT_DIMS, num_embeddings=NUM_EMBEDDINGS) + vqvae_trainer.compile(optimizer=keras.optimizers.Adam()) + + vqvae_trainer.vqvae.summary() + + # ---------------------------------------------------------------------------- # + # RUN TRAINING # + # ---------------------------------------------------------------------------- # + print("Training model...") + # Run training, plotting losses and metrics throughout + history = vqvae_trainer.fit(train_data, epochs=TRAINING_EPOCHS, batch_size=BATCH_SIZE, callbacks=[ProgressImagesCallback(train_data, validate_data)]) + + + # ---------------------------------------------------------------------------- # + # SAVE THE MODEL # + # 
---------------------------------------------------------------------------- # + # Get the trained model + trained_vqvae_model = vqvae_trainer.vqvae + + # Save the model to file as a tensorflow SavedModel + trained_vqvae_model.save("./vqvae_saved_model") + + + # ---------------------------------------------------------------------------- # + # FINAL RESULTS # + # ---------------------------------------------------------------------------- # + # Visualise the model training curves + utils.plot_training_metrics(history) + utils.plot_ssim_history(vqvae_trainer.ssim_history) + + # Visualise output generations from the finished model + # utils.show_reconstruction_examples(trained_vqvae_model, validate_data, EXAMPLES_TO_SHOW) + + + + +def train_pixelcnn(): + # ---------------------------------------------------------------------------- # + # HYPERPARAMETERS # + # ---------------------------------------------------------------------------- # + # EXAMPLES_TO_SHOW = 10 + + NUM_EMBEDDINGS = 128 + NUM_RESIDUAL_BLOCKS = 2 + NUM_PIXELCNN_LAYERS = 2 + + BATCH_SIZE = 128 + NUM_EPOCHS = 60 + VALIDATION_SPLIT = 0.1 + + reuse_codebook_indices = True + continue_training = False + + # ---------------------------------------------------------------------------- # + # LOAD DATA # + # ---------------------------------------------------------------------------- # + # Import data loader from dataset.py + (train_data, validate_data, test_data, data_variance) = dataset.load_dataset(max_images=3000, verbose=True) + + + # ---------------------------------------------------------------------------- # + # IMPORT TRAINED VQVAE MODEL # + # ---------------------------------------------------------------------------- # + # Import trained and saved model from file + trained_vqvae_model = keras.models.load_model("./vqvae_saved_model") + + # ---------------------------------------------------------------------------- # + # GENERATE TRAINING DATA FOR PIXEL CNN # + # ---------------------------------------------------------------------------- # + print("Generating pixelcnn training data...") + # Generate the codebook indices. 
+    encoded_outputs = trained_vqvae_model.get_layer("encoder").predict(train_data)
+    flat_enc_outputs = encoded_outputs.reshape(-1, encoded_outputs.shape[-1])
+
+    if reuse_codebook_indices and os.path.exists('./codebook_indices.csv'):
+        print("Loading pre-computed codebook indices from file")
+        # Pull the codebook indices from file
+        codebook_indices = np.loadtxt('./codebook_indices.csv', delimiter=',')
+    else:
+        print("Calculating codebook indices")
+        # Calculate the codebook indices from scratch
+        codebook_indices = utils.get_code_indices_savedmodel(trained_vqvae_model.get_layer("vector_quantizer"), flat_enc_outputs)
+        np.savetxt('./codebook_indices.csv', codebook_indices, delimiter=',')
+
+    codebook_indices = codebook_indices.reshape(encoded_outputs.shape[:-1])
+    print(f"Shape of the training data for PixelCNN: {codebook_indices.shape}")
+
+
+    # ---------------------------------------------------------------------------- #
+    # BUILD MODEL                                                                  #
+    # ---------------------------------------------------------------------------- #
+    print("Building model...")
+
+    if continue_training:
+        # Continue training an already partly-trained model
+        pixel_cnn = keras.models.load_model("./pixelcnn_saved_model")
+    else:
+        # Start training from scratch (reuse the encoder outputs computed above for the input shape)
+        pixelcnn_input_shape = encoded_outputs.shape[1:-1]
+        pixel_cnn = modules.get_pixel_cnn(trained_vqvae_model, pixelcnn_input_shape, NUM_EMBEDDINGS, NUM_RESIDUAL_BLOCKS, NUM_PIXELCNN_LAYERS)
+
+
+    pixel_cnn.compile(
+        optimizer=keras.optimizers.Adam(3e-4),
+        loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
+        metrics=["accuracy"],
+    )
+
+    pixel_cnn.summary()
+
+    # ---------------------------------------------------------------------------- #
+    # RUN TRAINING                                                                 #
+    # ---------------------------------------------------------------------------- #
+    print("Training model...")
+    pixel_cnn.fit(
+        x=codebook_indices,
+        y=codebook_indices,
+        batch_size=BATCH_SIZE,
+        epochs=NUM_EPOCHS,
+        validation_split=VALIDATION_SPLIT,
+    )
+
+
+    # ---------------------------------------------------------------------------- #
+    # SAVE THE MODEL                                                               #
+    # ---------------------------------------------------------------------------- #
+    # Get the trained model
+    trained_pixelcnn_model = pixel_cnn
+
+    # Save the model to file as a tensorflow SavedModel
+    trained_pixelcnn_model.save("./pixelcnn_saved_model")
+
+    # ---------------------------------------------------------------------------- #
+    # FINAL RESULTS                                                                #
+    # ---------------------------------------------------------------------------- #
+    # Visualise the discrete codes
+    examples_to_show = 10
+    utils.visualise_codes(trained_vqvae_model, test_data, examples_to_show)
+
+    # Visualise novel generations from codes
+    num_embeddings = 128
+    utils.visualise_codebook_sampling(trained_vqvae_model, pixel_cnn, train_data, num_embeddings, examples_to_show)
+
+
+
+
+if __name__ == "__main__":
+    train_vqvae()
+    train_pixelcnn()
diff --git a/recognition/45316207_VQ-VAE/utils.py b/recognition/45316207_VQ-VAE/utils.py
new file mode 100644
index 0000000000..6a7c7cbadb
--- /dev/null
+++ b/recognition/45316207_VQ-VAE/utils.py
@@ -0,0 +1,291 @@
+"""
+utils.py
+
+Alex Nicholson (45316207)
+11/10/2022
+
+Contains extra utility functions to help with things like plotting visualisations and ssim calculation
+
+"""
+
+
+import matplotlib.pyplot as plt
+import numpy as np
+from skimage.metrics import structural_similarity as ssim
+import tensorflow as tf
+import tensorflow_probability as tfp +from tensorflow import keras + + +def show_reconstruction_examples(model, test_data, num_to_show): + """ + Shows a series of tiled plots with side-by-side examples of the original data and reconstructed data + + Parameters: + model (Keras Model): VQ VAE Model + test_data (ndarray): Test dataset of real brain MRI images + num_to_show (int): Number of reconstruction comparison examples to show + """ + + # Visualise output generations from the finished model + idx = np.random.choice(len(test_data), num_to_show) + + test_images = test_data[idx] + reconstructions_test = model.predict(test_images) + + for i in range(reconstructions_test.shape[0]): + original = test_images[i, :, :, 0] + reconstructed = reconstructions_test[i, :, :, 0] + + plt.figure() + + plt.subplot(1, 2, 1) + plt.imshow(original.squeeze() + 0.5, cmap='gray') + plt.title("Original") + plt.axis("off") + + plt.subplot(1, 2, 2) + plt.imshow(reconstructed.squeeze() + 0.5, cmap='gray') + plt.title("Reconstructed (ssim: {:.2f})".format(get_image_ssim(original, reconstructed))) + plt.axis("off") + + plt.savefig('out/original_vs_reconstructed_{:04d}.png'.format(i)) + plt.close() + +def l2_dist_scratch(x, y): + # Can do at least 2500 images + # Pretty sure works for 5000 and maybe 7000 + sx = np.sum(x**2, axis=1, keepdims=True) + sy = np.sum(y**2, axis=1, keepdims=True) + distances = np.sqrt(-2 * x.dot(y.T) + sx + sy.T) + return distances + + +def get_code_indices_savedmodel(vector_quantizer, flattened_inputs): + """ + Gets the indices of the codebook vectors??? + + Parameters: + (Tensorflow Tensor): purpose??? + + Returns: + encoding_indices (Tensorflow Tensor): purpose??? + """ + + # Calculate L2-normalized distance between the inputs and the codes. + distances = l2_dist_scratch(flattened_inputs, vector_quantizer.embeddings.numpy().T) + + # Derive the indices for minimum distances. 
+ encoding_indices = tf.argmin(distances, axis=1) + + return encoding_indices.numpy() + + + + + + +def plot_training_metrics(history): + """ + Shows a series of tiled plots with side-by-side examples of the original data and reconstructed data + + Parameters: + history (???): The training history (list of metrics over time) for the model + """ + num_epochs = len(history.history["loss"]) + + # Plot losses + plt.figure() + plt.plot(range(1, num_epochs+1), history.history["loss"], label='Total Loss', marker='o') + plt.plot(range(1, num_epochs+1), history.history["reconstruction_loss"], label='Reconstruction Loss', marker='o') + plt.plot(range(1, num_epochs+1), history.history["vqvae_loss"], label='VQ VAE Loss', marker='o') + plt.title('Training Losses', fontsize=14) + plt.xlabel('Training Epoch', fontsize=14) + plt.xticks(range(1, num_epochs+1)) + plt.ylabel('Loss', fontsize=14) + plt.legend() + plt.grid(True) + plt.savefig('out/training_loss_curves.png') + plt.close() + + # Plot log losses + plt.figure() + plt.plot(range(1, num_epochs+1), history.history["loss"], label='Log Total Loss', marker='o') + plt.plot(range(1, num_epochs+1), history.history["reconstruction_loss"], label='Log Reconstruction Loss', marker='o') + plt.plot(range(1, num_epochs+1), history.history["vqvae_loss"], label='Log VQ VAE Loss', marker='o') + plt.title('Training Log Losses', fontsize=14) + plt.xlabel('Training Epoch', fontsize=14) + plt.xticks(range(1, num_epochs+1)) + plt.ylabel('Log Loss', fontsize=14) + plt.yscale('log') + plt.legend() + plt.grid(True) + plt.savefig('out/training_logloss_curves.png') + plt.close() + +def plot_ssim_history(ssim_history): + """ + Shows a series of tiled plots with side-by-side examples of the original data and reconstructed data + + Parameters: + history (???): The training history (list of metrics over time) for the model + """ + num_epochs = len(ssim_history) + + # SSIM History + plt.figure() + plt.plot(range(1, num_epochs+1), ssim_history, label='Average Model SSIM', marker='o') + plt.title('Model SSIM Performance Over Time', fontsize=14) + plt.xlabel('Training Epoch', fontsize=14) + plt.xticks(range(1, num_epochs+1)) + plt.ylabel('Average Model SSIM', fontsize=14) + plt.legend() + plt.grid(True) + plt.savefig('out/training_ssim_curve.png') + plt.close() + + +def get_image_ssim(image1, image2): + """ + Gets the ssim between 2 images + + Parameters: + image1 (ndarray): An image + image2 (ndarray): A second image to compare with the first one + + Returns: + ssim (int): The structural similarity index between the two given images + """ + similarity = ssim(image1, image2, + data_range=image1.max() - image1.min()) + + return similarity + + +def get_model_ssim(model, test_data): + """ + Gets the average ssim of a model + + Parameters: + model (ndarray): The VQ VAE model + test_data (ndarray): Test dataset of real brain MRI images + + Returns: + ssim (int): The average structural similarity index achieved by the model + """ + + sample_size = 10 # The number of generations to average over + + similarity_scores = [] + + # Visualise output generations from the finished model + idx = np.random.choice(len(test_data), 10) + + test_images = test_data[idx] + reconstructions_test = model.predict(test_images) + + for i in range(reconstructions_test.shape[0]): + original = test_images[i, :, :, 0] + reconstructed = reconstructions_test[i, :, :, 0] + + similarity_scores.append(ssim(original, reconstructed, data_range=original.max() - original.min())) + + average_similarity = 
np.average(similarity_scores) + + return average_similarity + + + + + + + + + +# TODO: Document this function +def visualise_codebook_sampling(vqvae_model, pixelcnn_model, train_data, num_embeddings, examples_to_show): + # Create a mini sampler model. + inputs = tf.keras.layers.Input(shape=pixelcnn_model.input_shape[1:]) + outputs = pixelcnn_model(inputs, training=False) + categorical_layer = tfp.layers.DistributionLambda(tfp.distributions.Categorical) + outputs = categorical_layer(outputs) + sampler = keras.Model(inputs, outputs) + + + # Construct a prior to generate images + # Create an empty array of priors + priors = np.zeros(shape=(examples_to_show,) + (pixelcnn_model.input_shape)[1:]) + examples_to_show, rows, cols = priors.shape + + # Iterate over the priors because generation has to be done sequentially pixel by pixel + print(f"Generating priors... (this step may take a long time)") + for row in range(rows): # 64 + print(f"Row {row}/{rows}") + for col in range(cols): # 64 + # Feed the whole array and retrieving the pixel value probabilities for the next pixel + probs = sampler.predict(priors, verbose=0) + # Use the probabilities to pick pixel values and append the values to the priors + priors[:, row, col] = probs[:, row, col] + + print(f"Prior shape: {priors.shape}") + + # Now use the decoder to generate the images + # Perform an embedding lookup. + pretrained_embeddings = vqvae_model.get_layer("vector_quantizer").embeddings + priors_ohe = tf.one_hot(priors.astype("int32"), num_embeddings).numpy() + quantized = tf.matmul( + priors_ohe.astype("float32"), pretrained_embeddings, transpose_b=True + ) + encoder_output_shape = vqvae_model.get_layer("encoder").predict(train_data).shape[1:] + quantized = tf.reshape(quantized, (-1, *(encoder_output_shape))) + + # Generate novel images. 
+ decoder = vqvae_model.get_layer("decoder") + generated_samples = decoder.predict(quantized) + + for i in range(examples_to_show): + plt.figure() + plt.subplot(1, 2, 1) + plt.imshow(priors[i]) + plt.title("Code") + plt.axis("off") + + plt.subplot(1, 2, 2) + plt.imshow(generated_samples[i].squeeze() + 0.5) + plt.title("Generated Sample") + plt.axis("off") + plt.savefig('out/novel_generation_{:04d}.png'.format(i)) + plt.close + + +# TODO: Document this function +def visualise_codes(model, test_data, num_to_show): + print("#########################") + encoder = model.get_layer("encoder") + quantizer = model.get_layer("vector_quantizer") + print(quantizer) + print(type(quantizer)) + print("#########################") + + idx = np.random.choice(len(test_data), num_to_show) + test_images = test_data[idx] + + encoded_outputs = encoder.predict(test_images) + flat_enc_outputs = encoded_outputs.reshape(-1, encoded_outputs.shape[-1]) + codebook_indices = get_code_indices_savedmodel(quantizer, flat_enc_outputs) + codebook_indices = codebook_indices.reshape(encoded_outputs.shape[:-1]) + + for i in range(len(test_images)): + plt.figure() + plt.subplot(1, 2, 1) + plt.imshow(test_images[i].squeeze() + 0.5) + plt.title("Original") + plt.axis("off") + + plt.subplot(1, 2, 2) + plt.imshow(codebook_indices[i]) + plt.title("Code") + plt.axis("off") + plt.savefig('out/code_{:04d}.png'.format(i)) + plt.close \ No newline at end of file diff --git a/recognition/ISICs_UNet/README.md b/recognition/ISICs_UNet/README.md deleted file mode 100644 index 788ea17b79..0000000000 --- a/recognition/ISICs_UNet/README.md +++ /dev/null @@ -1,101 +0,0 @@ -# Segment the ISICs data set with the U-net - -## Project Overview -This project aim to solve the segmentation of skin lesian (ISIC2018 data set) using the U-net, with all labels having a minimum Dice similarity coefficient of 0.7 on the test set[Task 3]. - -## ISIC2018 -![ISIC example](imgs/example.jpg) - -Skin Lesion Analysis towards Melanoma Detection - -Task found in https://challenge2018.isic-archive.com/ - - -## U-net -![UNet](imgs/uent.png) - -U-net is one of the popular image segmentation architectures used mostly in biomedical purposes. The name UNet is because it’s architecture contains a compressive path and an expansive path which can be viewed as a U shape. This architecture is built in such a way that it could generate better results even for a less number of training data sets. - -## Data Set Structure - -data set folder need to be stored in same directory with structure same as below -```bash -ISIC2018 - |_ ISIC2018_Task1-2_Training_Input_x2 - |_ ISIC_0000000 - |_ ISIC_0000001 - |_ ... - |_ ISIC2018_Task1_Training_GroundTruth_x2 - |_ ISIC_0000000_segmentation - |_ ISIC_0000001_segmentation - |_ ... -``` - -## Dice Coefficient - -The Sørensen–Dice coefficient is a statistic used to gauge the similarity of two samples. - -Further information in https://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coefficient - -## Dependencies - -- python 3 -- tensorflow 2.1.0 -- pandas 1.1.4 -- numpy 1.19.2 -- matplotlib 3.3.2 -- scikit-learn 0.23.2 -- pillow 8.0.1 - - -## Usages - -- Run `train.py` for training the UNet on ISIC data. -- Run `evaluation.py` for evaluation and case present. - -## Advance - -- Modify `setting.py` for custom setting, such as different batch size. -- Modify `unet.py` for custom UNet, such as different kernel size. 
- -## Algorithm - -- data set: - - The data set we used is the training set of ISIC 2018 challenge data which has segmentation labels. - - Training: Validation: Test = 1660: 415: 519 = 0.64: 0.16 : 0.2 (Training: Test = 4: 1 and in Training, further split 4: 1 for Training: Validation) - - Training data augmentations: rescale, rotate, shift, zoom, grayscale -- model: - - Original UNet with padding which can keep the shape of input and output same. - - The first convolutional layers has 16 output channels. - - The activation function of all convolutional layers is ELU. - - Without batch normalization layers. - - The inputs is (384, 512, 1) - - The output is (384, 512, 1) after sigmoid activation. - - Optimizer: Adam, lr = 1e-4 - - Loss: dice coefficient loss - - Metrics: accuracy & dice coefficient - -## Results - -Evaluation dice coefficient is 0.805256724357605. - -plot of train/valid Dice coefficient: - -![img](imgs/train_and_valid_dice_coef.png) - -case present: - -![case](imgs/case%20present.png) - -## Reference -Manna, S. (2020). K-Fold Cross Validation for Deep Learning using Keras. [online] Medium. Available at: https://medium.com/the-owl/k-fold-cross-validation-in-keras-3ec4a3a00538 [Accessed 24 Nov. 2020]. - -zhixuhao (2020). zhixuhao/unet. [online] GitHub. Available at: https://github.com/zhixuhao/unet. - -GitHub. (n.d.). NifTK/NiftyNet. [online] Available at: https://github.com/NifTK/NiftyNet/blob/a383ba342e3e38a7ad7eed7538bfb34960f80c8d/niftynet/layer/loss_segmentation.py [Accessed 24 Nov. 2020]. - -Team, K. (n.d.). Keras documentation: Losses. [online] keras.io. Available at: https://keras.io/api/losses/#creating-custom-losses [Accessed 24 Nov. 2020]. - -262588213843476 (n.d.). unet.py. [online] Gist. Available at: https://gist.github.com/abhinavsagar/fe0c900133cafe93194c069fe655ef6e [Accessed 24 Nov. 2020]. - -Stack Overflow. (n.d.). python - Disable Tensorflow debugging information. [online] Available at: https://stackoverflow.com/questions/35911252/disable-tensorflow-debugging-information [Accessed 24 Nov. 2020]. diff --git a/recognition/ISICs_Unet/README.md b/recognition/ISICs_Unet/README.md deleted file mode 100644 index f2c009212e..0000000000 --- a/recognition/ISICs_Unet/README.md +++ /dev/null @@ -1,52 +0,0 @@ -# Segmenting ISICs with U-Net - -COMP3710 Report recognition problem 3 (Segmenting ISICs data set with U-Net) solved in TensorFlow - -Created by Christopher Bailey (45576430) - -## The problem and algorithm -The problem solved by this program is binary segmentation of the ISICs skin lesion data set. Segmentation is a way to label pixels in an image according to some grouping, in this case lesion or non-lesion. This translates images of skin to masks representing areas of concern for skin lesions. - -U-Net is a form of autoencoder where the downsampling path is expected to learn the features of the image and the upsampling path learns how to recreate the masks. Long skip connections between downpooling and upsampling layers are utilised to overcome the bottleneck in traditional autoencoders allowing feature representations to be recreated. - -## How it works -A four layer padded U-Net is used, preserving skin features and mask resolution. The implementation utilises Adam as the optimizer and implements Dice distance as the loss function as this appeared to give quicker convergence than other methods (eg. binary cross-entropy). - -The utilised metric is a Dice coefficient implementation. 
My initial implementation appeared faulty and was replaced with a 3rd party implementation which appears correct. 3 epochs was observed to be generally sufficient to observe Dice coefficients of 0.8+ on test datasets but occasional non-convergence was observed and could be curbed by increasing the number of epochs. Visualisation of predictions is also implemented and shows reasonable correspondence. Orange bandaids represent an interesting challenge for the implementation as presented. - -### Training, validation and testing split -Training, validation and testing uses a respective 60:20:20 split, a commonly assumed starting point suggested by course staff. U-Net in particular was developed to work "with very few training images" (Ronneberger et al, 2015) The input data for this problem consists of 2594 images and masks. This split appears to provide satisfactory results. - -## Using the model -### Dependencies required -* Python3 (tested with 3.8) -* TensorFlow 2.x (tested with 2.3) -* glob (used to load filenames) -* matplotlib (used for visualisations, tested with 3.3) - -### Parameter tuning -The model was developed on a GTX 1660 TI (6GB VRAM) and certain values (notably batch size and image resolution) were set lower than might otherwise be ideal on more capable hardware. This is commented in the relevant code. - -### Running the model -The model is executed via the main.py script. - -### Example output -Given a batch size of 1 and 3 epochs the following output was observed on a single run: -Era | Loss | Dice coefficient ---- | ---- | ---------------- -Epoch 1 | 0.7433 | 0.2567 -Epoch 2 | 0.3197 | 0.6803 -Epoch 3 | 0.2657 | 0.7343 -Testing | 0.1820 | 0.8180 - - -### Figure 1 - example visualisation plot -Skin images in left column, true mask middle, predicted mask right column -![Visualisation of predictions](visual.png) - -## References -Segments of code in this assignment were used from or based on the following sources: -1. COMP3710-demo-code.ipynb from Guest Lecture -1. https://www.tensorflow.org/tutorials/load_data/images -1. https://www.tensorflow.org/guide/gpu -1. Karan Jakhar (2019) https://medium.com/@karan_jakhar/100-days-of-code-day-7-84e4918cb72c diff --git a/recognition/XUE4645768/README.md b/recognition/XUE4645768/README.md deleted file mode 100644 index 36250adaa3..0000000000 --- a/recognition/XUE4645768/README.md +++ /dev/null @@ -1,59 +0,0 @@ -# Graph Convolutional Networks -*COMP3710 Report - -*Student Name: Xue Zhang - -*Student ID: 46457684 - -*TensorFlow implementation of Graph Convolutional Networks based on Facebook Large Page-Page Network dataset for semi-supervised multi-class node classification. - - -# Requirement - -*Python version 3.6 - -*Tensorflow 2.5 - -*Pytorch installation - -*Sklearn, pandas, numpy,scipy and matplotlib libraries - - - - - - -# Data -Facebook Large Page-Page Network -https://snap.stanford.edu/data/facebook-large-page-page-network.htm - -Processed dataset where the features are in the form of 128 dim vectors . - -Data Structure: - -Shape of Edge data: (342004, 2) - -Shape of Feature data: (22470, 128) - -Shape of Target data (22470,) - -Number of features of each node: 128 - -Categories of labels: {0, 1, 2, 3} - -Data split: -Training set : Validation set : Test set = 0.2 : 0.2 :0.6 - -# Running - -In the gcn.py:Data preprocessing, accuracy of model training &model test, TSNE embeddings plot with ground truth in colors. 
-A main function is included in the code - -python gcn.py - -Warning: Please pay attention to whether the data path is correct when you run the gcn.py. - - -```python - -``` diff --git a/recognition/XUE4645768/Readme.md b/recognition/XUE4645768/Readme.md deleted file mode 100644 index 94bc1848c0..0000000000 --- a/recognition/XUE4645768/Readme.md +++ /dev/null @@ -1,105 +0,0 @@ -# Graph Convolutional Networks -*COMP3710 Report - -*Student Name: Xue Zhang - -*Student ID: 46457684 - -*TensorFlow implementation of Graph Convolutional Networks based on Facebook Large Page-Page Network dataset for semi-supervised multi-class node classification. - - -# Requirement - -*Python version 3.6 - -*Tensorflow 2.5 - -*Pytorch installation - -*Sklearn, pandas, numpy,scipy and matplotlib libraries - - - - - - -# Data -Facebook Large Page-Page Network -https://snap.stanford.edu/data/facebook-large-page-page-network.htm - -Processed dataset where the features are in the form of 128 dim vectors . - -Data Structure: - -Shape of Edge data: (342004, 2) - -Shape of Feature data: (22470, 128) - -Shape of Target data (22470,) - -Number of features of each node: 128 - -Categories of labels: {0, 1, 2, 3} - -Data split: -Training set : Validation set : Test set = 0.2 : 0.2 :0.6 - -# Running - -In the gcn.py:Data preprocessing, accuracy of model training &model test, TSNE embeddings plot with ground truth in colors. -A main function is included in the code - -python gcn.py - -Warning: Please pay attention to whether the data path is correct when you run the gcn.py. - -# Training - -Learning rate= 0.01 -Weight dacay =0.005 - -For 200 epoches: -```Epoch 000: Loss 0.2894, TrainAcc 0.9126, ValAcc 0.8954 -Epoch 001: Loss 0.2880, TrainAcc 0.9126, ValAcc 0.895 -Epoch 002: Loss 0.2866, TrainAcc 0.9126, ValAcc 0.8961 -Epoch 003: Loss 0.2853, TrainAcc 0.9132, ValAcc 0.8961 -Epoch 004: Loss 0.2839, TrainAcc 0.9137, ValAcc 0.8961 -Epoch 005: Loss 0.2826, TrainAcc 0.9141, ValAcc 0.8963 -Epoch 006: Loss 0.2813, TrainAcc 0.9146, ValAcc 0.8956 -Epoch 007: Loss 0.2800, TrainAcc 0.9146, ValAcc 0.8956 -Epoch 008: Loss 0.2788, TrainAcc 0.9146, ValAcc 0.8959 -Epoch 009: Loss 0.2775, TrainAcc 0.9146, ValAcc 0.8970 -Epoch 010: Loss 0.2763, TrainAcc 0.915, ValAcc 0.8974 -Epoch 011: Loss 0.2751, TrainAcc 0.915, ValAcc 0.8972 -Epoch 012: Loss 0.2739, TrainAcc 0.915, ValAcc 0.8976 -Epoch 013: Loss 0.2727, TrainAcc 0.9157, ValAcc 0.8979 -Epoch 014: Loss 0.2716, TrainAcc 0.9157, ValAcc 0.8983 -Epoch 015: Loss 0.2704, TrainAcc 0.9161, ValAcc 0.8990 -Epoch 016: Loss 0.2693, TrainAcc 0.9168, ValAcc 0.8988 -Epoch 017: Loss 0.2682, TrainAcc 0.9181, ValAcc 0.8990 -Epoch 018: Loss 0.2671, TrainAcc 0.9179, ValAcc 0.8990 -Epoch 019: Loss 0.2660, TrainAcc 0.9179, ValAcc 0.8992 -Epoch 020: Loss 0.2650, TrainAcc 0.9188, ValAcc 0.8996 -...... -Epoch 190: Loss 0.1623, TrainAcc 0.9553, ValAcc 0.9134 -Epoch 191: Loss 0.1619, TrainAcc 0.9555, ValAcc 0.9134 -Epoch 192: Loss 0.1615, TrainAcc 0.9555, ValAcc 0.9132 -Epoch 193: Loss 0.1611, TrainAcc 0.9557, ValAcc 0.9130 -Epoch 194: Loss 0.1607, TrainAcc 0.9562, ValAcc 0.9130 -Epoch 195: Loss 0.1603, TrainAcc 0.9559, ValAcc 0.9130 -Epoch 196: Loss 0.1599, TrainAcc 0.9562, ValAcc 0.9126 -Epoch 197: Loss 0.1595, TrainAcc 0.9562, ValAcc 0.9123 -Epoch 198: Loss 0.1591, TrainAcc 0.9562, ValAcc 0.9123 -Epoch 199: Loss 0.1587, TrainAcc 0.9562, ValAcc 0.9123``` - -For test accuracy:around 0.9 - -# TSNE -For the test:iteration=500, with lower dimension to 2 - - - - -```python - -```