
Experimentation


Introduction

This experimentation section describes the different tests we performed. The training parameters of our best model can be found in the Training and Validation sections of this report.


Image processing: To improve our results, we tried different combinations of model architecture and input image processing. The first type of image processing we used was the grayscale mel-spectrogram. Since these images appeared rather noisy, we tried a second type of processing that consisted of binarizing the images into black and white. To do this, we used the OpenCV library and applied a mask to each image: pixels with an intensity above a certain threshold became white, and pixels below it became black. However, it was difficult to identify the optimal threshold for retaining the maximum amount of information, since the difference in intensity between the noise and the signals of interest was not always very pronounced. Moreover, the results obtained on the black-and-white images did not improve on those of the original images in any of the cases we tested, so we performed most of our experiments on the original images.
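A minimal sketch of this binarization step with OpenCV; the file name and the threshold value (128) are assumptions for illustration, since in practice the threshold had to be tuned by hand.

```python
import cv2

# Load the grayscale mel-spectrogram (hypothetical file name).
spectrogram = cv2.imread("mel_spectrogram.png", cv2.IMREAD_GRAYSCALE)

# Pixels above the threshold become white (255), the rest black (0).
_, binary = cv2.threshold(spectrogram, 128, 255, cv2.THRESH_BINARY)

cv2.imwrite("mel_spectrogram_bw.png", binary)
```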


Architecture: The first model we tested was mainly intended to confirm that our training data could be learned. We used a ResNet 18 without pre-training, trained for 10 epochs with a batch size of 128 and a learning rate of 0.1. We used the cross-entropy loss function, since this is a classification task, together with the stochastic gradient descent (SGD) optimizer. This first test confirmed that our network could learn. However, 10 epochs were too few, since metrics such as the validation loss and validation accuracy had not yet reached a plateau. We therefore increased the number of epochs to 30 and the batch size to 256 in order to reduce the training time. Moreover, since we have a very large number of classes and a class imbalance is present, a larger batch size gives a more representative sample each time the network weights are updated. The test accuracy obtained with this first model is 48.06%.
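A minimal sketch of this first configuration in PyTorch, assuming a torchvision ResNet-18; `num_classes` is a placeholder for the number of classes in our dataset, not a value from the report.

```python
import torch
import torch.nn as nn
from torchvision import models

num_classes = 100  # placeholder: set to the actual number of classes

# ResNet-18 without pre-trained weights, with the head adapted to our classes.
model = models.resnet18(weights=None)
model.fc = nn.Linear(model.fc.in_features, num_classes)

criterion = nn.CrossEntropyLoss()                         # classification loss
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)   # first test: lr = 0.1
```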


Figure 1: In A, the sound signal transformed into a grayscale mel-spectrogram; in B, the grayscale image converted to black and white.


We then tested, still with ResNet 18, the use of pre-training. We performed a first test freezing the first two layers and a second freezing only the first. The test results were significantly lower than those of the previous model, with test accuracies of 18.97% and 34.58% respectively. We believe this drop in performance is due to the fact that the ResNet was pre-trained on the ImageNet database, whose images are very different from ours. The features learned on ImageNet by the first layers of the network do not seem appropriate for the type of image we are using. We therefore limited our subsequent training runs to networks without pre-trained weights.
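A sketch of the pre-trained variant, assuming the torchvision ImageNet weights; which blocks correspond to the "first two layers" is an assumption here, interpreted as the stem plus the first residual stage.

```python
from torchvision import models

# ResNet-18 initialized with ImageNet weights.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Freeze the stem (conv1 + bn1) and the first residual stage;
# add model.layer2 to freeze a second stage.
for module in (model.conv1, model.bn1, model.layer1):
    for param in module.parameters():
        param.requires_grad = False
```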


We then tested the ResNet 34 architecture, decreasing the learning rate to 0.01 and using the Adam optimizer. This training gave us our best test result so far, with an accuracy of 49.33%. However, we noticed that the validation loss had started to increase over the last 15 epochs, which suggests that the network was starting to overfit the training data.


Training

To address the overfitting problem we used three different techniques; their combination gave us our best test results. We first tested our model with an adaptive learning rate, using the PyTorch ReduceLROnPlateau scheduler. This function takes a patience argument that indicates after how many epochs without improvement the learning rate should be decreased once the validation loss reaches a plateau. We also implemented early stopping. This technique also depends on a patience argument, which indicates after how many epochs without improvement training should stop when the validation loss no longer decreases. The network weights with the smallest validation loss seen during training are saved and replace the weights of the current network at the end of training. The last test we performed consisted of adding dropout to some layers, again with the aim of reducing overfitting.
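A minimal sketch of how these three techniques could fit together in PyTorch, reusing the `model` and `num_classes` from the earlier sketches; the patience values, dropout probability, and the helpers `train_one_epoch` and `evaluate` are assumptions for illustration, not our exact code.

```python
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import ReduceLROnPlateau

# Dropout added to the classification head (the probability is an assumption).
model.fc = nn.Sequential(nn.Dropout(p=0.5),
                         nn.Linear(model.fc.in_features, num_classes))

optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
# Lower the learning rate when the validation loss stops improving.
scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=0.1, patience=3)

best_loss, best_state, epochs_without_improvement = float("inf"), None, 0
early_stop_patience = 5

for epoch in range(30):
    train_one_epoch(model, optimizer)   # hypothetical training helper
    val_loss = evaluate(model)          # hypothetical validation helper
    scheduler.step(val_loss)

    if val_loss < best_loss:
        best_loss = val_loss
        # Keep a copy of the best weights seen so far.
        best_state = {k: v.clone() for k, v in model.state_dict().items()}
        epochs_without_improvement = 0
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= early_stop_patience:
            break  # early stopping

# Restore the weights of the best epoch before testing.
model.load_state_dict(best_state)
```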



Validation

The best-performing model is ResNet 34 with a batch size of 256 and an initial learning rate of 0.001. The optimizer used is Adam. The adaptive learning rate, early stopping and dropout described above were all used on this network. Training stopped after only 13 epochs. The test accuracy obtained with this network is 52.48%.
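For reference, a sketch of how the reported test accuracy could be computed, assuming a `test_loader` DataLoader over the held-out test set (a name introduced here for illustration).

```python
import torch

model.eval()
correct = total = 0
with torch.no_grad():
    for images, labels in test_loader:
        predictions = model(images).argmax(dim=1)
        correct += (predictions == labels).sum().item()
        total += labels.size(0)

print(f"Test accuracy: {100.0 * correct / total:.2f}%")
```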


Figure 2: Training ResNet 34 with the training parameters specified above.


This figure shows the benefit of early stopping, which halts training at epoch 13. The weights used for testing are those of the ninth epoch.

