Week #14
28 Jan 2019
I. UNet overfitting
View code on GitHub
I changed the image size to 388 x 388 and decreased the number of network parameters (smaller channel counts). The UNet overfits after 500 epochs, with a final loss of around 0.018. So the previous image size was too large for the network to overfit; with smaller images, the network overfits quickly.
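A minimal sketch of this kind of overfitting sanity check: train on one fixed mini-batch and watch the loss collapse. The tiny stand-in model, the class count, and the random tensors are hypothetical placeholders, not the actual UNet or data.

```python
import torch
import torch.nn as nn

# Stand-in for the UNet (hypothetical): any segmentation net whose output
# has num_classes channels works for this check.
model = nn.Conv2d(1, 4, kernel_size=3, padding=1)  # 4 classes, illustrative
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One fixed mini-batch of 388 x 388 slices, reused at every iteration.
images = torch.randn(5, 1, 388, 388)
labels = torch.randint(0, 4, (5, 388, 388))

for epoch in range(500):
    optimizer.zero_grad()
    loss = criterion(model(images), labels)  # same batch -> loss should collapse
    loss.backward()
    optimizer.step()

print(f"final loss: {loss.item():.3f}")  # a near-zero loss means the net can overfit
```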
II. Stage definition
I have discussed this with Sebastien, and we concluded that the project will have two stages:
- Stage 1: start with the simple approach, which takes only the T1w image as input and the scale-5 parcellation as output.
- Stage 2: move to a more complex approach that considers all scales. This would be a significant contribution, since QuickNAT is designed for mono-scale parcellation.
III. 3D MRI slicing
View code on GitHub
The 3D data are sliced along the three anatomical planes; each plane comprises a stack of slices, and the number of slices can be set in the code.
When the 3D T1w image and the scale-5 parcellation image are sliced at the same intervals, each image slice is paired with the label slice at the same voxel coordinates.
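A minimal sketch of this slicing step, assuming the volumes are NIfTI files loaded with nibabel; the file paths, the `step` parameter, and the axis-to-plane mapping are assumptions that depend on the image orientation:

```python
import numpy as np
import nibabel as nib  # assuming the volumes are stored as NIfTI files

def slice_volume(t1w_path, parc_path, step=1):
    """Slice a 3D T1w volume and its parcellation along the three planes.

    Returns a dict mapping plane name -> list of (image_slice, label_slice)
    pairs taken at the same voxel coordinates.
    """
    t1w = nib.load(t1w_path).get_fdata()
    parc = nib.load(parc_path).get_fdata()
    assert t1w.shape == parc.shape, "image and label must share the same grid"

    slices = {"sagittal": [], "coronal": [], "axial": []}
    # Axis order assumes a RAS-like orientation; adjust for the actual header.
    for axis, plane in enumerate(["sagittal", "coronal", "axial"]):
        for i in range(0, t1w.shape[axis], step):
            img = np.take(t1w, i, axis=axis)
            lbl = np.take(parc, i, axis=axis)
            slices[plane].append((img, lbl))
    return slices
```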
After training, the whole-brain segmentation is reconstructed. According to the QuickNAT paper, the predictions from the three views are aggregated, along the lines of
\[
\hat{y}(x) = \operatorname*{arg\,max}_{c} \tfrac{1}{3}\big(p_1(x) + p_2(x) + p_3(x)\big)_c,
\]
where \(p_1(x)\), \(p_2(x)\) and \(p_3(x)\) are the predicted probability vectors for the axial, coronal and sagittal views respectively. We will use the same strategy.
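In code, this view aggregation is straightforward; a sketch assuming each view's slice-wise predictions have already been stacked back into a class-probability volume on a common grid:

```python
import torch

def aggregate_views(p_axial, p_coronal, p_sagittal):
    """Average the per-view class-probability volumes and pick the most
    probable class per voxel. Each input has shape (num_classes, D, H, W)."""
    p_mean = (p_axial + p_coronal + p_sagittal) / 3.0
    return torch.argmax(p_mean, dim=0)  # (D, H, W) label volume
```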
There are 330 T1w MRI images in total and 330 corresponding parcellation images. If each 3D image is sliced at every voxel along all three axes, that yields 581 slices per T1w image and 581 slices per scale-5 parcellation 3D image.
IV. Training with UNet
- Fed one subject's axial slices (182 images in total) to the UNet, with batch size 5 and image size 128 x 128. With 1,948,096 parameters the loss decreases to 0.55; with 31,058,944 parameters it decreases to 0.13. View code on GitHub
- Trained 10 epochs per subject over 329 subjects, 3,290 epochs in total, with batch size 5 and 31,058,944 parameters; for each subject the minimum loss only reaches around 0.5. I suspect 10 epochs per subject is too few, so I will consider using 50 epochs per subject instead (a sketch of this schedule follows the list). View code on GitHub
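A minimal sketch of that per-subject schedule; the tiny stand-in model and the synthetic per-subject loaders are hypothetical placeholders for the actual UNet and data pipeline:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Stand-ins (hypothetical): a tiny net in place of the UNet, random tensors
# in place of each subject's axial slice/label stacks.
model = nn.Conv2d(1, 4, kernel_size=3, padding=1)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

subject_loaders = [
    DataLoader(
        TensorDataset(torch.randn(182, 1, 128, 128),          # axial slices
                      torch.randint(0, 4, (182, 128, 128))),  # labels
        batch_size=5,
    )
    for _ in range(2)  # 329 subjects in practice; 2 here for brevity
]

EPOCHS_PER_SUBJECT = 50  # increased from 10, as discussed above
for loader in subject_loaders:            # subjects visited one after another
    for _ in range(EPOCHS_PER_SUBJECT):
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
```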
V. QuickNAT model
View code on GitHub
The QuickNAT model has been implemented in PyTorch following the article. I would first like to train QuickNAT on the BMCC dataset (2D images, 43 in total), but since there is not enough GPU memory at the moment (the GPU is occupied training the UNet on the MRI images), it will be trained next week.
VI. Paper review
Noh et al. “Learning Deconvolution Network for Semantic Segmentation.” In 2015 IEEE International Conference on Computer Vision (ICCV). link
Squeeze-and-Excitation Networks (SENets) introduce a building block for CNNs that improves the modelling of channel interdependencies at almost no computational cost.
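For reference, a minimal sketch of the SE building block: squeeze by global average pooling, excitation through a small two-layer bottleneck, then channel-wise rescaling.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation block: reweight channels by learned importance."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),  # squeeze bottleneck
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                                # per-channel weights in (0, 1)
        )

    def forward(self, x):                 # x: (N, C, H, W)
        n, c, _, _ = x.shape
        w = self.fc(x.mean(dim=(2, 3)))   # squeeze: global average pool -> (N, C)
        return x * w.view(n, c, 1, 1)     # excite: rescale each channel
```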
Note
Pooling
Pooling in a convolutional network is designed to filter out noisy activations: it retains only the robust activations in upper layers, but spatial information is lost during pooling.
The convolution network acts as a feature extractor that transforms the input image into a multidimensional feature representation, while the deconvolution network is a shape generator that produces the object segmentation from the features extracted by the convolution network.
Unpooling
During the pooling operation, a matrix records the locations of the maximum values; the unpooling operation then inserts each pooled value back at its recorded location, with the remaining elements set to zero. Unpooling captures example-specific structures by tracing strong activations back to their original locations in image space, and as a result it effectively reconstructs the detailed structure.
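PyTorch exposes exactly this pairing: `nn.MaxPool2d(return_indices=True)` records the argmax locations, and `nn.MaxUnpool2d` scatters the pooled values back to them.

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)

x = torch.randn(1, 1, 4, 4)
pooled, indices = pool(x)           # 2 x 2 maxima plus their locations
restored = unpool(pooled, indices)  # 4 x 4 again: maxima in place, zeros elsewhere
```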
Deconvolution layer
The output of an unpooling layer is an enlarged but sparse activation map; the deconvolution layers densify it, producing an enlarged and dense activation map.
Through deconvolutions, activations closely related to the target classes are amplified, while noisy activations from other regions are effectively suppressed.
An important property of the convolution operation is that positional connectivity exists between the input values and the output values.
A batch normalization layer is added to the output of every convolutional and deconvolutional layer.
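Putting the last few points together, a sketch of one decoder stage in this style (channel sizes and kernel size are illustrative, not the paper's exact configuration):

```python
import torch.nn as nn

class DecoderBlock(nn.Module):
    """One decoder stage: unpool (enlarge, sparse) -> deconv (densify) -> BN -> ReLU."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)
        self.deconv = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x, indices):
        x = self.unpool(x, indices)                 # enlarged, sparse activation map
        return self.relu(self.bn(self.deconv(x)))  # enlarged, dense activation map
```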
2-stage training
The network is first trained on cropped instances in which the object is centered in the bounding box; then candidate proposals that sufficiently overlap the ground-truth segmentation are selected for training in the second stage.
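The second-stage selection rule can be made concrete with an overlap check; a sketch using intersection-over-union between proposal masks and the ground truth, where the 0.5 threshold is an assumption for illustration:

```python
import numpy as np

def iou(mask_a, mask_b):
    """Intersection-over-union of two binary masks of equal shape."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union if union > 0 else 0.0

def select_proposals(proposal_masks, gt_mask, threshold=0.5):
    """Keep proposals that sufficiently overlap the ground-truth segmentation."""
    return [m for m in proposal_masks if iou(m, gt_mask) >= threshold]
```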