Week #13
21 Jan 2019

I. Self-taught through the following resources:
- semantic segmentation review
- Upsample vs Transpose Convolution
- Why Adam optimization
II. More details on UNet
I am now trying to produce the final segmentation map, since the output of the network is a probability map. To get the final segmentation, I take the argmax along the first (class) dimension and map each class index to a specific color. For example, with 3 classes the output of my network is 3 x 1024 x 1024, and the goal is to assign a class label to every pixel: if the values for the first pixel are (0, 1, 5), it clearly belongs to class 2, since 5 is the maximum value and sits at index 2. Then, applying the map:
0 : torch.Tensor([255, 255, 255]),
1 : torch.Tensor([255, 0, 0]),
2 : torch.Tensor([0, 255, 0])
I will get the segmentation map. This work is still under development.
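A minimal sketch of this step (assuming a (3, 1024, 1024) output tensor and the color map above; the function name `probs_to_segmentation` is mine, not from any library):

```python
import torch

# Class-index -> RGB color mapping, as above.
COLORS = {
    0: torch.tensor([255, 255, 255], dtype=torch.uint8),  # class 0: white
    1: torch.tensor([255, 0, 0], dtype=torch.uint8),      # class 1: red
    2: torch.tensor([0, 255, 0], dtype=torch.uint8),      # class 2: green
}

def probs_to_segmentation(output: torch.Tensor) -> torch.Tensor:
    """Turn a (C, H, W) probability map into an (H, W, 3) color image."""
    labels = output.argmax(dim=0)  # (H, W): class index for every pixel
    seg = torch.zeros(*labels.shape, 3, dtype=torch.uint8)
    for cls, color in COLORS.items():
        seg[labels == cls] = color  # paint all pixels of this class
    return seg

# Example with a fake 3-class network output on a 1024x1024 image.
fake_output = torch.randn(3, 1024, 1024)
seg_map = probs_to_segmentation(fake_output)
print(seg_map.shape)  # torch.Size([1024, 1024, 3])
```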
III. Paper review
How convolutional neural networks see the world - A survey of convolutional neural network visualization methods (paper link)
Notes
Upsampling. Goal: go from low resolution to high resolution. Two main methods for upsampling:
- Interpolation methods, including
- Nearest neighbor interpolation
- Bi-linear interpolation
- Bi-cubic interpolation
None of these interpolation methods has parameters that are learned by the network.
- Transposed convolution: It has learnable parameters
For semantic segmentation, we should use the transposed convolution: the network uses convolutional layers to extract features in the encoder and then restores the original image size in the decoder, so that it can classify every pixel of the original image.
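To make the difference concrete, here is a small PyTorch comparison (the tensor and layer sizes are arbitrary, chosen only for illustration):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(1, 64, 16, 16)  # (batch, channels, H, W)

# Interpolation: a fixed rule, no learnable parameters.
up_nearest  = F.interpolate(x, scale_factor=2, mode="nearest")
up_bilinear = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
up_bicubic  = F.interpolate(x, scale_factor=2, mode="bicubic", align_corners=False)

# Transposed convolution: upsampling with weights trained along with the network.
deconv = nn.ConvTranspose2d(64, 64, kernel_size=2, stride=2)
up_learned = deconv(x)

print(up_bilinear.shape, up_learned.shape)            # both torch.Size([1, 64, 32, 32])
print(sum(p.numel() for p in deconv.parameters()))    # > 0: learnable parameters
```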
Transposed convolution
One important property of the convolution operation is the positional connectivity between the input values and the output values. For example, a 3x3 kernel connects 9 values in the input matrix to 1 value in the output matrix, so a convolution operation forms a many-to-one relationship.
Transposed convolution is a one-to-many operation: it forms the same connectivity as the normal convolution but in the backward direction, and its weights are learnable.
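A toy PyTorch check of this connectivity (shapes only; the single-channel layers are just illustrative):

```python
import torch
import torch.nn as nn

# Forward convolution: a 3x3 kernel maps a 3x3 input patch to one output value.
conv = nn.Conv2d(1, 1, kernel_size=3, bias=False)
patch = torch.randn(1, 1, 3, 3)
print(conv(patch).shape)    # torch.Size([1, 1, 1, 1])  -> many-to-one

# Transposed convolution: one input value is spread over a 3x3 output region.
deconv = nn.ConvTranspose2d(1, 1, kernel_size=3, bias=False)
pixel = torch.randn(1, 1, 1, 1)
print(deconv(pixel).shape)  # torch.Size([1, 1, 3, 3])  -> one-to-many
```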
Feature extraction
Convolutional layers do the feature extraction. Convolutional layer neurons (convolution filters) convolve with the input image or with the outputs of the previous layer; the results are treated as learned features and recorded in the feature maps. In deeper layers, the neurons are expected to extract higher-level features.
The filters perform the convolution that transforms the input image or the previous layer's feature maps into the output feature maps; they act as feature extractors that learn useful features from the original input image.
Pooling layers reduce data redundancy in the CNN and therefore reduce the data-processing workload.
Fully-connected layers perform a comprehensive evaluation based on the features extracted by the conv layers and generate an N-dimensional score vector; the softmax function then turns these scores into the final classification probabilities.
Through the hierarchically structured layers, a CNN transforms the input image layer by layer from the original pixel values to the final probability vector P, in which the largest $P_i$ indicates the predicted class. A toy version of this pipeline is sketched below.
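Here is a toy classifier wiring these pieces together (the layer sizes and the 32x32 input are my assumptions for illustration, not an architecture from the paper):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyCNN(nn.Module):
    """Toy classifier: conv layers extract features, pooling reduces redundancy,
    a fully-connected layer scores the N classes, softmax yields probabilities."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, padding=1)   # low-level features
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)  # higher-level features
        self.pool = nn.MaxPool2d(2)                               # halves H and W
        self.fc = nn.Linear(32 * 8 * 8, num_classes)              # N-dim score vector

    def forward(self, x):                     # x: (B, 3, 32, 32)
        x = self.pool(F.relu(self.conv1(x)))  # -> (B, 16, 16, 16)
        x = self.pool(F.relu(self.conv2(x)))  # -> (B, 32, 8, 8)
        x = self.fc(x.flatten(1))             # -> (B, num_classes)
        return F.softmax(x, dim=1)            # final probability vector P

p = TinyCNN()(torch.randn(1, 3, 32, 32))
print(p.argmax(dim=1))  # index of the largest P_i = predicted class
```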
Through the learning process, the convolutional filters become well configured to extract certain features. The features captured by the convolutional filters can be demonstrated by visualization, as in the sketch below.
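One simple visualization along these lines (my own sketch, assuming torchvision and matplotlib are available) is to plot the first-layer filter weights of a pretrained model directly:

```python
import matplotlib.pyplot as plt
import torchvision

# Load a pretrained model and grab the weights of its first conv layer.
model = torchvision.models.vgg16(pretrained=True)
filters = model.features[0].weight.detach()  # shape (64, 3, 3, 3)

fig, axes = plt.subplots(8, 8, figsize=(8, 8))
for ax, f in zip(axes.flat, filters):
    # Rescale each filter to [0, 1] so it can be shown as a small RGB patch.
    f = (f - f.min()) / (f.max() - f.min())
    ax.imshow(f.permute(1, 2, 0).numpy())  # (H, W, C) layout for imshow
    ax.axis("off")
plt.show()
```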
Dilated convolutions (atrous convolution)
Paper: DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs
Dilation rate: defines a spacing between the values in a kernel. A 3x3 kernel with a dilation rate of 2 has the same field of view as a 5x5 kernel, while using only 9 parameters.
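A quick PyTorch check of this claim (arbitrary sizes, just for illustration): a 3x3 kernel with dilation 2 shrinks the output exactly as a 5x5 kernel does, showing they cover the same field of view.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 32, 32)

conv5 = nn.Conv2d(1, 1, kernel_size=5)                  # 25 weights per filter
conv3_d2 = nn.Conv2d(1, 1, kernel_size=3, dilation=2)   # 9 weights, same field of view

print(conv5(x).shape)     # torch.Size([1, 1, 28, 28])
print(conv3_d2(x).shape)  # torch.Size([1, 1, 28, 28])  -> effective 5x5 receptive field
```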
IV. Next step
After the Skype meeting with Sebastien, I think that next week I will try to implement the QuickNAT architecture and also try the MRI dataset with the UNet.