Xiaoyu SONG My personal Blog | Xiaoyu SONG

Learning Deconv_2015

Summary

  • VGG16 + deconvolution network (a mirrored VGG16)
  • A dense pixel-wise class probability map is obtained.
  • Unpooling captures example-specific structures
    • It reconstructs the detailed structure of an object at a finer resolution
  • Learned filters capture class-specific shapes
    • Through deconvolution, the activations closely related to the target classes are amplified while noisy activations from other regions are suppressed.
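
To make the unpooling idea concrete, here is a minimal sketch in plain Python. It assumes 2x2 max pooling with stride 2 on a single 2D feature map with even height and width; the function names `max_pool_2x2` and `unpool_2x2` are made up for illustration and are not from the paper's code. Pooling records where each maximum came from (the "switches"), and unpooling puts each value back at that position, leaving zeros elsewhere.

```python
def max_pool_2x2(x):
    """2x2 max pooling (stride 2) on a 2D list with even height and width.

    Returns the pooled map and the "switches": the (row, col) position
    of the maximum inside each pooling window.
    """
    pooled, switches = [], []
    for i in range(0, len(x), 2):
        pooled_row, switch_row = [], []
        for j in range(0, len(x[0]), 2):
            # Collect (value, position) pairs for the four cells of the window.
            window = [(x[i + di][j + dj], (i + di, j + dj))
                      for di in range(2) for dj in range(2)]
            value, position = max(window, key=lambda t: t[0])
            pooled_row.append(value)
            switch_row.append(position)
        pooled.append(pooled_row)
        switches.append(switch_row)
    return pooled, switches


def unpool_2x2(pooled, switches, height, width):
    """Place each pooled value back at its recorded position; the rest stays 0."""
    out = [[0.0] * width for _ in range(height)]
    for pooled_row, switch_row in zip(pooled, switches):
        for value, (i, j) in zip(pooled_row, switch_row):
            out[i][j] = value
    return out


feature_map = [
    [1, 2, 0, 3],
    [5, 4, 1, 0],
    [0, 9, 5, 6],
    [1, 0, 7, 8],
]
pooled, switches = max_pool_2x2(feature_map)
restored = unpool_2x2(pooled, switches, 4, 4)
```

The restored map is sparse: only the positions of the maxima are filled. In the network, the deconvolution layers that follow an unpooling layer are what densify this sparse map.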

Training

  • Batch normalization
  • 2-stage training:
    • First, train on easy examples (object instances cropped using ground-truth annotations, so that each object is centered in its cropped bounding box)
    • Then fine-tune with more challenging examples (object proposals: candidate proposals that sufficiently overlap the ground-truth segmentations are selected for training)
      • Generate a sufficient number of candidate proposals
      • Apply the trained network to obtain semantic segmentation maps of individual proposals
      • Aggregate the outputs of all proposals
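
The selection of "sufficiently overlapped" proposals for the second stage could look roughly like the sketch below. It is a simplification under stated assumptions: overlap is measured as intersection-over-union between bounding boxes rather than against segmentation masks, boxes are `(x0, y0, x1, y1)` tuples, and the `min_iou=0.5` threshold and the function names are illustrative choices, not values taken from these notes.

```python
def box_iou(a, b):
    """Intersection-over-union of two boxes given as (x0, y0, x1, y1)."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    # Width and height of the overlap region (0 if the boxes do not intersect).
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


def select_training_proposals(proposals, gt_boxes, min_iou=0.5):
    """Keep proposals that overlap at least one ground-truth box by min_iou.

    min_iou is an assumed threshold for "sufficiently overlapped".
    """
    return [p for p in proposals
            if any(box_iou(p, g) >= min_iou for g in gt_boxes)]


gt_boxes = [(0, 0, 10, 10)]
proposals = [(0, 0, 10, 10), (5, 0, 15, 10), (20, 20, 30, 30), (0, 0, 10, 8)]
selected = select_training_proposals(proposals, gt_boxes)
```

Here the exact match and the slightly shorter box are kept, while the half-shifted box and the disjoint box are dropped.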

Inference

  • Employ Edge Boxes (edge-box) to generate object proposals.
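
Once each proposal has been run through the network, the per-proposal outputs have to be merged into one image-sized map. The sketch below shows one plausible way to do that, assuming a pixel-wise maximum over overlapping proposals, a single class, non-negative scores, and zero as the value for pixels no proposal covers. The aggregation rule, the `((x0, y0), score_map)` input layout, and the name `aggregate_max` are assumptions for illustration; these notes do not say which rule is used.

```python
def aggregate_max(proposal_outputs, height, width):
    """Merge per-proposal score maps into one height x width map.

    proposal_outputs: list of ((x0, y0), score_map) pairs, where (x0, y0) is
    the top-left corner of the proposal in the image and score_map is a 2D
    list of non-negative scores for one class.
    Overlapping pixels keep the maximum score; uncovered pixels stay 0.
    """
    canvas = [[0.0] * width for _ in range(height)]
    for (x0, y0), score_map in proposal_outputs:
        for dy, row in enumerate(score_map):
            for dx, score in enumerate(row):
                y, x = y0 + dy, x0 + dx
                # Ignore any part of a proposal that falls outside the image.
                if 0 <= y < height and 0 <= x < width:
                    canvas[y][x] = max(canvas[y][x], score)
    return canvas


outputs = [
    ((0, 0), [[0.2, 0.9], [0.1, 0.3]]),
    ((1, 0), [[0.5, 0.4], [0.8, 0.6]]),
]
merged = aggregate_max(outputs, 2, 3)
```

For several classes, the same function would be applied once per class, and the final label at each pixel taken as the class with the highest merged score.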