Use computer vision techniques to identify and remove the background from a video stream in real time.
We surveyed existing work in this field and identified several candidate solutions to this challenge. These are the approaches we tried:
- YOLACT: Real time instance segmentation
- SIAM-MASK: Fast Online Object Tracking and Segmentation
- BodyPix: Person Segmentation in the Browser
- Mask-RCNN: Object Detection and Segmentation
- DeepLabs: DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs
- Google AI: Mobile Real-time Video Segmentation
- GrabCut Algorithm: Interactive Foreground Extraction
YOLACT: Real time instance segmentation
- A fully convolutional model for real-time instance segmentation that achieves 29.8 mAP on MS COCO at 33 fps on a single Titan Xp, significantly faster than any previous competitive approach.
- It does this by breaking instance segmentation into two parallel subtasks: (1) generating a set of prototype masks and (2) predicting per-instance mask coefficients.
- Instance masks are then produced by linearly combining the prototypes with the mask coefficients. Because this process does not depend on repooling, it produces very high-quality masks and exhibits temporal stability for free.
- The authors also propose Fast NMS, a drop-in replacement for standard NMS that is 12 ms faster and incurs only a marginal performance penalty.
- Processing is fast enough for real-time extraction. On a Titan Xp GPU with image size 550 and a ResNet50-FPN backbone, it runs at 42.5 fps.
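The two ideas in the bullets above, combining prototypes with coefficients and Fast NMS, can be sketched in a few lines. This is a minimal illustration in plain Python, not the YOLACT implementation: the function names (`assemble_mask`, `box_iou`, `fast_nms`), the nested-list data layout, and the default thresholds are assumptions made for this example, and a real pipeline would use tensor operations on the GPU.

```python
import math


def assemble_mask(prototypes, coefficients, threshold=0.5):
    """Combine k prototype masks (each H x W) with k coefficients into a binary mask."""
    height, width = len(prototypes[0]), len(prototypes[0][0])
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            # Linear combination of the prototypes at this pixel.
            activation = sum(c * p[y][x] for c, p in zip(coefficients, prototypes))
            # Sigmoid, then threshold to decide foreground vs. background.
            probability = 1.0 / (1.0 + math.exp(-activation))
            row.append(probability > threshold)
        mask.append(row)
    return mask


def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def fast_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first.

    Unlike standard NMS, a box is compared against every higher-scoring box,
    including ones that were themselves suppressed. That makes the check for
    each box independent of the others, so it can be done as one matrix
    operation, at the cost of occasionally removing a few extra boxes.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for rank, idx in enumerate(order):
        max_iou = max(
            (box_iou(boxes[order[r]], boxes[idx]) for r in range(rank)),
            default=0.0,
        )
        if max_iou <= iou_threshold:
            keep.append(idx)
    return keep
```

For a chain of three boxes where A overlaps B and B overlaps C, but A barely overlaps C, standard NMS would keep A and C, while this variant keeps only A. That is the kind of case behind the marginal performance penalty.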
SIAM-MASK: Fast Online Object Tracking and Segmentation
- Performs both visual object tracking and semi-supervised video object segmentation in real time with a single, simple approach.
- SiamMask improves the offline training procedure of popular fully convolutional Siamese approaches for object tracking by augmenting their loss with a binary segmentation task.
- SiamMask relies solely on a single bounding-box initialisation and operates online, producing class-agnostic object segmentation masks and rotated bounding boxes at 55 frames per second.
- Speed was tested on an NVIDIA RTX 2080: 56 fps.
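As a rough illustration of what "augmenting the loss with a binary segmentation task" means, the sketch below adds a per-pixel binary logistic loss to an existing tracking loss. It is a simplified assumption for this write-up, not SiamMask's training code: `mask_loss`, `total_loss`, and the `mask_weight` parameter are hypothetical names, and details such as which candidate windows contribute to the mask term are left out.

```python
import math


def mask_loss(logits, labels):
    """Mean binary logistic loss over pixels.

    logits: predicted per-pixel mask scores (floats).
    labels: ground truth per pixel, +1 for object and -1 for background.
    """
    terms = [math.log1p(math.exp(-label * logit)) for logit, label in zip(logits, labels)]
    return sum(terms) / len(terms)


def total_loss(tracking_loss, logits, labels, mask_weight=1.0):
    """Tracking loss plus a weighted segmentation term (weight is hypothetical)."""
    return tracking_loss + mask_weight * mask_loss(logits, labels)
```

A confident, correct prediction (large positive logit on an object pixel) contributes almost nothing to the mask term, while a confident wrong one is penalised roughly linearly in the logit.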
BodyPix: Person Segmentation in the Browser
This approach requires a GPU machine, and the output frame rate was very low.
Mask-RCNN: Object Detection and Segmentation
This approach had decent accuracy, but the processing time was very high, approximately 14 seconds per frame, so it was not feasible for real-time use.
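To put 14 seconds per frame in perspective, the arithmetic below converts between per-frame processing time and frame rate. The 30 fps target is an assumption chosen for illustration; the helper names are hypothetical.

```python
def frames_per_second(seconds_per_frame):
    """Frame rate achieved when each frame takes the given processing time."""
    return 1.0 / seconds_per_frame


def frame_budget_ms(target_fps):
    """Maximum processing time per frame, in milliseconds, to sustain target_fps."""
    return 1000.0 / target_fps


def times_over_budget(seconds_per_frame, target_fps):
    """How many times slower than the per-frame budget the processing is."""
    return seconds_per_frame * 1000.0 / frame_budget_ms(target_fps)


measured = 14.0    # seconds per frame, as observed above
target = 30.0      # assumed real-time target, not a figure from this write-up

print(f"{frames_per_second(measured):.3f} fps")                 # about 0.071 fps
print(f"{frame_budget_ms(target):.1f} ms budget")               # about 33.3 ms
print(f"{times_over_budget(measured, target):.0f}x too slow")   # about 420x
```

At that assumed target, each frame would need to finish in roughly 33 ms, so a 14-second frame time is about 420 times over budget.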
DeepLabs: DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs
This approach did not give high accuracy, and processing was also very slow.
Google AI: Mobile Real-time Video Segmentation
The accuracy of this approach was not good.
GrabCut Algorithm: Interactive Foreground Extraction
This approach requires a bounding box to be detected for each frame before foreground extraction can run, which resulted in a low frame rate.