Unsupervised Collaborative Learning of Keyframe Detection and Visual Odometry towards Monocular Deep SLAM


In this paper we tackle the joint learning problem of keyframe detection and visual odometry towards monocular visual SLAM systems. As an important task in visual SLAM, keyframe selection helps efficient camera relocalization and effective augmentation of visual odometry. To benefit from it, we first present a deep network design for the keyframe selection, which is able to reliably detect keyframes and localize new frames, then an end-to-end unsupervised deep framework further proposed for simultaneously learning the keyframe selection and the visual odometry tasks. As far as we know, it is the first work to jointly optimize these two complementary tasks in a single deep framework. To make the two tasks facilitate each other in the learning, a collaborative optimization loss based on both geometric and visual metrics is proposed. Extensive experiments on publicly available datasets (ie KITTI raw dataset and its odometry split) clearly demonstrate the effectiveness of the proposed approach, and new state-of-the-art results are established on the unsupervised depth and pose estimation from monocular videos.

IEEE/CVF International Conference on Computer Vision