Wang Kaixuan (王凯旋)
Ph.D. in 3D Vision and Robotics

I am a Ph.D. graduate of the UAV Group, HKUST, where I was supervised by Prof. Shaojie Shen. My research covers depth estimation with monocular cameras, deep learning, and 3D reconstruction. Over the past years, I have been developing methods that form a pipeline from images to dense 3D maps for robotic navigation, especially for UAVs. Most of this work is open source on GitHub for the benefit of the community. I worked as an intern at the AILab, ByteDance, and I am now an algorithm engineer on the DJI perception team.

You are welcome to contact me by email if you are interested in joining the DJI perception team as an intern or full-time engineer!

On this page, you can find a summary of my past projects and published papers.

You may also be interested in the work of my wife, Ye Xue, who is now a researcher dedicated to machine learning theory and wireless communication.

Links: GitHub, Google Scholar, Personal, Group

wkx1993 AT gmail DOT com

halfbullet DOT wang AT dji DOT com (for work/intern only)


Project Highlights

Monocular Dense Mapping

Monocular Dense Mapping aims to estimate dense depth maps using images from a single monocular camera. Compared with the widely used stereo perception, the one-camera solution is smaller and lighter and needs no extrinsic calibration. Combined with a visual-inertial odometry method (e.g., VINS), UAV autonomous navigation can be demonstrated using only one camera and one IMU (see figure for an example).

Two methods were developed to calculate the depth map using belief propagation. The first, Pinhole-Fisheye-Mapping, focuses on exploiting varying baselines for robust depth estimation. The second, Quadtree-Mapping, uses the quadtree structure of the image to speed up global depth optimization. Building on advances in deep learning, we also proposed MVDepthNet, which estimates depth maps from one monocular camera in real time. All of these methods have been deployed on real UAV systems, where they have demonstrated their usability.
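To give a feel for what "the quadtree structure of the image" means, here is a minimal sketch in Python. It is not the Quadtree-Mapping implementation: the function name `build_quadtree`, the intensity-range split criterion, and the `threshold` parameter are assumptions made for illustration. It only shows how flat image regions collapse into a few large blocks, so that an optimizer would handle far fewer nodes than pixels.

```python
def build_quadtree(image, x, y, size, threshold, min_size=1):
    """Return leaf blocks (x, y, size) covering a square, power-of-two region.

    A block is kept as a single leaf when its intensity range is within
    `threshold` (an assumed criterion); otherwise it is split into four
    quadrants. Illustrative only.
    """
    values = [image[r][c]
              for r in range(y, y + size)
              for c in range(x, x + size)]
    if size <= min_size or max(values) - min(values) <= threshold:
        return [(x, y, size)]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves.extend(
                build_quadtree(image, x + dx, y + dy, half, threshold, min_size))
    return leaves


# Toy 4x4 image: flat everywhere except the bottom-right quadrant.
image = [
    [10, 10, 10, 10],
    [10, 10, 10, 10],
    [10, 10, 10, 90],
    [10, 10, 50, 10],
]
leaves = build_quadtree(image, 0, 0, 4, threshold=5)
print(len(leaves), "blocks instead of", 4 * 4, "pixels")
```

In this toy case the three flat quadrants stay as single 2x2 blocks and only the textured quadrant is split down to individual pixels.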

Surfel-based 3D Reconstruction

3D reconstruction needs more than depth images with estimated camera poses. Because depth maps are noisy and estimated camera poses drift over the long term, a proper 3D fusion method is needed to build a dense, globally consistent model for visualization and path planning. SurfelMapping is a reconstruction method that works with state-of-the-art SLAM methods (e.g., VINS, ORB-SLAM). The map is represented as a collection of surfels so that it can be deformed efficiently according to the pose graph optimization of the SLAM system. Unlike ElasticFusion, which models a surfel for each pixel, we estimate surfels for extracted superpixels to gain efficiency. SurfelMapping is capable of reconstructing the KITTI dataset in real time without GPU acceleration.
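The idea of deforming a surfel map after pose graph optimization can be sketched as follows. This is an illustrative simplification, not the SurfelMapping code: it assumes each surfel is attached to exactly one keyframe, that poses are camera-to-world transforms, and it only moves surfel positions. The helper names (`make_pose`, `deform_surfels`, and so on) are hypothetical.

```python
import math


def make_pose(yaw, tx, ty, tz):
    """4x4 camera-to-world transform: rotation about z by `yaw` plus translation."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0, tx],
            [s, c, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


def invert_pose(T):
    """Inverse of a rigid transform [R | t] is [R^T | -R^T t]."""
    Rt = [[T[c][r] for c in range(3)] for r in range(3)]
    t = [T[r][3] for r in range(3)]
    ti = [-sum(Rt[r][k] * t[k] for k in range(3)) for r in range(3)]
    return [Rt[0] + [ti[0]], Rt[1] + [ti[1]], Rt[2] + [ti[2]],
            [0.0, 0.0, 0.0, 1.0]]


def compose(A, B):
    """Matrix product A * B of two 4x4 transforms."""
    return [[sum(A[r][k] * B[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


def apply(T, p):
    """Apply a 4x4 rigid transform to a 3D point."""
    return tuple(sum(T[r][k] * p[k] for k in range(3)) + T[r][3]
                 for r in range(3))


def deform_surfels(surfels, old_poses, new_poses):
    """Move each surfel by the correction applied to its keyframe.

    surfels: list of (keyframe_id, (x, y, z)) in world coordinates.
    The per-keyframe correction is T_new * inverse(T_old).
    """
    corrections = {kf: compose(new_poses[kf], invert_pose(old_poses[kf]))
                   for kf in old_poses}
    return [(kf, apply(corrections[kf], p)) for kf, p in surfels]


# Keyframe 0 is unchanged; keyframe 1 is shifted 0.5 m along x by the optimizer.
old_poses = {0: make_pose(0.0, 0.0, 0.0, 0.0), 1: make_pose(0.0, 1.0, 0.0, 0.0)}
new_poses = {0: make_pose(0.0, 0.0, 0.0, 0.0), 1: make_pose(0.0, 1.5, 0.0, 0.0)}
surfels = [(0, (2.0, 0.0, 5.0)), (1, (3.0, 0.0, 5.0))]
deformed = deform_surfels(surfels, old_poses, new_poses)
print(deformed)
```

Because the correction is computed once per keyframe rather than once per surfel, the cost of a map update in this sketch scales with the number of keyframes plus one point transform per surfel.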

Deep Learning for 3D Vision

Deep learning has advanced rapidly in recent years, and some of my work applies learning methods to 3D vision problems. MVDepthNet is one of the first methods that use networks to solve multiview stereo problems, and the only one among them designed for real-time performance. MVDepthNet is further used by Magic Leap in DeepPerimeter for indoor reconstruction. Flow-Motion-Depth solves the motion stereo problem with a carefully designed network that estimates depth maps and camera motions given two images. I have also studied monocular depth prediction in depth. Since the quantity of available datasets for depth learning is quite limited, we propose GeometricPretraining, which can pretrain networks with unlimited internet videos. With the pretrained networks, we achieve new state-of-the-art performance. The performance is demonstrated in the following:


Autonomous aerial robot using dual-fisheye cameras (W. Gao, K. Wang, W. Ding, F. Gao, T. Qin and S. Shen), In Journal of Field Robotics, 2020.
To appear

Geometric Pretraining for Monocular Depth Estimation (K. Wang and Y. Chen and H. Guo and L. Wen and S. Shen), In IEEE International Conference on Robotics and Automation (ICRA), 2020.
[bib] [code]

Flow-Motion and Depth Network for Monocular Stereo and Beyond (K. Wang and S. Shen), arXiv: 1909.05452, RAL 2020 and to be presented at ICRA 2020.
[pdf] [supplementary pdf] [bib] [code] [2019 arXiv version]

FlowNorm: A Learning-based Method for Increasing Convergence Range of Direct Alignment (K. Wang, K. Wang and S. Shen), arXiv: 1910.07217, ICRA 2020.
[bib] [2019 arXiv version]

Real-time Scalable Dense Surfel Mapping (K. Wang and F. Gao and S. Shen), In IEEE International Conference on Robotics and Automation (ICRA), 2019.
[pdf] [bib] [video] [2019 arXiv version] [code]

An Efficient B-spline-Based Kinodynamic Replanning Framework for Quadrotors (W. Ding and W. Gao and K. Wang and S. Shen), In IEEE Transactions on Robotics (T-RO), 2019.
[pdf] [bib] [video] [2019 arXiv version]

Optimal Trajectory Generation for Quadrotor Teach-and-Repeat (F. Gao and L. Wang and K. Wang and W. Wu and B. Zhou and L. Han and S. Shen), In IEEE Robotics and Automation Letters, 2019.
[pdf] [bib] [video] [code]

MVDepthNet: real-time multiview depth estimation neural network (K. Wang and S. Shen), In International Conference on 3D Vision (3DV), 2018, Oral.
[pdf] [bib] [video] [2018 arXiv version] [code]

Adaptive Baseline Monocular Dense Mapping with Inter-frame Depth Propagation (K. Wang and S. Shen), In The IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2018.
[pdf] [bib] [video] [code]

Quadtree-accelerated Real-time Monocular Dense Mapping (K. Wang and W. Ding and S. Shen), In The IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2018.
[pdf] [bib] [video] [code]

Probabilistic dense reconstruction from a moving camera (Y. Ling and K. Wang and W. Ding and S. Shen), In The IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2018.
[pdf] [bib] [video] [code]

Trajectory replanning for quadrotors using kinodynamic search and elastic optimization (W. Ding and W. Gao and K. Wang and S. Shen), In IEEE International Conference on Robotics and Automation (ICRA), 2018.
[pdf] [bib] [video] [2019 arXiv version]

Undergraduate Projects

Spherical Formation Tracking Control of Second-Order Nonlinear Agents With Directed Communication (Y. Chen and K. Wang and Y. Zhang), In 12th IEEE International Conference on Control & Automation (ICCA), 2016.

A geometric extension design for second-order nonlinear agents formation surrounding a sphere (Y. Chen and K. Wang and Y. Zhang and C. Liu and Q. Wang), In Chinese Control and Decision Conference (CCDC), 2016.