3D models from a webcam

University of Cambridge researcher Qi Pan has developed a simple and affordable method for constructing virtual three-dimensional (3D) models. First presented at the 20th British Machine Vision Conference in London in 2009, the system may make 3D modeling more accessible in the future. A scientific publication describes the details of the work.

The interesting part of this research, and the reason I am blogging about it, is that the paper describes a complete system that runs in real time and provides useful interactive user feedback. It is rare for researchers to take the long road of building a user interface.

The system requires only a basic web camera to capture video, from which 3D models of textured objects are reconstructed in real time. A partial model is built as the user moves the object around, providing immediate feedback about the state of the reconstruction. As the object is moved, the software detects points on the object, uses them to estimate the object’s structure and reconstructs its geometry. For visualisation, the object’s texture is then extracted from video keyframes and applied to the 3D mesh, which produces a realistic-looking model.

Internally, the system uses a tracker cascade built from two successive point trackers. The first is a robust point tracker that uses FAST features and PROSAC optimization to obtain a robust estimate of the 3D landmarks even under large motion. Starting from the pose obtained by the first tracker, a second, drift-free tracker refines the 3D landmark matches by minimising the matching error across three temporally neighbouring keyframes, and finally uses a robust M-estimator with a Tukey influence function to minimise the camera pose estimation error. New 3D landmarks on newly discovered, not yet modeled sides of the object are initially estimated with a 2D tracker based on the epipolar constraint. Keyframes are captured at the start and whenever an object rotation of more than 10 degrees is detected.

For 3D model reconstruction, the system works from the 3D landmarks and the estimated camera poses. First, a point cloud is constructed from the landmarks and camera poses using bundle adjustment. Then the point cloud is converted into a mesh by Delaunay tetrahedralization. On its own, the tetrahedralization only covers the convex hull of the point cloud, so non-convexities (e.g. folds in clothing or the inside of a bowl) are not captured by this step alone. As the object is moved, tetrahedra are carved away from the model to obtain the true surface. The object’s texture is then extracted from individual video keyframes and applied to the 3D mesh to give the model a realistic-looking appearance.
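To give a feeling for what a robust M-estimator with a Tukey influence function does, here is a minimal sketch in Python. It is not the code from the paper: it estimates a single number instead of a camera pose, and the function names, the tuning constant c = 4.685 (a common textbook default for residuals of unit scale) and the sample measurements are all my own assumptions. The point is only that residuals beyond the cutoff get zero weight, so gross outliers cannot pull the estimate.

```python
def tukey_weight(residual, c=4.685):
    """Tukey biweight: (1 - (r/c)^2)^2 inside the cutoff c, 0 outside."""
    u = residual / c
    if abs(u) >= 1.0:
        return 0.0
    return (1.0 - u * u) ** 2


def tukey_influence(residual, c=4.685):
    """Influence function psi(r) = r * w(r); it drops back to 0 for |r| >= c."""
    return residual * tukey_weight(residual, c)


def robust_mean(values, c=4.685, iterations=20):
    """Iteratively reweighted 1D location estimate (toy stand-in for a pose)."""
    estimate = sorted(values)[len(values) // 2]  # start from the (upper) median
    for _ in range(iterations):
        weights = [tukey_weight(v - estimate, c) for v in values]
        total = sum(weights)
        if total == 0.0:
            break  # every sample was rejected; keep the previous estimate
        estimate = sum(w * v for w, v in zip(weights, values)) / total
    return estimate


# Hypothetical measurements; the last value is a gross outlier.
measurements = [1.0, 1.2, 0.9, 1.1, 0.8, 25.0]
print(robust_mean(measurements))
```

The outlier at 25.0 lies far outside the cutoff and is ignored, so the result stays close to the plain average of the five good values. In a real tracker the residuals would be reprojection errors, normally scaled by a robust estimate of their spread before the weights are computed.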

The system uses a Logitech Quickcam 9000 at a resolution of 640×480 and 15 fps. On a 2.4 GHz Intel processor it creates a 3D model in less than 3 seconds, plus approximately 60 seconds for capturing video with a sufficient number of keyframes from all sides of the object.
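The keyframe rule mentioned earlier (one keyframe at the start, then another whenever the object has rotated by more than 10 degrees) is easy to sketch. The Python below is only an illustration under my own assumptions: the rotation is measured against the last keyframe, poses are given as plain 3x3 rotation matrices, and the simulated motion of 3 degrees per frame and all function names are made up for the example.

```python
import math


def rotation_z(degrees):
    """3x3 rotation matrix about the z axis, as row-major nested lists."""
    a = math.radians(degrees)
    return [[math.cos(a), -math.sin(a), 0.0],
            [math.sin(a), math.cos(a), 0.0],
            [0.0, 0.0, 1.0]]


def relative_angle_deg(r_a, r_b):
    """Angle of the rotation taking pose a to pose b, from trace(R_a^T R_b)."""
    trace = sum(r_a[k][i] * r_b[k][i] for i in range(3) for k in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp rounding noise
    return math.degrees(math.acos(cos_angle))


def select_keyframes(rotations, threshold_deg=10.0):
    """Return the indices of the frames kept as keyframes."""
    keyframes = []
    for index, rotation in enumerate(rotations):
        if not keyframes:
            keyframes.append(index)  # the first frame is always a keyframe
            continue
        last = rotations[keyframes[-1]]
        if relative_angle_deg(last, rotation) > threshold_deg:
            keyframes.append(index)
    return keyframes


# Simulated motion: the object turns 3 degrees between consecutive frames.
poses = [rotation_z(3.0 * i) for i in range(20)]
print(select_keyframes(poses))  # prints [0, 4, 8, 12, 16]
```

With this simulated motion a new keyframe is taken every fourth frame, as soon as the rotation since the last keyframe reaches 12 degrees.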