OpenTL

Intro: model-based visual tracking

Visual tracking integrates fast computer vision and image understanding techniques with sequential estimation methods (tracking) in order to localize one or more moving objects in a video sequence. Models of varying detail can be used for this task, and efficiently integrating all available information greatly improves the quality of the estimate.

Model-based visual tracking has application in many fields of interest, including robotics, man-machine interfaces, video surveillance, computer-assisted surgery, navigation systems, and so on.
These applications in turn require dedicated algorithms for tracking faces and gaze direction, people, hand postures and gestures, vehicles, cells, robots, etc.

Instead of relying on pre-defined artificial markers, which most applications do not allow, a model-based approach focuses on selecting and using natural features available from prior models.
This prior information may consist of shape, appearance, motion and deformation parameters, temporal dynamics, kinematic structure, as well as any useful information about the sensors and context that may be specified in advance or refined during the task itself.

Overall, a robust and automatic tracking system has to meet several, often conflicting, requirements: detecting objects, tracking them, handling full or partial occlusions, maintaining the identity of targets, detecting lost tracks and re-initializing them, being robust (to some extent) to noise, clutter and changing environment conditions, and running in real time.

The OpenTL library

Many state-of-the-art visual tracking approaches have been developed, covering a wide variety of application scenarios. Each one follows its own requirements and exploits different aspects of the underlying problem and prior information, while most of the time being specific to a single scenario.

However, despite the extensive literature available, to our knowledge no truly general-purpose library has yet been developed for tracking markerless objects.

In fact, most of the available libraries only provide model-free, image- and feature-level processing, and very few attempts to integrate Bayesian tracking schemes have been made, usually limited to Kalman filtering with simple state-space and dynamical models, and without general data association or fusion mechanisms.
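
To make "Kalman filtering with simple state-space and dynamical models" concrete, the following is a minimal sketch of a one-dimensional constant-velocity Kalman filter in plain C++. It is an illustration only and not code from OpenTL or any other library: the struct name `KalmanCV1D`, its fields, and the default noise values are assumptions chosen for the example.

```cpp
#include <cassert>
#include <cmath>

// Illustrative only: a 1D constant-velocity Kalman filter.
// State is (position p, velocity v); only the position is measured.
// All names and default values are assumptions made for this sketch.
struct KalmanCV1D {
    double p = 0.0, v = 0.0;                     // state estimate
    double P00 = 100.0, P01 = 0.0, P11 = 100.0;  // symmetric 2x2 covariance
    double q_pos = 1e-4, q_vel = 1e-4;           // process noise (diagonal)
    double r = 1.0;                              // measurement noise variance

    // Prediction: x <- F x, P <- F P F^T + Q, with F = [[1, dt], [0, 1]].
    void predict(double dt) {
        assert(dt > 0.0);
        p += v * dt;
        const double n00 = P00 + 2.0 * dt * P01 + dt * dt * P11 + q_pos;
        const double n01 = P01 + dt * P11;
        P00 = n00;
        P01 = n01;
        P11 += q_vel;
    }

    // Correction with a position measurement z, i.e. H = [1, 0].
    void update(double z) {
        const double S = P00 + r;   // innovation variance
        const double K0 = P00 / S;  // Kalman gain, position component
        const double K1 = P01 / S;  // Kalman gain, velocity component
        const double y = z - p;     // innovation
        p += K0 * y;
        v += K1 * y;
        // P <- (I - K H) P, written out for the symmetric 2x2 case.
        const double n00 = (1.0 - K0) * P00;
        const double n01 = (1.0 - K0) * P01;
        const double n11 = P11 - K1 * P01;
        P00 = n00;
        P01 = n01;
        P11 = n11;
    }
};
```

Calling `predict(dt)` followed by `update(z)` once per frame yields a position and velocity estimate. A filter of this kind covers a single target with a linear model, which is the limited case described above; it has no data association or fusion step.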

This gap can certainly be attributed to the complexity of the problem: it is challenging to seamlessly integrate multiple, heterogeneous visual modalities (edges, color, texture, motion, etc.), to handle information from multiple sensors and different object models and, last but not least, to achieve real-time performance.

With this goal in mind, the Chair for Robotics and Embedded Systems of TUM-Informatik has developed OpenTL (Open Tracking Library), a general-purpose library for markerless tracking that provides a user-friendly, high-level application programming interface (API) for a wide variety of methods and applications.

OpenTL can handle multiple simultaneous targets, visual modalities and sensors. It makes use of different Bayesian tracking schemes, represents object models of variable complexity, handles data association and fusion at different levels (pixel, feature and state-space level), and provides multi-threading and GPU acceleration for real-time efficiency.
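
As an illustration of what fusion at the state-space level can mean, the sketch below combines several independent scalar estimates by inverse-variance weighting, a standard way to fuse Gaussian estimates. It is not taken from OpenTL: the type `Estimate` and the function `fuseInverseVariance` are hypothetical names, and the text above does not say which fusion rule the library actually uses.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Illustrative only: one scalar estimate with its uncertainty.
struct Estimate {
    double mean;
    double variance;
};

// Fuses independent Gaussian estimates of the same quantity.
// Each input is weighted by its precision (1 / variance), so more
// certain sources contribute more to the fused mean.
Estimate fuseInverseVariance(const std::vector<Estimate>& inputs) {
    assert(!inputs.empty());
    double precision = 0.0;  // sum of 1 / variance
    double weighted = 0.0;   // sum of mean / variance
    for (const Estimate& e : inputs) {
        assert(e.variance > 0.0);
        precision += 1.0 / e.variance;
        weighted += e.mean / e.variance;
    }
    return {weighted / precision, 1.0 / precision};
}
```

For example, two estimates of a target position from two cameras, with means 1 and 3 and equal variance 1, fuse to a mean of 2 with variance 0.5. The fused variance is never larger than the smallest input variance.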

OpenTL is implemented in platform-independent C++ within a hierarchical, object-oriented architecture.
This choice allows a high degree of modularization and abstraction across layers, which is an essential property of the framework.
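
The kind of abstraction this refers to can be sketched with a small, hypothetical C++ interface. None of the names below (`Modality`, `GaussianModality`, `jointLogLikelihood`) come from OpenTL; they only illustrate how a common base class lets a tracker treat different visual modalities uniformly, here by summing their log-likelihoods under an independence assumption.

```cpp
#include <cassert>
#include <cmath>
#include <memory>
#include <vector>

// Hypothetical interface, not OpenTL's API: every visual modality
// scores a state hypothesis through the same virtual call.
struct Modality {
    virtual ~Modality() = default;
    virtual double logLikelihood(double state) const = 0;
};

// Toy modality: Gaussian log-likelihood around one measurement,
// up to an additive constant.
struct GaussianModality : Modality {
    double measurement;
    double sigma;
    GaussianModality(double m, double s) : measurement(m), sigma(s) {
        assert(s > 0.0);
    }
    double logLikelihood(double state) const override {
        const double d = (state - measurement) / sigma;
        return -0.5 * d * d;
    }
};

// Code in a higher layer depends only on the abstract interface.
// Assuming independent modalities, their log-likelihoods add up.
double jointLogLikelihood(
    const std::vector<std::unique_ptr<Modality>>& modalities, double state) {
    double sum = 0.0;
    for (const auto& m : modalities) {
        sum += m->logLikelihood(state);
    }
    return sum;
}
```

In such a design, adding a new modality means adding one derived class; code that consumes `Modality` objects does not change.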

For efficient low-level processing and for internal math representations, OpenTL builds on the OpenCV (Open Source Computer Vision) library. It also uses GPU programming, through the OpenGL Shading Language (GLSL), for hardware-accelerated processing of generic models.

Main features of OpenTL

Modularized, object-oriented software architecture
Common abstractions inside each layer: Bayesian filters, visual modalities, pose representations, data storage, image processing, etc.
Real-time performance: optimized algorithms, with support for multi-core and GPU processing
Different Bayesian filters, including particle filters (SIR, MCMC) as well as Kalman-based filters (EKF, IF)
A large variety of visual modalities: for example color statistics, background subtraction, contours, local keypoints, intensity gradients, templates, blobs, etc.
Robustness improvements: data fusion (static, dynamic), models combining online and offline data, color space conversions, etc.
Articulated objects
General-purpose XML parser for 3D models
Support for different camera types: USB webcams, FireWire cameras, GigE Vision-based cameras
Multiple platforms: currently Microsoft Windows (XP, Vista) and Linux (Ubuntu Hardy)
Generic sensor abstraction: camera, lidar, radar, GPS, video file
Combined multi-camera, multi-target and multi-modality support
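
Among the filters listed above, SIR (sampling importance resampling) is the basic particle filter. The sketch below shows one SIR step for a scalar state, assuming a random-walk motion model and a Gaussian measurement likelihood. It is an illustration written for this page rather than OpenTL code: `SirFilter1D` and its parameters are hypothetical, and multinomial resampling is used only because it is short to write.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Illustrative only: SIR particle filter for a scalar state.
struct SirFilter1D {
    std::vector<double> particles;
    std::mt19937 rng;
    double motion_sigma;  // std. dev. of the random-walk motion model
    double meas_sigma;    // std. dev. of the measurement noise

    SirFilter1D(std::size_t n, double lo, double hi,
                double motion_sigma_, double meas_sigma_, unsigned seed)
        : particles(n), rng(seed),
          motion_sigma(motion_sigma_), meas_sigma(meas_sigma_) {
        assert(n > 0 && lo < hi);
        // Start with particles spread uniformly over [lo, hi).
        std::uniform_real_distribution<double> init(lo, hi);
        for (double& x : particles) x = init(rng);
    }

    // One SIR step: propagate, weight by the likelihood of z, resample.
    void step(double z) {
        std::normal_distribution<double> noise(0.0, motion_sigma);
        std::vector<double> weights(particles.size());
        for (std::size_t i = 0; i < particles.size(); ++i) {
            particles[i] += noise(rng);                   // sample the motion model
            const double d = (z - particles[i]) / meas_sigma;
            weights[i] = std::exp(-0.5 * d * d);          // unnormalized likelihood
        }
        const double sum = std::accumulate(weights.begin(), weights.end(), 0.0);
        if (sum <= 0.0) return;  // all weights underflowed: keep the predicted set
        // Multinomial resampling: draw indices proportionally to the weights.
        std::discrete_distribution<std::size_t> pick(weights.begin(), weights.end());
        std::vector<double> resampled(particles.size());
        for (double& x : resampled) x = particles[pick(rng)];
        particles.swap(resampled);
    }

    // Posterior mean estimate (particles are equally weighted after resampling).
    double mean() const {
        return std::accumulate(particles.begin(), particles.end(), 0.0) /
               static_cast<double>(particles.size());
    }
};
```

A caller constructs the filter once, for example `SirFilter1D pf(500, 0.0, 10.0, 0.2, 0.5, 42u);`, then calls `pf.step(z)` for each new measurement and reads `pf.mean()`. In a visual tracker the Gaussian weight would be replaced by an image-based likelihood.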

Bibliographical references

G. Panin, E. Roth, T. Röder, S. Nair, C. Lenz, M. Wojtczyk, T. Friedlhuber, and A. Knoll, ITrackU: An integrated framework for image-based tracking and understanding, in Proceedings of the International Workshop on Cognition for Technical Systems, Munich, Germany, Oct. 2008.
G. Panin, C. Lenz, S. Nair, E. Roth, M. Wojtczyk, T. Friedlhuber, and A. Knoll, A unifying software architecture for model-based visual tracking, in IS&T/SPIE 20th Annual Symposium of Electronic Imaging, San Jose, CA, Jan. 2008.
G. Panin, S. Klose, and A. Knoll, Real-time articulated hand detection and pose estimation, in International Symposium on Visual Computing (ISVC), Las Vegas, Nevada, USA, Dec. 2009, to appear.
G. Panin, S. Klose, and A. Knoll, Multi-target and multi-camera object detection with monte-carlo sampling, in International Symposium on Visual Computing (ISVC), Las Vegas, Nevada, USA, Dec. 2009, to appear.
C. Lenz, G. Panin, T. Röder, M. Wojtczyk, and A. Knoll, Hardware-assisted multiple object tracking for human-robot-interaction, in HRI 2009: Proceedings of the 4th ACM/IEEE international conference on Human-robot interaction, F. Michaud, M. Scheutz, P. Hinds, and B. Scassellati, Eds. La Jolla, CA, USA: ACM, Mar. 2009, pp. 283–284.
C. Lenz, G. Panin, and A. Knoll, A GPU-accelerated particle filter with pixel-level likelihood, in International Workshop on Vision, Modeling and Visualization (VMV), Konstanz, Germany, Oct. 2008.
G. Panin, E. Roth, and A. Knoll, Robust contour-based object tracking integrating color and edge likelihoods, in International Workshop on Vision, Modeling and Visualization (VMV), Konstanz, Germany, Oct. 2008.
E. Roth, G. Panin, and A. Knoll, Sampling feature points for contour tracking with graphics hardware, in International Workshop on Vision, Modeling and Visualization (VMV), Konstanz, Germany, Oct. 2008.
G. Panin, T. Röder, and A. Knoll, Integrating robust likelihoods with monte-carlo filters for multi-target tracking, in International Workshop on Vision, Modeling and Visualization (VMV), Konstanz, Germany, Oct. 2008.
T. Müller, C. Lenz, S. Barner, and A. Knoll, Accelerating integral histograms using an adaptive approach, in Proceedings of the 3rd International Conference on Image and Signal Processing, ser. Lecture Notes in Computer Science (LNCS). Cherbourg-Octeville, France: Springer, July 2008, pp. 209–217.
G. Panin, A. Knoll, Mutual information-based 3d object tracking, International Journal of Computer Vision, 78(1):107-118, January 2008.
