Visual tracking consists of integrating fast computer vision and image-understanding techniques with sequential estimation methodologies (tracking), in order to localize one or more moving objects in a video sequence.
Models of varying detail can be used for this task, and an efficient integration of all available information greatly improves the quality of the estimate.
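In Bayesian terms, the sequential estimation step above is the standard recursive filter over the object state $x_t$ given the measurements $z_{1:t}$:

\[
p(x_t \mid z_{1:t}) \;\propto\; p(z_t \mid x_t) \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid z_{1:t-1})\, dx_{t-1},
\]

where $p(z_t \mid x_t)$ is the measurement likelihood delivered by the vision modules and $p(x_t \mid x_{t-1})$ is the dynamical model.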
Model-based visual tracking finds application in many fields, including robotics, man-machine interfaces, video surveillance, computer-assisted surgery, and navigation systems.
These applications in turn require dedicated algorithms for tracking faces and gaze direction, people, hand postures and gestures, vehicles, cells, robots, and so on.
Instead of using pre-defined artificial markers, which most applications do not permit, a model-based approach focuses on selecting and using natural features available from prior models.
This prior information may include shape, appearance, motion and deformation parameters, temporal dynamics, and kinematic structure, as well as any useful information about the sensors and context that can be specified in advance or refined during the task itself.
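Purely as an illustration of how such priors might be grouped together (the type and field names below are assumptions for exposition, not OpenTL types):

```cpp
#include <opencv2/core.hpp>
#include <vector>

// Hypothetical container for the kinds of prior knowledge listed above.
struct ObjectPrior {
    std::vector<cv::Point3f> shape;      // contour / mesh vertices
    cv::Mat                  appearance; // reference texture or color histogram
    cv::Mat                  dynamics;   // state-transition matrix (temporal model)
    std::vector<int>         kinematics; // kinematic tree (parent joint indices)
};
```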
Overall, a robust, automatic tracking system has to meet several conflicting requirements: detecting objects, tracking them, handling full or partial occlusions, maintaining target identities, detecting lost tracks and re-initializing, remaining robust (to some extent) to noise, clutter, and changing environmental conditions, and running in real time.
Many state-of-the-art visual tracking approaches have been developed, covering a wide variety of application scenarios.
Each follows its own requirements and exploits different aspects of the underlying problem and prior information, while usually remaining specific to a single scenario.
However, despite the vast literature available, to our knowledge no truly general-purpose library has yet been developed for tracking markerless objects.
In fact, most of the available libraries provide only model-free, image- and feature-level processing; the few attempts at integrating Bayesian tracking schemes are usually limited to Kalman filtering with simple state-space and dynamical models, without general data-association or fusion mechanisms.
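As an example of the kind of simple scheme typically offered, the sketch below sets up a constant-velocity Kalman filter with OpenCV's cv::KalmanFilter, observing a 2-D position; the noise covariances are illustrative values, not tuned for any particular application:

```cpp
#include <opencv2/core.hpp>
#include <opencv2/video/tracking.hpp>

// Constant-velocity model: state [x, y, vx, vy], measurement [x, y].
cv::KalmanFilter makeConstantVelocityFilter()
{
    cv::KalmanFilter kf(4, 2, 0);
    kf.transitionMatrix = (cv::Mat_<float>(4, 4) <<
        1, 0, 1, 0,
        0, 1, 0, 1,
        0, 0, 1, 0,
        0, 0, 0, 1);
    cv::setIdentity(kf.measurementMatrix);                      // observe (x, y)
    cv::setIdentity(kf.processNoiseCov, cv::Scalar::all(1e-4)); // illustrative
    cv::setIdentity(kf.measurementNoiseCov, cv::Scalar::all(1e-1));
    cv::setIdentity(kf.errorCovPost, cv::Scalar::all(1.0));
    return kf;
}

// Per frame: kf.predict(); then
// kf.correct((cv::Mat_<float>(2, 1) << measuredX, measuredY));
```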
This gap can certainly be attributed to the complexity of the problem: a challenging effort is required to seamlessly integrate multiple heterogeneous visual modalities (edges, color, texture, motion, etc.), to handle information from multiple sensors and different object models, and, last but not least, to achieve real-time performance.
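One common way to fuse such heterogeneous modalities at the measurement level, sketched here under a conditional-independence assumption (the State type and the per-modality scores are hypothetical placeholders, not OpenTL API), is to multiply per-modality likelihoods into a joint weight:

```cpp
#include <functional>
#include <vector>

struct State { float x, y, theta; };                     // hypothetical 2-D pose

using Likelihood = std::function<double(const State&)>;  // edge, color, texture, ...

// Joint likelihood under conditional independence:
// p(z_edge, z_color, ... | s) = prod_i p(z_i | s)
double fusedLikelihood(const State& s, const std::vector<Likelihood>& modalities)
{
    double joint = 1.0;
    for (const auto& m : modalities)
        joint *= m(s);
    return joint;
}
```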
With this goal in mind, the Chair for Robotics and Embedded Systems of TUM-Informatik has developed OpenTL (Open Tracking Library), a general-purpose library for markerless tracking that provides a user-friendly, high-level application programming interface (API) for a wide variety of methods and applications.
OpenTL can handle multiple simultaneous targets, visual modalities, and sensors; it makes use of different Bayesian tracking schemes, represents object models of variable complexity, handles data association and fusion at different levels (pixel, feature, and state space), and provides multi-threading and GPU acceleration for real-time efficiency.
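As one example of a Bayesian scheme beyond Kalman filtering, the following is a minimal sketch (not OpenTL code; the state and likelihood are placeholders) of a single SIR particle-filter step with constant-velocity prediction:

```cpp
#include <functional>
#include <random>
#include <vector>

struct Particle { float x, y, vx, vy; double w; };

void sirStep(std::vector<Particle>& ps,
             const std::function<double(const Particle&)>& likelihood,
             std::mt19937& rng)
{
    std::normal_distribution<float> noise(0.f, 2.f);
    for (auto& p : ps) {                      // predict: constant velocity + diffusion
        p.x += p.vx + noise(rng);
        p.y += p.vy + noise(rng);
        p.w *= likelihood(p);                 // weight by the image likelihood
    }

    std::vector<double> w(ps.size());         // resample to fight weight degeneracy
    for (size_t i = 0; i < ps.size(); ++i) w[i] = ps[i].w;
    std::discrete_distribution<size_t> pick(w.begin(), w.end());

    std::vector<Particle> next(ps.size());
    for (auto& n : next) { n = ps[pick(rng)]; n.w = 1.0 / ps.size(); }
    ps.swap(next);
}
```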
OpenTL is implemented in platform-independent C++ within a hierarchical, object-oriented architecture.
This choice allows a high degree of modularization and abstraction across the different layers, which can be considered an essential property of the framework.
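To make the layering concrete, a hypothetical sketch (the class names are invented for exposition and are not OpenTL's actual interfaces) might decouple the measurement layer from the estimation layer through abstract base classes:

```cpp
#include <vector>

struct State       { std::vector<double> x; };  // placeholder state vector
struct Measurement { double likelihood; };      // placeholder measurement

class VisualModality {                          // abstract measurement layer
public:
    virtual ~VisualModality() = default;
    virtual Measurement measure(const State& predicted) = 0;
};

class BayesianFilter {                          // abstract estimation layer
public:
    virtual ~BayesianFilter() = default;
    virtual State step(const std::vector<Measurement>& z) = 0;
};
```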
For efficient low-level processing, as well as for its internal mathematical representations, OpenTL builds on the OpenCV (Open Source Computer Vision) library, and it employs GPU programming (the OpenGL Shading Language, GLSL) for hardware-accelerated processing of generic models.
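As an example of the kind of low-level OpenCV processing such a design builds on, a single frame's edge map can be computed as follows (the hysteresis thresholds are illustrative):

```cpp
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>

// Grayscale conversion followed by Canny edge extraction on one frame.
cv::Mat extractEdges(const cv::Mat& frameBGR)
{
    cv::Mat gray, edges;
    cv::cvtColor(frameBGR, gray, cv::COLOR_BGR2GRAY);
    cv::Canny(gray, edges, 50.0, 150.0);
    return edges;
}
```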