GSLAM | A general SLAM architecture and benchmark

Just stumbled upon a paper from 2019:

GSLAM: A General SLAM Framework and Benchmark

Open source code: https://github.com/zdzhaoyong/GSLAM

Let's go straight to the full text and get a feel for the quality of this work~

1 Abstract

SLAM technology has achieved many successes recently and has attracted the attention of high-tech companies. However, how to effectively unify the interfaces of existing or emerging algorithms and benchmark them in terms of speed, robustness and portability remains an open problem. In this paper, a new SLAM platform called GSLAM is proposed, which not only provides evaluation functions but also gives researchers useful tools to quickly develop their own SLAM systems. The core contribution of GSLAM is a general, cross-platform, fully open-source SLAM interface designed to handle the interactions among input datasets, SLAM implementations, visualization and applications in a unified framework. Through this platform, users can implement their own functions as plug-ins to improve SLAM performance and push SLAM further into practical applications.

2 Introduction

Simultaneous localization and mapping (SLAM) has been a hot research topic in computer vision and robotics since the 1980s. SLAM provides essential functionality for many applications that require real-time navigation, such as robotics, unmanned aerial vehicles (UAVs), autonomous driving, and virtual and augmented reality. In recent years, SLAM technology has developed rapidly and many SLAM systems have been proposed, including monocular systems (feature-based, direct and semi-direct methods), multi-sensor systems (RGB-D, stereo and visual-inertial methods) and learning-based systems (supervised and unsupervised methods).

However, amid this rapid development, almost all researchers focus on the theory and implementation of their own SLAM systems, which makes it hard to exchange ideas and to port implementations to other systems. This hinders the rapid adoption of SLAM technology across industries. Furthermore, many different SLAM implementations now exist, and how to benchmark them effectively in terms of speed, robustness and portability remains an open issue. Recently, Nardi et al. and Bodin et al. proposed unified SLAM benchmark systems for quantitative, comparable and verifiable experimental research, which also explore the trade-offs between different SLAM systems. These systems make it easy to run evaluation experiments with their dataset and metric evaluation modules.

Since existing systems only provide evaluation benchmarks, this paper argues that it is possible to build a platform that serves the entire life cycle of SLAM algorithms, including the development, evaluation and application stages. In addition, deep learning-based SLAM has made significant progress in recent years, so a platform is needed that supports not only C++ but also Python, to better support the integration of geometric and deep learning-based SLAM systems. Therefore, this paper introduces a novel SLAM platform that not only provides evaluation capabilities but also gives researchers useful tools to quickly develop their own SLAM systems. Commonly used functions are provided as plug-ins, so users can use them directly or create their own for better performance. The hope is that this platform will further promote the practical application of SLAM systems. In summary, the main contributions of this paper are as follows:

This paper proposes a general, cross-platform, fully open-source SLAM platform designed for both research and commercial use, going beyond previous benchmarking systems. The SLAM interface consists of several lightweight, dependency-free header files, which makes it easy to connect different datasets, SLAM algorithms and applications as plug-ins within a unified framework. In addition, JavaScript and Python interfaces are provided to support web-based and deep learning-based SLAM applications.
In the proposed GSLAM platform, three optimized modules are introduced as utility classes: Estimator, Optimizer and Vocabulary. Estimator provides a set of closed-form solvers covering all cases, made robust with random sample consensus (RANSAC); Optimizer provides a unified interface to most nonlinear SLAM problems; Vocabulary provides an efficient and portable bag-of-words implementation, with multi-threading and SIMD optimization, for place recognition.
Benefiting from the above interfaces, this work implements and evaluates plug-ins for existing datasets, SLAM implementations and visualization applications in a unified framework, and emerging benchmarks or applications can easily be integrated in the future.

The following first introduces the interface of the GSLAM framework and explains the working principle of GSLAM. Secondly, three practical components are introduced, namely Estimator, Optimizer and Vocabulary. Then, several typical public datasets are used to evaluate different popular SLAM implementations using the GSLAM framework. Finally, we summarize these works and look forward to future research directions.

3 Related Work

Simultaneous Localization And Mapping

SLAM technology builds a map of an unknown environment and localizes the sensors within that map, with a strong focus on real-time operation. Early SLAM was mainly based on the extended Kalman filter (EKF), in which the 6-degree-of-freedom motion parameters and the 3D landmarks are represented probabilistically as a single state vector. The complexity of the classic EKF grows quadratically with the number of landmarks, which limits its scalability. In recent years, SLAM technology has developed rapidly, and many monocular visual SLAM systems have been proposed, including feature-based, direct and semi-direct methods. However, monocular SLAM systems lack scale information and cannot handle pure rotation, so multi-sensor SLAM systems, including RGB-D, stereo and visual-inertial methods, emerged to improve robustness and accuracy.

Although a large number of SLAM systems have been proposed, there has been little work on unifying the interfaces of these algorithms and no comprehensive comparison of their performance. Furthermore, implementations of these SLAM algorithms are often released as standalone executables rather than libraries, and often do not conform to any standard structure.

Recently, deep learning-based supervised and unsupervised visual odometry (VO) has brought novel ideas compared with traditional geometry-based methods. However, further optimizing the consistency across multiple keyframes is still not easy. GSLAM provides tools that help achieve better global consistency, and through this framework it is easier to visualize or evaluate the results and apply them in various industries.

Computer Vision and Robotics Platform

In robotics and computer vision, the Robot Operating System (ROS) provides a very convenient communication mechanism between nodes and is favored by most robotics researchers. Many SLAM implementations provide ROS wrappers that subscribe to sensor data and publish visualization results. However, ROS does not unify the input and output of SLAM implementations, which makes it difficult to further evaluate different SLAM systems.

Inspired by the ROS message architecture, GSLAM implements a similar intra-process communication utility class called Messenger. It provides an alternative to ROS inside a SLAM implementation while maintaining compatibility: all ROS-defined messages are supported within the framework, so ROS wrappers can be implemented naturally. Thanks to the intra-process design, messages are delivered without serialization or data transfer, so they can be sent without delay or extra cost. At the same time, the message payload is not limited to ROS-defined messages and can be any copyable data structure. Furthermore, GSLAM not only provides evaluation capabilities but also gives researchers useful tools to quickly develop and integrate their own SLAM algorithms.

SLAM Benchmarks

Currently there are several SLAM benchmarks, including the KITTI benchmark, the TUM RGB-D benchmark and the ICL-NUIM RGB-D benchmark dataset, but these only provide evaluation functions. SLAMBench2 extends such benchmarks to algorithms and datasets, but it requires users to make their published implementations compatible with SLAMBench2 before they can be evaluated, which makes it hard to extend to more application areas. Unlike these systems, the GSLAM platform proposed in this paper serves the entire life cycle of a SLAM implementation, from development to evaluation to application. It provides researchers with useful tools to quickly develop their own SLAM systems and to build visualizations, evaluations and applications on top of a unified interface.

4 General SLAM architecture

Framework Overview

The framework of GSLAM is shown in the figure. Overall, the interface is designed to handle the interaction of three parts.

Input to the SLAM implementation. Running SLAM requires sensor data and some parameters. GSLAM uses the Svar class for parameter configuration and command processing. All sensor data required by a SLAM implementation is provided by a Dataset implementation and transmitted with Messenger. GSLAM implements several popular visual SLAM datasets, and users are free to implement their own dataset plug-ins.
SLAM implementation. GSLAM treats each implementation as a plug-in library. Developers can easily design a SLAM implementation based on the GSLAM interface and utility classes, or use the interfaces to wrap an existing implementation without introducing additional dependencies. Users can focus on developing the core algorithms without worrying about the input and output that are handled outside the SLAM implementation.
Visualization or applications that use the SLAM results. After a SLAM implementation processes an input frame, the user may want to display or exploit the results. For generality, SLAM results should be published in a standard format. By default, GSLAM uses Qt for visualization, but users are free to implement custom visualization tools and to add application plug-ins such as evaluation applications.

The framework is designed to be compatible with many types of SLAM implementations, including but not limited to monocular, stereo, RGB-D and multi-camera visual-inertial odometry with multi-sensor fusion. Modern deep learning platforms and their developers prefer Python, so GSLAM provides Python bindings: developers can implement SLAM in Python and call it through GSLAM, or call C++-based SLAM implementations from Python. Additionally, JavaScript is supported for web-based use.

Basic Interface Classes

Data structures commonly used by SLAM interfaces include parameter setting/reading, the image format, pose transformations, camera models and the map data structure. The following briefly introduces these basic interface classes.

Parameter Setting

GSLAM uses Svar, a small class for parameter parsing and setting that consists of a single header file, depends only on C++11, and has the following features:

a. Parameter parsing, configuration loading and help information. Similar to popular parameter parsing tools such as Google gflags, variable configurations can be loaded from command line arguments, files, and the system environment. Users can also define different types of parameters and provide introductory information, which will be displayed in the help document.

b. A small script language that supports variables, functions and conditional statements to make configuration files more powerful.

c. Thread-safe variable binding and sharing. Binding frequently used variables to pointers or references is recommended, as it is both efficient and convenient.

d. Simple function definition and calling from C++ or pure script. Binding commands to functions helps developers decouple file dependencies.

e. Supports tree structure representation, which means configurations can be easily loaded or saved using XML, JSON and YAML formats.
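
To make item a more concrete, here is a small Python sketch of gflags-style typed parameters with help text, overridden first by environment variables and then by command-line arguments. The `ParamRegistry` class and its methods are hypothetical and do not reproduce Svar's actual API; file loading, scripting and tree formats are left out.

```python
import os


class ParamRegistry:
    """Hypothetical Svar-like parameter registry (illustration only, not the Svar API)."""

    def __init__(self):
        self._defs = {}    # name -> (type, default value, help text)
        self._values = {}  # name -> current value

    def define(self, name, default, help_text=""):
        """Declare a typed parameter; its type is inferred from the default value."""
        self._defs[name] = (type(default), default, help_text)
        self._values[name] = default

    def _convert(self, name, text):
        typ = self._defs[name][0]
        if typ is bool:
            return text.lower() in ("1", "true", "yes", "on")
        return typ(text)

    def load_env(self, environ=None):
        """Override declared parameters from environment variables of the same name."""
        environ = os.environ if environ is None else environ
        for name in self._defs:
            if name in environ:
                self._values[name] = self._convert(name, environ[name])

    def parse_args(self, argv):
        """Apply '-name=value' or '--name=value' arguments; return the unrecognized ones."""
        rest = []
        for arg in argv:
            if arg.startswith("-") and "=" in arg:
                name, text = arg.lstrip("-").split("=", 1)
                if name in self._defs:
                    self._values[name] = self._convert(name, text)
                    continue
            rest.append(arg)
        return rest

    def get(self, name):
        return self._values[name]

    def help(self):
        """One line per parameter: name, type, default and description."""
        return "\n".join(
            f"-{name} ({typ.__name__}, default={default!r}): {text}"
            for name, (typ, default, text) in self._defs.items()
        )


# Usage: command-line arguments are applied after the environment, so they win.
params = ParamRegistry()
params.define("fps", 30.0, "Playback frame rate")
params.define("dataset", "", "Path of the dataset to open")
params.define("use_gui", True, "Show the visualizer")
params.load_env({"use_gui": "false"})
extra = params.parse_args(["-fps=15", "--dataset=seq01", "input.txt"])
```

Inferring the type from the default keeps declarations short, at the cost of needing a default for every parameter.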

Intra-Process Messaging

Because ROS provides a very convenient communication mechanism between nodes, it is favored by most robotics researchers. Inspired by the ROS2 message architecture, GSLAM implements a similar intra-process communication utility class called Messenger. It provides an alternative to ROS inside a SLAM implementation while remaining compatible with it. Thanks to its intra-process design, Messenger can publish and subscribe to any class at no additional cost. Its further features are as follows:

a. The interface follows the ROS style, so it is easy to use. It also supports all ROS-defined messages, so replacing an existing ROS messaging system requires very little work.

b. Since there is no serialization and data transfer, messages can be sent without delay and additional cost. At the same time, the payload of a message is not limited to ROS-defined messages, but also supports any copyable data structure.

c. The source code consists only of C++11-based header files with no additional dependencies, which makes it portable.

d. The API is thread-safe and supports multi-threaded condition notification when the queue size is greater than zero. Before a publisher and a subscriber are connected, the topic name and the RTTI of the data type are checked to make sure they match.
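
The Python sketch below illustrates the idea behind items b and d: a publish/subscribe hub where delivery is a direct callback with the very same object, and each topic is bound to one message type. `Messenger`, `subscribe` and `publish` here are hypothetical stand-ins, not GSLAM's C++ Messenger interface, and queues and condition notification are omitted.

```python
import threading
from dataclasses import dataclass


class Messenger:
    """Hypothetical in-process publish/subscribe hub (illustration only).

    Publishing calls every subscriber callback with the same object, so
    nothing is serialized or copied. Each topic is bound to one message type,
    checked on every subscribe and publish, similar in spirit to the
    topic-name plus RTTI check. Queues are omitted.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._topics = {}  # topic name -> (message type, list of callbacks)

    def _callbacks(self, topic, msg_type):
        with self._lock:
            bound_type, callbacks = self._topics.setdefault(topic, (msg_type, []))
            if bound_type is not msg_type:
                raise TypeError(f"topic {topic!r} carries {bound_type.__name__}, "
                                f"not {msg_type.__name__}")
            return callbacks

    def subscribe(self, topic, msg_type, callback):
        callbacks = self._callbacks(topic, msg_type)
        with self._lock:
            callbacks.append(callback)

    def publish(self, topic, msg):
        """Deliver msg synchronously; return the number of subscribers reached."""
        callbacks = self._callbacks(topic, type(msg))
        with self._lock:
            targets = list(callbacks)  # snapshot, so callbacks may subscribe safely
        for callback in targets:
            callback(msg)
        return len(targets)


@dataclass
class Pose:
    x: float
    y: float
    z: float


# Usage: the subscriber receives the published object itself, not a copy.
bus = Messenger()
received = []
bus.subscribe("slam/pose", Pose, received.append)
sent = Pose(1.0, 2.0, 3.0)
count = bus.publish("slam/pose", sent)
```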

3D Transformation

For the rotation part, there are several representations to choose from, including matrices, Euler angles, unit quaternions and the Lie algebra so(3). Any of them can represent a given transformation, and they can be converted into one another. However, when composing multiple transformations or performing manifold optimization, the choice of representation needs close attention. The matrix representation is over-parameterized, using 9 parameters while a rotation has only 3 degrees of freedom (DOF). Euler angles use three variables and are easy to understand, but they suffer from gimbal lock and are inconvenient for composing multiple transformations. Unit quaternions are the most efficient way to compose multiple rotations, while the Lie algebra is the common representation for manifold optimization.
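
As a small illustration of why quaternions suit repeated rotations, the Python sketch below composes two 45-degree rotations about the z axis with the Hamilton product and applies the result to a point. Composing two unit quaternions takes 16 multiplications, compared with 27 for a 3×3 matrix product. The function names are our own, and the (w, x, y, z) component order is an assumption; libraries differ on this convention.

```python
import math


def quat_from_axis_angle(axis, angle):
    """Unit quaternion (w, x, y, z) for a rotation of `angle` radians about `axis`."""
    n = math.sqrt(sum(a * a for a in axis))
    s = math.sin(angle / 2.0) / n
    return (math.cos(angle / 2.0), axis[0] * s, axis[1] * s, axis[2] * s)


def quat_mul(a, b):
    """Hamilton product a * b: the rotation b is applied first, then a."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)


def quat_rotate(q, p):
    """Rotate the 3D point p by the unit quaternion q via q * (0, p) * conj(q)."""
    w, x, y, z = q
    conj = (w, -x, -y, -z)
    _, rx, ry, rz = quat_mul(quat_mul(q, (0.0, *p)), conj)
    return (rx, ry, rz)


# Usage: two 45-degree turns about z compose into one 90-degree turn.
q45 = quat_from_axis_angle((0.0, 0.0, 1.0), math.pi / 4)
q90 = quat_mul(q45, q45)
p = quat_rotate(q90, (1.0, 0.0, 0.0))  # the x axis maps onto the y axis
```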

Similarly, the Lie algebras se(3) and sim(3) of rigid-body and similarity transformations are defined. GSLAM uses quaternions to represent the rotation part and provides functions to convert from one representation to another. Table 1 compares GSLAM's transformation implementation with three other manifold implementations (Sophus, TooN and Ceres). Since Ceres uses the angle-axis representation, it does not need the exponential and logarithm of the rotation. As the table shows, GSLAM's implementation performs better because it uses quaternions and is better optimized, while TooN, which uses a matrix implementation, performs better on point transformations.
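
For reference, the standard definitions behind this paragraph are written out below in common notation; the symbols (R, t, s, φ, θ) are our choice and may differ from the paper's. The tangent spaces have 3 (so(3)), 6 (se(3): rotation plus translation) and 7 (sim(3): adds log-scale) dimensions.

```latex
% Rigid-body (SE(3)) and similarity (Sim(3)) transformations, with R in SO(3), s > 0
T = \begin{bmatrix} R & \mathbf{t} \\ \mathbf{0}^\top & 1 \end{bmatrix} \in SE(3),
\qquad
S = \begin{bmatrix} sR & \mathbf{t} \\ \mathbf{0}^\top & 1 \end{bmatrix} \in Sim(3)

% Hat operator: maps \boldsymbol{\phi} \in \mathbb{R}^3 to a skew-symmetric matrix in so(3)
\boldsymbol{\phi}^\wedge =
\begin{bmatrix} 0 & -\phi_3 & \phi_2 \\ \phi_3 & 0 & -\phi_1 \\ -\phi_2 & \phi_1 & 0 \end{bmatrix}

% Exponential map so(3) -> SO(3) (Rodrigues' formula), with \theta = \|\boldsymbol{\phi}\| \neq 0
\exp(\boldsymbol{\phi}^\wedge) = I + \frac{\sin\theta}{\theta}\,\boldsymbol{\phi}^\wedge
  + \frac{1 - \cos\theta}{\theta^2}\,(\boldsymbol{\phi}^\wedge)^2

% The same rotation as a unit quaternion (w, x, y, z)
\mathbf{q} = \Big(\cos\tfrac{\theta}{2},\ \sin\tfrac{\theta}{2}\,\frac{\boldsymbol{\phi}}{\theta}\Big)
```

The quaternion form needs only one sine and one cosine, which is one reason a quaternion-based exponential map can be cheap to evaluate.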

Image format

The storage and transmission of image data is one of the most important functions in visual SLAM. For efficiency and convenience, GSLAM uses GImage, a data structure compatible with cv::Mat. It uses a reference-counted smart pointer, so memory is released safely and images can be passed around without copying. Data pointers are aligned to ease Single Instruction Multiple Data (SIMD) acceleration. Users can convert between GImage and cv::Mat seamlessly and safely without memory copying.
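
The zero-copy sharing and row alignment described above can be sketched in Python as follows. `SharedImage` is a hypothetical stand-in for GImage, and the 16-byte alignment is an example value; Python cannot control the buffer's base address, so only the row-stride padding is modelled.

```python
class SharedImage:
    """Hypothetical GImage-like container (illustration only, not GSLAM's GImage).

    Shallow copies share one pixel buffer, and each row is padded so that its
    stride is a multiple of `align` bytes.
    """

    def __init__(self, rows, cols, channels=1, align=16, buffer=None):
        self.rows, self.cols, self.channels, self.align = rows, cols, channels, align
        row_bytes = cols * channels
        self.stride = (row_bytes + align - 1) // align * align  # round up to align
        self.data = buffer if buffer is not None else bytearray(self.stride * rows)

    def shallow_copy(self):
        """New header over the same buffer: no pixel data is copied."""
        return SharedImage(self.rows, self.cols, self.channels, self.align, self.data)

    def clone(self):
        """Deep copy with its own buffer."""
        return SharedImage(self.rows, self.cols, self.channels, self.align,
                           bytearray(self.data))

    def _offset(self, r, c, ch):
        return r * self.stride + c * self.channels + ch

    def get_pixel(self, r, c, ch=0):
        return self.data[self._offset(r, c, ch)]

    def set_pixel(self, r, c, value, ch=0):
        self.data[self._offset(r, c, ch)] = value


# Usage: a 10-pixel RGB row needs 30 bytes, padded to a 32-byte stride.
img = SharedImage(rows=4, cols=10, channels=3)
view = img.shallow_copy()
view.set_pixel(1, 2, 255)  # also visible through img
copy = img.clone()
copy.set_pixel(1, 2, 7)    # does not affect img
```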

Camera Models

Since images used in SLAM may contain radial and tangential distortion caused by manufacturing imperfections, or may be captured by fisheye or panoramic cameras, different camera models have been proposed to describe the projection. GSLAM provides implementations of the OpenCV model (used by ORB-SLAM), the ATAN model (used by PTAM) and OCamCalib (used by MultiCol-SLAM). Users can also easily inherit these classes to implement other camera models, such as Kannala-Brandt and isometric panoramic models.
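
For reference, a Python sketch of two of the projection models named above: the OpenCV pinhole model with radial (k1, k2) and tangential (p1, p2) terms, following OpenCV's calibration documentation, and the single-parameter ATAN (FOV) model used by PTAM, where the distorted radius is r_d = arctan(2 r_u tan(w/2)) / w. The function names and signatures are illustrative assumptions, not GSLAM's camera classes.

```python
import math


def project_opencv(point, fx, fy, cx, cy, k1=0.0, k2=0.0, p1=0.0, p2=0.0):
    """Pinhole projection with OpenCV-style radial and tangential distortion."""
    X, Y, Z = point
    x, y = X / Z, Y / Z                   # normalized image coordinates
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2 * r2
    xd = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    yd = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return fx * xd + cx, fy * yd + cy


def project_atan(point, fx, fy, cx, cy, w):
    """Pinhole projection with the single-parameter ATAN (FOV) distortion."""
    X, Y, Z = point
    x, y = X / Z, Y / Z
    ru = math.hypot(x, y)
    if ru < 1e-12 or w == 0.0:
        factor = 1.0                      # no distortion at the centre or for w == 0
    else:
        rd = math.atan(2.0 * ru * math.tan(w / 2.0)) / w
        factor = rd / ru
    return fx * factor * x + cx, fy * factor * y + cy


# Usage: a point on the optical axis lands on the principal point in any model.
u0, v0 = project_opencv((0.0, 0.0, 2.0), 500.0, 500.0, 320.0, 240.0, k1=-0.2)
u1, v1 = project_opencv((1.0, 2.0, 4.0), 500.0, 500.0, 320.0, 240.0)
u2, v2 = project_atan((1.0, 2.0, 4.0), 500.0, 500.0, 320.0, 240.0, w=0.9)
```

With w = 0.9 the ATAN model pulls the point toward the centre (barrel distortion), so (u2, v2) lies closer to (320, 240) than the undistorted (u1, v1).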

Map Data Structure

For a SLAM implementation, the goal is to localize and build a map in real time. GSLAM recommends a unified map data structure consisting of map frames and map points. This structure suits most existing visual SLAM systems, whether feature-based or direct.

Map frames represent the state at different times, including information captured by sensors and the estimation results, such as raw IMU or GPS data, depth information and camera models. The SLAM implementation estimates the relationships between frames, and the connections between them form a pose graph.

Map points are used to represent the environment observed by frames, typically used by feature-based methods. However, a map point can represent not only a keypoint, but also a GCP (Ground Control Point), edge line, or 3D object. Their correspondence to map frames forms an observation graph, often called a bundle graph.
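
A minimal Python sketch of this map structure, assuming simplified frames (a timestamp and a position only) and points that record which keypoint observes them in each frame. `MapFrame`, `MapPoint` and `Map` are hypothetical names; a real map frame would also carry IMU, GPS, depth and camera data.

```python
from dataclasses import dataclass, field


@dataclass
class MapPoint:
    id: int
    position: tuple                                   # 3D position in world coordinates
    observations: dict = field(default_factory=dict)  # frame id -> keypoint index


@dataclass
class MapFrame:
    id: int
    timestamp: float
    position: tuple = (0.0, 0.0, 0.0)                 # simplified pose: position only
    neighbors: set = field(default_factory=set)       # pose-graph edges (frame ids)


class Map:
    """Hypothetical unified map: frames and points, with pose and observation graphs."""

    def __init__(self):
        self.frames = {}
        self.points = {}

    def add_frame(self, frame):
        self.frames[frame.id] = frame

    def add_point(self, point):
        self.points[point.id] = point

    def add_observation(self, frame_id, point_id, keypoint_index):
        # One edge of the observation (bundle) graph.
        self.points[point_id].observations[frame_id] = keypoint_index

    def connect(self, frame_a, frame_b):
        # One undirected edge of the pose graph.
        self.frames[frame_a].neighbors.add(frame_b)
        self.frames[frame_b].neighbors.add(frame_a)

    def covisible_frames(self, frame_id):
        """Frames that share at least one observed map point with frame_id."""
        result = set()
        for point in self.points.values():
            if frame_id in point.observations:
                result.update(point.observations)
        result.discard(frame_id)
        return result


# Usage: frames 0 and 1 both see point 0; frame 2 only sees point 1.
m = Map()
for i in range(3):
    m.add_frame(MapFrame(i, timestamp=0.1 * i))
m.add_point(MapPoint(0, (1.0, 0.0, 5.0)))
m.add_point(MapPoint(1, (0.0, 1.0, 6.0)))
m.add_observation(0, 0, 12)
m.add_observation(1, 0, 40)
m.add_observation(2, 1, 3)
m.connect(0, 1)
```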

5 SLAM implementation tool

To make it easier to implement a SLAM system, GSLAM provides several utility classes. This section briefly introduces three optimized modules: Estimator, Optimizer and Vocabulary.

Estimator

Pure geometric computation remains a fundamental problem that requires robust, accurate and real-time solutions. Traditional visual SLAM algorithms and modern visual-inertial solutions rely on geometric vision algorithms for initialization, relocalization and loop closure. OpenCV provides several geometric algorithms, and Kneip provides OpenGV, a geometric vision toolbox that is limited to camera pose computation. GSLAM's Estimator aims to provide a family of closed-form solvers covering all cases, made robust with Random Sample Consensus (RANSAC).

Table 2 lists the algorithms supported by Estimator. They are divided into three categories according to the given observations. 2D-2D matches are used to estimate epipolar or homography constraints, from which the relative pose can be decomposed. 2D-3D correspondences are used to estimate the central or non-central absolute pose of a monocular or multi-camera system, which is the well-known PnP problem. 3D geometric functions, such as plane fitting and estimating the Sim(3) transformation between two point clouds, are also supported. Most algorithms rely on Eigen, an open-source, header-only linear algebra library available on most platforms.
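
To show how RANSAC wraps a closed-form minimal solver, here is a Python sketch of the plane-fitting case: three points define a candidate plane through a cross product, and the candidate with the most inliers wins. This is not the Estimator API; the fixed iteration count and the 0.01 threshold are arbitrary assumptions, and a real implementation would adapt the iteration count and refine the plane on its inliers.

```python
import random


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def fit_plane_ransac(points, threshold=0.01, iterations=200, seed=0):
    """RANSAC plane fit: return ((nx, ny, nz, d), inlier indices) with n . p + d = 0."""
    rng = random.Random(seed)
    best_plane, best_inliers = None, []
    for _ in range(iterations):
        p0, p1, p2 = rng.sample(points, 3)              # minimal sample
        n = _cross(_sub(p1, p0), _sub(p2, p0))          # closed-form plane normal
        norm = _dot(n, n) ** 0.5
        if norm < 1e-12:                                # collinear sample, skip it
            continue
        n = (n[0] / norm, n[1] / norm, n[2] / norm)
        d = -_dot(n, p0)
        inliers = [i for i, p in enumerate(points) if abs(_dot(n, p) + d) < threshold]
        if len(inliers) > len(best_inliers):
            best_plane, best_inliers = (n[0], n[1], n[2], d), inliers
    return best_plane, best_inliers


# Usage: 100 points on the plane z = 1 plus 20 points well above it.
plane_pts = [(0.1 * i, 0.1 * j, 1.0) for i in range(10) for j in range(10)]
outliers = [(0.05 + 0.04 * k, 0.9 - 0.03 * k, 2.0 + 0.3 * (k % 5)) for k in range(20)]
plane, inliers = fit_plane_ransac(plane_pts + outliers)
```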

Optimizer

Nonlinear optimization is the core of modern geometric SLAM systems. Because the Hessian matrix is high-dimensional and sparse, graph structures are used to model the complex estimation problems of SLAM. Several frameworks, including Ceres, G2O and GTSAM, have been proposed to solve general graph optimization problems, and they are widely used in different SLAM systems. ORB-SLAM and SVO use G2O for BA and pose graph optimization. OKVIS and VINS use Ceres for graph optimization with IMU factors, with sliding windows to control computational complexity. Forster et al. proposed a visual initialization method based on SVO and implemented the back end with GTSAM.

GSLAM's Optimizer aims to provide a unified interface for most nonlinear SLAM problems, such as PnP solving, BA and pose graph optimization. A general plug-in for these problems is implemented on top of the Ceres library. For specific problems such as BA, more efficient implementations such as PBA and ICE-BA are also available as plug-ins. With the Optimizer tool, developers can access different implementations through one unified interface, which is especially useful for deep learning-based SLAM systems.
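
The idea of one interface with interchangeable back ends can be sketched in Python. Here the only registered back end is a hand-written Gauss-Newton solver with a numerical Jacobian, standing in for the Ceres-based plug-in; the `optimize` function and the `OPTIMIZERS` registry are hypothetical, and the example problem (locating a point from ranges to three anchors) is chosen only for illustration.

```python
import math


def solve_linear(A, b):
    """Solve a small dense system A x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def gauss_newton(residuals, x0, iterations=20, eps=1e-7, tol=1e-12):
    """Minimize sum(r_i(x)^2) by Gauss-Newton with a forward-difference Jacobian."""
    x = list(x0)
    for _ in range(iterations):
        r = residuals(x)
        cols = []  # cols[j][i] = d r_i / d x_j
        for j in range(len(x)):
            xp = list(x)
            xp[j] += eps
            rp = residuals(xp)
            cols.append([(rp[i] - r[i]) / eps for i in range(len(r))])
        # Normal equations: (J^T J) dx = -J^T r
        H = [[sum(a * b for a, b in zip(ca, cb)) for cb in cols] for ca in cols]
        g = [-sum(a * ri for a, ri in zip(ca, r)) for ca in cols]
        dx = solve_linear(H, g)
        x = [xi + di for xi, di in zip(x, dx)]
        if sum(step * step for step in dx) < tol:
            break
    return x


# Hypothetical plug-in registry: callers choose a back end by name only.
OPTIMIZERS = {"gauss_newton": gauss_newton}


def optimize(residuals, x0, backend="gauss_newton", **options):
    return OPTIMIZERS[backend](residuals, x0, **options)


# Usage: locate a 2D point from exact ranges to three known anchors.
anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
truth = (3.0, 4.0)
ranges = [math.hypot(truth[0] - a[0], truth[1] - a[1]) for a in anchors]
estimate = optimize(
    lambda p: [math.hypot(p[0] - a[0], p[1] - a[1]) - rng for a, rng in zip(anchors, ranges)],
    [1.0, 1.0],
)
```

Swapping in another solver would only mean registering another callable under a new name; the calling code stays the same.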

Vocabulary

Place recognition is one of the most important parts of a SLAM system and is used for relocalization and loop closure detection. The Bag of Words (BoW) method is widely used in SLAM systems because of its efficiency and strong performance. FabMap proposes a probabilistic approach to appearance-based place recognition, which is used in systems such as RSLAM and LSD-SLAM. Because FabMap uses floating-point descriptors such as SIFT and SURF, DBoW2 was proposed to build a vocabulary tree for training and detection that supports both binary and floating-point descriptors. Rafael proposed two improved versions of DBoW2, DBoW3 and FBoW, which simplify the interface and speed up training and loading. ORB-SLAM adopted the ORB descriptor and used DBoW2 for loop closure detection, relocalization and fast matching. Subsequently, a series of SLAM systems, such as ORB-SLAM2, VINS-Mono and LDSO, used DBoW3 for loop closure detection, and it has become the most popular tool for place recognition in SLAM systems.

Inspired by the above work, GSLAM implements a header-only version of the DBoW3 vocabulary with the following features:

The dependency on OpenCV is removed, and everything is implemented in a single header that depends only on C++11.
It combines the advantages of DBoW2/3 and FBoW: it is extremely fast and easy to use, provides a DBoW3-like interface, and accelerates binary and floating-point descriptors with SSE and AVX instructions.
It improves memory usage and speeds up loading, saving and training vocabularies, as well as converting image features to BoW vectors.
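
The core of BoW place recognition, descending a vocabulary tree to a word and comparing word histograms, fits in a short Python sketch. It assumes 8-bit binary descriptors stored as integers, a tiny hand-built tree instead of one trained with k-means, and plain term-frequency weights without IDF; the score is the L1 score 1 - 0.5 * |v1 - v2|_1 on L1-normalized vectors, as used by DBoW2. None of the names below are GSLAM's Vocabulary API.

```python
def hamming(a, b):
    """Hamming distance between two binary descriptors stored as Python ints."""
    return bin(a ^ b).count("1")


class Node:
    def __init__(self, centroid, children=(), word_id=None):
        self.centroid = centroid
        self.children = list(children)
        self.word_id = word_id  # set only on leaves


def lookup_word(root, descriptor):
    """Descend the vocabulary tree, choosing the closest child at every level."""
    node = root
    while node.children:
        node = min(node.children, key=lambda ch: hamming(ch.centroid, descriptor))
    return node.word_id


def bow_vector(root, descriptors):
    """Term-frequency BoW vector, L1-normalized (no IDF weighting in this sketch)."""
    counts = {}
    for d in descriptors:
        w = lookup_word(root, d)
        counts[w] = counts.get(w, 0) + 1
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def l1_score(v1, v2):
    """L1 similarity in [0, 1] between two L1-normalized BoW vectors."""
    words = set(v1) | set(v2)
    return 1.0 - 0.5 * sum(abs(v1.get(w, 0.0) - v2.get(w, 0.0)) for w in words)


# Usage: a two-level tree with four leaf words.
root = Node(0, [
    Node(0b00001111, [Node(0b00000111, word_id=0), Node(0b00111111, word_id=1)]),
    Node(0b11110000, [Node(0b11100000, word_id=2), Node(0b11111100, word_id=3)]),
])
image_a = [0b00000111, 0b00000011, 0b11100000]
image_b = [0b00111111, 0b11111100]
v_a = bow_vector(root, image_a)
v_b = bow_vector(root, image_b)
```

Because each descriptor visits only one child per level, looking up a word costs a number of comparisons proportional to branching factor times depth rather than to the vocabulary size, which is what makes tree vocabularies fast.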

Table 3 compares four bag-of-words libraries. In the experiment, each parent node has 10 children; ORB features are detected with ORB-SLAM and SIFT features with SiftGPU. The results are reported for ORB vocabularies with 4 and 6 levels and for a SIFT vocabulary. Both FBoW and GSLAM use multi-threading for vocabulary training. GSLAM's implementation outperforms the others in almost all items, including loading and saving vocabularies, training new vocabularies, and converting descriptor lists into BoW vectors for place recognition and into feature vectors for fast feature matching. In addition, GSLAM uses less memory and allocates fewer dynamic memory blocks; the main reason DBoW2 needs so much memory is fragmentation.

6 SLAM Evaluation Benchmark

Existing benchmarks require users to download the test datasets and upload their results for accuracy evaluation, which is not enough to unify the runtime environment and ensure a fair performance comparison. Thanks to GSLAM's unified interface, evaluating SLAM systems becomes more elegant: developers can simply upload a SLAM plug-in and run various evaluations of speed, computational cost and accuracy in a dockerized environment with fixed resources. This section first introduces some datasets and implemented SLAM plug-ins. Then three representative SLAM implementations are evaluated in terms of speed, accuracy, memory and CPU usage. This evaluation aims to demonstrate the possibility of a unified SLAM benchmark built on different SLAM plug-ins.

Datasets

Running a SLAM system typically requires sensor data streams and the corresponding configuration. To let developers focus on developing core SLAM plug-ins, GSLAM provides a standard dataset interface, so developers do not need to deal with SLAM input. Online sensor input and offline data are provided through different dataset plug-ins, and the appropriate plug-in is loaded dynamically according to the suffix of the given dataset path. A dataset implementation should provide all requested sensor streams and the associated configuration, so no extra setup is needed for different datasets. All sensor streams are published through Messenger with standard topic names and data formats.
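
Selecting a dataset plug-in from the path suffix can be sketched as a registry lookup. In this Python sketch the suffixes `.kitti` and `.tumrgbd` and the two dataset classes are hypothetical, and plug-ins are registered in-process rather than loaded as shared libraries; publishing the sensor streams through Messenger is left out.

```python
import os

DATASET_PLUGINS = {}  # hypothetical registry: path suffix -> dataset class


def register_dataset(suffix):
    """Class decorator that registers a dataset plug-in for one path suffix."""
    def wrap(cls):
        DATASET_PLUGINS[suffix] = cls
        return cls
    return wrap


@register_dataset(".kitti")
class KittiLikeDataset:
    def __init__(self, path):
        self.path = path


@register_dataset(".tumrgbd")
class TumLikeDataset:
    def __init__(self, path):
        self.path = path


def open_dataset(path):
    """Pick the plug-in from the suffix of the dataset path."""
    suffix = os.path.splitext(path)[1]
    try:
        return DATASET_PLUGINS[suffix](path)
    except KeyError:
        raise ValueError(f"no dataset plug-in for suffix {suffix!r}") from None


# Usage
tum = open_dataset("/data/seq01.tumrgbd")
kitti = open_dataset("00.kitti")
```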

GSLAM has implemented several popular visual SLAM dataset plug-ins, as shown in Table 4. Users can also very easily implement a dataset plugin based on the header-only GSLAM core, publish it as a plugin and compile it with the application.

SLAM Implementations

Figure 2 shows screenshots of several open-source SLAM and SfM plug-ins running with the built-in Qt visualizer. The framework supports SLAM systems with different architectures, including direct, semi-direct, feature-based and even SfM methods. An implementation such as DSO needs to publish results such as point clouds, camera poses, trajectories and pose graphs for visualization, just as ROS-based implementations do. Users can access different SLAM plug-ins through a unified framework, and it is very convenient to develop SLAM-based applications with the C++, Python and Node.js interfaces. Since many researchers use ROS during development, GSLAM also provides a ROS visualization plug-in that seamlessly forwards ROS-defined messages, so developers can use RViz for display or continue developing other ROS-based applications.

Evaluation

Since most existing benchmarks only provide datasets, or lack groundtruth that users could use for their own evaluations, GSLAM provides a built-in plug-in and some script tools for computational performance and accuracy evaluation.

The paper uses the nostructure-texture-near-withloop sequence from the TUM RGB-D dataset to demonstrate the evaluation. The following experiments use three open-source monocular SLAM plug-ins: DSO, SVO and ORB-SLAM. All experiments were run on a computer with an i7-6700 CPU, a GTX 1060 GPU and 16 GB of RAM running 64-bit Ubuntu 16.04.

Computational performance evaluation covers memory usage, the number of allocated memory blocks, CPU usage and per-frame statistics, as shown in Figure 3. The results show that SVO uses the least memory and CPU and achieves the highest speed. Since SVO is only a visual odometry system that maintains a local map internally, its cost remains stable. DSO allocates fewer memory blocks but consumes more than 100 MB of memory, which grows slowly. One problem with DSO is that its processing time increases dramatically when the number of frames is below 500; in addition, keyframes take even longer to process. ORB-SLAM uses the most CPU and its computation time is stable, but its memory usage grows rapidly and it allocates and releases a large number of memory blocks, because its BA uses the G2O library without an incremental optimization method.

Figure 4 shows the evaluation results of the odometry trajectories. As the figure shows, SVO is faster but drifts more, while ORB-SLAM achieves the highest accuracy in terms of absolute pose error (APE). Since the evaluation is itself a pluggable application, further metrics, such as point cloud accuracy, can also be implemented.
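
For readers who want to reproduce an APE number, the translational part reduces to per-pose position errors summarized by RMSE, mean and max, as in the Python sketch below. The function name is our own, and it assumes the two trajectories are already time-associated and aligned; for monocular systems such as DSO, SVO and ORB-SLAM the estimate is only defined up to scale, so in practice a Sim(3) alignment (for example Umeyama's method) would be applied first.

```python
import math


def ape_translation(estimated, groundtruth):
    """Translational absolute pose error statistics (RMSE, mean, max).

    Both inputs are lists of (x, y, z) positions, assumed to be time-associated
    and already expressed in the same aligned frame.
    """
    if not estimated or len(estimated) != len(groundtruth):
        raise ValueError("trajectories must be non-empty and of equal length")
    errors = [math.sqrt(sum((e - g) ** 2 for e, g in zip(pe, pg)))
              for pe, pg in zip(estimated, groundtruth)]
    return {
        "rmse": math.sqrt(sum(err * err for err in errors) / len(errors)),
        "mean": sum(errors) / len(errors),
        "max": max(errors),
    }


# Usage: a constant offset of one unit gives an RMSE of exactly 1.
gt = [(float(i), 0.0, 0.0) for i in range(5)]
est = [(float(i), 1.0, 0.0) for i in range(5)]
stats = ape_translation(est, gt)
```

RMSE weights large errors more heavily than the mean, so reporting both (plus the max) helps distinguish steady drift from occasional jumps.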

7 Summary

This article presents GSLAM, a new general-purpose SLAM platform that supports the whole process from development through evaluation to application. Commonly used toolkits are provided as plug-ins, and users can also easily develop their own modules. To make the platform easy to use, the interface depends only on C++11. In addition, Python and JavaScript interfaces are provided to better integrate traditional and deep learning-based SLAM, or to run distributed operations on the web.

In future work, more SLAM implementations, documentation and demo code will be provided to make the platform easier to learn and use. In addition, traditional and deep learning-based SLAM will be integrated to further explore what SLAM systems can do.

The homepage of this work is as follows:

GSLAM: Main Page

It feels like a framework for learning the principles of each part of SLAM~

Original link: https://mp.weixin.qq.com/s/PCxhqhK3t1soN5FI0w9NFw
