Posts by Category

TechniqueReport

Improve the robustness of TSDF-based SLAM

less than 1 minute read

This research was carried out as a five-month project. The main goal was to build a SLAM system that is robust to tracking loss and drift and able to perform detailed object-level reconstruction. I used sliding-window and keyframe methods [1] to handle tracking loss and relocalization, respectively. Moreover, an object centroid tracking system was added so that users can pick up and move a target object during reconstruction, making it possible to reconstruct the whole model (for example, its bottom part).
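
For intuition, the sketch below shows one way a keyframe database can support relocalization after tracking loss: each keyframe stores a global descriptor and its pose, and when tracking fails the most similar keyframe provides a pose guess from which tracking against the TSDF model can be retried. The class, the crude block-averaged depth descriptor, and all names are illustrative assumptions, not the project's actual implementation.

```python
import numpy as np

def global_descriptor(depth, size=16):
    """Crude global descriptor: a block-averaged, normalized depth image (assumption)."""
    h, w = depth.shape
    small = depth[: h - h % size, : w - w % size]
    small = small.reshape(size, h // size, size, w // size).mean(axis=(1, 3))
    return small.ravel() / (np.linalg.norm(small) + 1e-8)

class KeyframeDatabase:
    """Minimal keyframe store used for relocalization after tracking loss."""
    def __init__(self):
        self.descriptors = []   # one global descriptor per keyframe
        self.poses = []         # corresponding 4x4 camera-to-world poses

    def add(self, descriptor, pose):
        self.descriptors.append(descriptor)
        self.poses.append(pose)

    def relocalize(self, descriptor):
        """Return the pose of the most similar keyframe, or None if the database is empty."""
        if not self.descriptors:
            return None
        dists = [np.linalg.norm(descriptor - d) for d in self.descriptors]
        return self.poses[int(np.argmin(dists))]
```

When the tracker reports failure, `db.relocalize(global_descriptor(depth))` yields an initial pose guess from which frame-to-model tracking can be restarted.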

Learning to assign Local Reference Frame for 3D Point Cloud

2 minute read

The idea of this research is to assign a local reference frame to the target point cloud before feature description, so as to obtain a descriptor that is invariant to rotations. The concept has been widely applied in well-known handcrafted 3D feature descriptors, e.g. SHOT [2], PS [3], USC [4], EM [5] and MeshHOG [6], in learning-based methods such as CGF [7], and also in 2D features such as SIFT [1].
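
As a minimal sketch of the idea behind such handcrafted LRFs (in the spirit of SHOT [2], not the method proposed in this research), the axes below come from an eigendecomposition of a distance-weighted covariance matrix of the keypoint's neighbourhood, with signs disambiguated towards the majority of points; the function name and parameters are illustrative.

```python
import numpy as np

def local_reference_frame(points, center, radius):
    """Covariance-based LRF sketch.
    points: (N, 3) neighbourhood, center: (3,) keypoint, radius: support size.
    Returns a 3x3 rotation whose rows are the x, y, z axes of the LRF."""
    d = points - center
    dist = np.linalg.norm(d, axis=1)
    d, dist = d[dist < radius], dist[dist < radius]
    # Distance-weighted covariance: closer points count more.
    w = radius - dist
    cov = (d * w[:, None]).T @ d / w.sum()
    eigval, eigvec = np.linalg.eigh(cov)        # eigenvalues in ascending order
    x_axis, z_axis = eigvec[:, 2], eigvec[:, 0]  # largest -> x, smallest -> z (normal)
    # Disambiguate signs so each axis points towards the majority of neighbours.
    if np.sum(d @ x_axis >= 0) < len(d) / 2:
        x_axis = -x_axis
    if np.sum(d @ z_axis >= 0) < len(d) / 2:
        z_axis = -z_axis
    y_axis = np.cross(z_axis, x_axis)            # complete a right-handed frame
    return np.stack([x_axis, y_axis, z_axis])
```

Rotating the neighbourhood by this frame before description gives features that are, by construction, repeatable under rotations of the input cloud.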

publications

Incremental 3D Semantic Scene Graph Prediction from RGB Sequences

1 minute read

3D semantic scene graphs are a powerful holistic representation as they describe the individual objects and depict the relations between them. They are compact high-level graphs that enable many tasks requiring scene reasoning. In real-world settings, existing 3D estimation methods produce robust predictions only in conjunction with dense inputs. In this work, we propose a real-time framework that incrementally builds a consistent 3D semantic scene graph of a scene given an RGB image sequence. Our method consists of a novel incremental entity estimation pipeline and a scene graph prediction network. The proposed pipeline simultaneously reconstructs a sparse point map and fuses entity estimation from the input images. The proposed network estimates 3D semantic scene graphs with iterative message passing using multi-view and geometric features extracted from the scene entities. Extensive experiments on the 3RScan dataset show the effectiveness of the proposed method in this challenging task, outperforming state-of-the-art approaches.
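
As a rough, non-learned illustration of iterative message passing on a scene graph, the toy update below lets each entity node aggregate messages formed from its neighbours' features and the connecting relationship features; the network in the paper is learned and considerably more involved, so this is only an assumption-level sketch.

```python
import numpy as np

def message_passing(node_feats, edges, edge_feats, n_rounds=2):
    """Toy message passing.
    node_feats: (N, D) entity features, edges: list of (src, dst) index pairs,
    edge_feats: (E, D) relationship features."""
    for _ in range(n_rounds):
        messages = np.zeros_like(node_feats)
        counts = np.zeros(len(node_feats))
        for e, (src, dst) in enumerate(edges):
            # A message combines the sender's state with the edge (relationship) feature.
            messages[dst] += node_feats[src] + edge_feats[e]
            counts[dst] += 1
        counts = np.maximum(counts, 1)
        node_feats = node_feats + messages / counts[:, None]  # residual mean-aggregation update
    return node_feats
```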

Towards Long-Term Retrieval-based Visual Localization in Indoor Environments with Changes

1 minute read

Visual localization is a challenging task due to the presence of illumination changes, occlusion, and perception from novel viewpoints. Re-localizing the camera pose in long-term setups raises difficulties caused by changes in scene appearance and geometry introduced by human or natural deterioration. Many existing methods use static scene assumptions and fail in dynamic indoor scenes. Only a few works handle scene changes by introducing outlier awareness with pure learning methods. Other recent approaches use semantics to robustify camera localization in changing setups. However, to the best of our knowledge, no method has yet used scene graphs in feature-based approaches to introduce change awareness. In this work, we propose a novel feature-based camera re-localization method that leverages scene graphs within retrieval and feature detection and matching. Semantic scene graphs are used to estimate scene changes by matching instances and relationship triplets. The knowledge of scene changes is then used for our change-aware image retrieval and feature correspondence verification. We show the potential of integrating higher-level knowledge about the scene within a retrieval-based localization pipeline. Our method is evaluated on the RIO10 benchmark with comprehensive evaluations on different levels of scene changes.
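
As a toy illustration of estimating change from relationship triplets (an assumption-level sketch, not the paper's matching procedure), one could score how many reference triplets no longer appear in the query scene graph:

```python
def triplet_change_score(query_triplets, ref_triplets):
    """Fraction of reference (subject, predicate, object) triplets missing from the query graph."""
    query = set(query_triplets)
    missing = [t for t in ref_triplets if t not in query]
    return len(missing) / max(len(ref_triplets), 1)
```

Such a score could then down-weight heavily changed reference images during retrieval and flag correspondences on changed instances during verification.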

Bending Graphs: Hierarchical Shape Matching using Gated Optimal Transport

less than 1 minute read

Shape matching has been a long-studied problem for the computer graphics and vision community. The objective is to predict a dense correspondence between meshes that have a certain degree of deformation. Existing methods either consider the local description of sampled points or discover correspondences based on global shape information. In this work, we investigate a hierarchical learning design, to which we incorporate local patch-level information and global shape-level structures. This flexible representation enables correspondence prediction and provides rich features for the matching stage. Finally, we propose a novel optimal transport solver by recurrently updating features on non-confident nodes to learn globally consistent correspondences between the shapes. Our results on publicly available datasets suggest robust performance in the presence of severe deformations without the need for extensive training or refinement.
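
For context, the standard Sinkhorn normalization below turns a pairwise matching score matrix into a soft correspondence matrix by alternately normalizing rows and columns; the gated optimal transport solver in the paper is a learned, recurrent variant of this idea, so the snippet is only an illustrative baseline.

```python
import numpy as np

def sinkhorn(scores, n_iters=50, eps=0.1):
    """Turn an (N, M) score matrix into an approximately doubly stochastic
    soft correspondence matrix via alternating row/column normalization."""
    K = np.exp(scores / eps)
    for _ in range(n_iters):
        K = K / K.sum(axis=1, keepdims=True)   # normalize rows
        K = K / K.sum(axis=0, keepdims=True)   # normalize columns
    return K
```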

SceneGraphFusion: Incremental 3D Scene Graph Prediction from RGB-D Sequences

1 minute read

We create a globally consistent 3D scene graph by fusing predictions of a graph neural network (GNN) from an incremental geometric segmentation created from an RGB-D sequence. Our method merges nodes on the same object instance and naturally grows and improves over time when new segments and surfaces are discovered. As a by-product, our method produces accurate panoptic segmentation of large-scale 3D scans. The nodes represent the different object segments.
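
A minimal sketch of the node-merging bookkeeping, assuming the network predicts a "same instance" relationship between geometric segments (the actual pipeline is more elaborate), is a union-find structure over segment ids:

```python
class SceneGraphNodes:
    """Toy segment-to-node bookkeeping: segments predicted to belong to the
    same instance are merged under a single graph node."""
    def __init__(self):
        self.parent = {}

    def add_segment(self, seg_id):
        """Register a newly discovered segment as its own node."""
        self.parent.setdefault(seg_id, seg_id)

    def find(self, seg_id):
        """Return the representative node of a segment (with path halving)."""
        while self.parent[seg_id] != seg_id:
            self.parent[seg_id] = self.parent[self.parent[seg_id]]
            seg_id = self.parent[seg_id]
        return seg_id

    def merge(self, a, b):
        """Called when the prediction says segments a and b are the same instance."""
        self.parent[self.find(a)] = self.find(b)
```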

SCFusion: Real-time Incremental Scene Reconstruction with Semantic Completion

less than 1 minute read

Real-time scene reconstruction from depth data inevitably suffers from occlusion, thus leading to incomplete 3D models. Partial reconstructions, in turn, limit the performance of algorithms that leverage them for applications in the context of, e.g., augmented reality, robotic navigation, and 3D mapping. Most methods address this issue by predicting the missing geometry as an offline optimization, thus being incompatible with real-time applications. We propose a framework that ameliorates this issue by performing scene reconstruction and semantic scene completion jointly in an incremental and real-time manner, based on an input sequence of depth maps. Our framework relies on a novel neural architecture designed to process occupancy maps and leverages voxel states to accurately and efficiently fuse semantic completion with the 3D global model. We evaluate the proposed approach quantitatively and qualitatively, demonstrating that our method can obtain accurate 3D semantic scene completion in real time.
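
As a hedged sketch of how voxel states could gate this fusion, the snippet below writes network-predicted (completed) labels only into voxels the sensor has not observed, so measured geometry is never overwritten; the state definition and all names are assumptions for illustration, not the paper's exact scheme.

```python
import numpy as np

# Hypothetical voxel states; the actual state definition in SCFusion may differ.
UNOBSERVED, FREE, OCCUPIED = 0, 1, 2

def fuse_completion(voxel_state, voxel_label, completed_label):
    """Write completed semantic labels only into unobserved voxels,
    leaving voxels backed by real measurements untouched."""
    fused = voxel_label.copy()
    mask = voxel_state == UNOBSERVED
    fused[mask] = completed_label[mask]
    return fused
```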