VI-NeRF-SLAM: a real-time visual–inertial SLAM with NeRF mapping (2024)

research-article

Authors:
DaoQing Liao, School of Automation Science and Engineering, South China University of Technology, Guangzhou, China (https://ror.org/0530pts50)
Wei Ai, School of Automation Science and Engineering, South China University of Technology, Guangzhou, China (https://ror.org/0530pts50)
Journal of Real-Time Image Processing, Volume 21, Issue 2, Apr 2024. https://doi.org/10.1007/s11554-023-01412-6

Published: 09 February 2024


Abstract

In many robotic and autonomous driving tasks, traditional visual SLAM algorithms estimate the camera’s position in a scene from sparse feature points and represent the map by estimating the depth of sparse point clouds. However, practical applications require SLAM to create dense maps in real time, overcoming the sparsity and occlusion issues of point clouds. Furthermore, it is advantageous for a SLAM map to have an auto-completion capability: when the camera observes only 80% of an object, the map can automatically infer and complete the remaining 20%. A denser and more intelligent map representation is therefore needed. In this paper, we propose a visual–inertial SLAM system with Neural Radiance Fields (NeRF) reconstruction to address these challenges. We integrate traditional rule-based optimization with NeRF. This approach allows NeRF local functions to be updated in real time by rapidly estimating camera motion and sparse feature point depths to reconstruct 3D scenes. To achieve better camera poses and a globally consistent map, we address IMU noise spikes caused by rapid motion changes, and we handle the pose adjustments caused by loop closure fusion. Specifically, we widen the static noise covariance to refit the dynamic noise covariance. During loop closure fusion, we treat the pose adjustment between the pre- and post-loop-closure states as a spatiotemporal transformation and migrate NeRF parameters from the pre-closure state to the post-closure state to expedite loop closure adjustments in NeRF mapping. Moreover, we extend this method to scenarios with only grayscale images: by expanding the color channels of grayscale images and applying a linear spatial mapping, we can rapidly reconstruct 3D scenes from grayscale input alone. We demonstrate the precision and speed advantages of our method in both RGB and grayscale scenes.
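
The abstract does not give the formulas behind the loop-closure migration or the grayscale extension, so the following is only a minimal sketch of the general ideas, not the authors' implementation. It assumes poses are 4x4 camera-to-world matrices, treats the loop-closure adjustment of one keyframe as a single rigid transform, and expands grayscale images by plain channel replication; the paper's "linear spatial mapping" is not specified in the abstract and is not modeled here. All function names (`relative_correction`, `migrate_points`, `gray_to_rgb`) are hypothetical.

```python
import numpy as np


def relative_correction(T_pre: np.ndarray, T_post: np.ndarray) -> np.ndarray:
    """Rigid transform taking pre-loop-closure world coordinates to
    post-loop-closure world coordinates for one keyframe.

    Assumption: T_pre and T_post are 4x4 camera-to-world poses of the
    same keyframe before and after loop closure.
    """
    return T_post @ np.linalg.inv(T_pre)


def migrate_points(points: np.ndarray, T_delta: np.ndarray) -> np.ndarray:
    """Apply a 4x4 rigid transform to an (N, 3) array of 3D points."""
    ones = np.ones((points.shape[0], 1))
    homogeneous = np.hstack([points, ones])      # (N, 4)
    return (homogeneous @ T_delta.T)[:, :3]      # back to (N, 3)


def gray_to_rgb(gray: np.ndarray) -> np.ndarray:
    """Expand an (H, W) grayscale image to (H, W, 3) by replicating
    the single intensity channel (assumed expansion scheme)."""
    return np.repeat(gray[..., None], 3, axis=-1)
```

Under these assumptions, anything anchored to the old world frame (for example, sample positions used when querying the map) can be moved with `migrate_points(p, relative_correction(T_pre, T_post))` rather than being re-estimated from scratch, which is the intuition behind reusing pre-closure map parameters after a loop closure.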



Published in

Journal of Real-Time Image Processing Volume 21, Issue 2
Apr 2024
529 pages
ISSN:1861-8200

© The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2024. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
Publisher
Springer-Verlag
Berlin, Heidelberg
Publication History
- Published: 9 February 2024
- Accepted: 30 December 2023
- Received: 5 December 2023
Author Tags
NeRF
SLAM
Intelligent map
Real-time online algorithm