China has a long history, a profound cultural heritage, and a vast number of cultural relics. As the crystallization of the wisdom of earlier generations, these relics have self-evident value as historical documents. Ancient books in particular are an important carrier of Chinese civilization and a precious heritage handed down to the present day, and protecting them is important, long-term, foundational work. More than 2,800 libraries across the country hold over 50 million ancient books, about one-third of which are damaged to varying degrees. At the current number of restoration specialists, restoring every damaged item in these collections would take hundreds of years.
"Travel Notes on Ancient Books" is an ancient-book revitalization project jointly created by ByteDance with the First Historical Archives of China, the Dunhuang Research Institute, the Gansu Bamboo Slips Museum, and the National Library of China (National Museum of Classic Books). It recreates the four great discoveries of ancient documents, namely the oracle bones of the Yin Ruins, the Juyan Han bamboo slips, the Dunhuang manuscripts, and the Ming and Qing archives, and lets ancient books "come alive" in digital form.
The project centers on VR interactive documentaries. Using the latest 3D reconstruction technology from the Volcano Engine Multimedia Laboratory, it recreates offline cultural relics in PICO virtual scenes, and it applies self-developed light field video technology to capture and vividly reproduce the light field of moving people, offering a high degree of viewing freedom and interactivity in VR. Through PICO, Douyin's naked-eye VR, and other channels, viewers can travel through time and space without leaving home, take part in historical events, and get up close to ancient books.
This article focuses on the Volcano Engine Multimedia Laboratory's 3D reconstruction and light field video technologies: their principles, their advances, and their fields of application. The goal is to help readers better understand 3D reconstruction and to help bring related technologies into real products and applications.
Digitizing cultural relics requires both 3D reconstruction and digital restoration, and this poses great challenges for 3D reconstruction technology.
Three-dimensional reconstruction is a common scientific problem and core technology across computer-aided geometric design (CAGD), computer graphics (CG), computer animation, computer vision, medical image processing, scientific computing, virtual reality, digital media creation, and other fields. A 3D reconstruction pipeline generally includes data acquisition, preprocessing, point cloud registration (stitching), feature analysis, and mesh and texture generation.
Traditional 3D reconstruction recovers 3D information from images using vision alone or multiple modalities (such as depth data from laser scanners). It can model static objects and scenes, but lacks an effective end-to-end solution for modeling dynamic objects and scenes.
The Volcano Engine Multimedia Laboratory has developed its own object reconstruction, scene reconstruction, and light field video technologies, which together form a complete set of technical solutions. They can build high-fidelity models of static objects and restore their complex materials; model large scenes such as cities, parks, and indoor living spaces, an important foundation for digital twins; and use light field video to reconstruct and reproduce dynamic objects and scenes for on-demand viewing and live streaming.
In the "Travel Notes on Ancient Books" project, the Volcano Engine Multimedia Laboratory digitally restored more than 40 cultural relics. The first difficulty was that relics must be protected, which restricts the capture equipment: for example, the commonly used high-precision laser scanners cannot be used on them. This led the team to reconstruct the relics in 3D with a vision-based method.
However, traditional vision-based methods cannot handle weakly textured objects and struggle with complex shapes such as long, narrow bamboo slips and flat oracle bones. To address this, the team represents objects with Signed Distance Fields (SDF) and combines them with deep learning to overcome these difficulties. An SDF gives, for every point in space, the signed distance to the object's surface (by a common convention, negative inside and positive outside); it is an implicit representation. A 2D SDF is illustrated below.
SDF diagram
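To make the idea concrete, here is a minimal sketch of two analytic 2D SDFs: a circle, and an axis-aligned box, which loosely resembles a flat slip seen from above. It assumes the negative-inside, positive-outside convention. The lab's SDFs are learned by neural networks; these closed-form functions and their names are illustrative assumptions.

```python
import math


def sdf_circle(x: float, y: float, cx: float, cy: float, r: float) -> float:
    """Signed distance from (x, y) to a circle of radius r centred at (cx, cy)."""
    return math.hypot(x - cx, y - cy) - r


def sdf_box(x: float, y: float, hw: float, hh: float) -> float:
    """Signed distance to an origin-centred box with half-width hw and half-height hh."""
    dx = abs(x) - hw
    dy = abs(y) - hh
    outside = math.hypot(max(dx, 0.0), max(dy, 0.0))  # Euclidean distance when outside
    inside = min(max(dx, dy), 0.0)                    # negative depth when inside
    return outside + inside
```

For a long, narrow box with `hw=4.0, hh=1.0`, the centre point evaluates to -1.0 because the nearest edge is one unit away. Points on the boundary evaluate to 0, and the zero level set of the field is the object's outline.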
The key problem is how to supervise the neural network so that it fits the SDF accurately. First, a Structure from Motion (SfM) algorithm computes the camera pose of each captured image. Given these poses, differentiable rendering projects the geometry encoded by the SDF into each view, and the rendered image is compared with the image actually captured from that viewpoint. The neural network is then optimized iteratively so that, at every capture viewpoint, the rendering matches the captured image as closely as possible.
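The render-and-compare loop described above can be illustrated with a toy, under several stated assumptions that differ from the lab's system. The "network" is a single parameter, the radius of a sphere SDF. The "camera" is a hypothetical grid of parallel rays. Depth stands in for colour. A central finite difference replaces automatic differentiation. Rendering uses sphere tracing, which steps along each ray by the SDF value.

```python
import math


def sphere_sdf(p, radius):
    """Signed distance from point p to a sphere of the given radius centred at the origin."""
    return math.sqrt(p[0] ** 2 + p[1] ** 2 + p[2] ** 2) - radius


def sphere_trace(origin, direction, radius, t_max=20.0, eps=1e-9, max_steps=500):
    """Return the ray depth of the first surface hit, or t_max for a miss."""
    t = 0.0
    for _ in range(max_steps):
        p = [o + t * d for o, d in zip(origin, direction)]
        dist = sphere_sdf(p, radius)
        if dist < eps:
            return t
        t += dist  # safe step: the SDF bounds the distance to the surface
        if t > t_max:
            break
    return t_max


# Hypothetical "camera": a 5 x 5 grid of parallel rays looking down -z from z = 5.
RAYS = [((0.2 * i, 0.2 * j, 5.0), (0.0, 0.0, -1.0)) for i in range(-2, 3) for j in range(-2, 3)]


def render_depth(radius):
    return [sphere_trace(o, d, radius) for o, d in RAYS]


def depth_loss(radius, observed):
    """Mean squared difference between rendered and observed depth (stand-in for a photometric loss)."""
    rendered = render_depth(radius)
    return sum((r - o) ** 2 for r, o in zip(rendered, observed)) / len(observed)


def fit_radius(observed, radius=0.7, lr=0.2, steps=60, h=1e-5):
    """Gradient descent on the single SDF parameter; a finite difference replaces autodiff."""
    for _ in range(steps):
        grad = (depth_loss(radius + h, observed) - depth_loss(radius - h, observed)) / (2 * h)
        radius -= lr * grad
    return radius
```

Rendering a unit sphere with `render_depth(1.0)` gives synthetic "captured" depths. Calling `fit_radius` on them starts from a radius of 0.7 and converges close to 1.0. A real system replaces the radius with network weights and the finite difference with backpropagation through a differentiable renderer.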
To further improve precision, the 3D points obtained from sparse reconstruction are added as constraints when optimizing the SDF, which better preserves the object's fine details. To reconstruct objects completely, the Volcano Engine Multimedia Laboratory also combines segmentation and reconstruction algorithms so that the bottom of an object can be reconstructed effectively.
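The article does not say how the sparse points enter the optimization. One natural form, assumed here, relies on the fact that SfM points lie on the surface, so the SDF should be zero there. The sketch penalizes the mean squared SDF at those points. It shows that term in isolation, on an analytic sphere SDF fitted to synthetic noisy points rather than a neural SDF combined with a rendering loss. All function names are hypothetical.

```python
import math
import random


def sparse_point_loss(points, center, radius):
    """Mean squared SDF value at sparse SfM points; zero when every point lies on the surface."""
    return sum((math.dist(p, center) - radius) ** 2 for p in points) / len(points)


def fit_sphere_to_sparse_points(points, lr=0.3, steps=300):
    """Minimise the sparse-point term for f(p) = |p - c| - r by plain gradient descent."""
    n = len(points)
    # Initialise at the centroid, with the mean distance to it as the radius.
    center = [sum(p[k] for p in points) / n for k in range(3)]
    radius = sum(math.dist(p, center) for p in points) / n
    for _ in range(steps):
        grad_c = [0.0, 0.0, 0.0]
        grad_r = 0.0
        for p in points:
            d = math.dist(p, center)
            f = d - radius  # SDF value at this sparse point
            for k in range(3):
                grad_c[k] += 2.0 * f * (center[k] - p[k]) / d  # d(f^2)/dc
            grad_r += -2.0 * f                                 # d(f^2)/dr
        center = [c - lr * g / n for c, g in zip(center, grad_c)]
        radius -= lr * grad_r / n
    return center, radius


# Synthetic "SfM points": noisy samples on a sphere of radius 2 centred at (1, 2, 3).
random.seed(0)
pts = []
for _ in range(200):
    v = [random.gauss(0.0, 1.0) for _ in range(3)]
    norm = math.sqrt(sum(x * x for x in v))
    r = 2.0 + random.gauss(0.0, 0.01)
    pts.append(tuple(c + r * x / norm for c, x in zip((1.0, 2.0, 3.0), v)))

center, radius = fit_sphere_to_sparse_points(pts)
```

In a full system this term would be weighted and added to the rendering loss, so sparse but accurate geometry anchors the surface where image evidence alone is weak.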
Because the object must stay fixed in place during scanning, no images can be captured of its underside, so complete reconstruction comes down to reconstructing the bottom. The usual approaches are to suspend the object, or to reconstruct it in several passes and stitch the results together in post-processing. Suspension is not safe enough for cultural relics, and post-hoc stitching is slow and cannot be automated. The Volcano Engine Multimedia Laboratory therefore added automatic image segmentation to the reconstruction algorithm. Images shot with the object upright and flipped over can then be unified and reconstructed together, directly producing a complete model. The results with and without complete reconstruction are compared below.
Modeling result without complete reconstruction
Modeling result with complete reconstruction
Specular highlights are a major challenge for object reconstruction. Highlights disturb feature point matching, which makes camera poses inaccurate, and they break the consistency of observations across viewpoints, which interferes with reconstruction. The Volcano Engine Multimedia Laboratory has developed a set of polarization-based methods that remove most highlights. Results before and after highlight removal are compared below.
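The article does not detail the lab's polarization method. A common textbook approach, assumed here, photographs the object through a linear polarizer at 0°, 45°, 90°, and 135°. Each pixel's intensity is fitted as I(θ) = a + b·cos 2θ + c·sin 2θ. The approach also assumes that diffuse light is unpolarized and specular light is linearly polarized, so the constant part carries the diffuse term and the oscillating part carries the highlight.

```python
import math


def separate_diffuse_specular(i0, i45, i90, i135):
    """Per-pixel diffuse/specular separation from four images (nested lists) taken
    through a linear polarizer at 0, 45, 90 and 135 degrees."""
    diffuse, specular = [], []
    for r0, r45, r90, r135 in zip(i0, i45, i90, i135):
        drow, srow = [], []
        for a0, a45, a90, a135 in zip(r0, r45, r90, r135):
            mean = (a0 + a45 + a90 + a135) / 4.0  # a: constant (DC) term
            b = (a0 - a90) / 2.0                  # cos(2*theta) coefficient
            c = (a45 - a135) / 2.0                # sin(2*theta) coefficient
            amp = math.hypot(b, c)                # amplitude of the polarized part
            i_min = mean - amp                    # darkest polarizer orientation
            drow.append(2.0 * i_min)              # unpolarized light loses half at any angle
            srow.append(2.0 * amp)                # I_max - I_min: the polarized highlight
        diffuse.append(drow)
        specular.append(srow)
    return diffuse, specular
```

Real specular reflections are only partially polarized, so in practice some highlight remains in the diffuse estimate. Cross-polarized lighting, with a polarizer on the light source as well, is a common way to strengthen the effect.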
Before removing highlights
After removing highlights
The Volcano Engine Multimedia Laboratory's method can also simulate the reflection and refraction properties of different objects, making it possible to model objects made of special materials. Cultural relic reconstruction results are shown below.
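The article does not describe how reflection and refraction are modeled. As an illustrative assumption only, here are two textbook building blocks a renderer commonly uses for such materials. The first is the refracted ray direction from Snell's law. The second is Schlick's approximation of the Fresnel reflectance, which sets how much light is reflected rather than transmitted.

```python
import math


def refract(incident, normal, eta):
    """Refracted unit direction via Snell's law, with eta = n1 / n2.
    incident and normal are unit 3-vectors; normal points back toward the incident side.
    Returns None on total internal reflection."""
    cos_i = -sum(i * n for i, n in zip(incident, normal))
    sin2_t = eta * eta * (1.0 - cos_i * cos_i)
    if sin2_t > 1.0:
        return None  # no transmitted ray
    cos_t = math.sqrt(1.0 - sin2_t)
    return tuple(eta * i + (eta * cos_i - cos_t) * n for i, n in zip(incident, normal))


def schlick_reflectance(cos_i, n1, n2):
    """Schlick's approximation of the Fresnel reflectance for unpolarized light."""
    r0 = ((n1 - n2) / (n1 + n2)) ** 2  # reflectance at normal incidence
    return r0 + (1.0 - r0) * (1.0 - cos_i) ** 5
```

For air to glass (n = 1.5) at normal incidence, the reflectance is 0.04, so about 4% of the light is reflected. Light leaving glass at 60° from the normal is totally internally reflected, and `refract` returns `None`.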
Original picture of cultural relics
Cultural relic reconstruction result
Some relics held by the four partner institutions are precious paper documents and bamboo slips that are difficult to remove from their display cabinets for capture. For this situation, the Volcano Engine Multimedia Laboratory developed capture equipment with optical polarizers. The equipment eliminates the stray light, highlights, and reflections caused by glass cabinets, so relics can be scanned and reconstructed with high fidelity while still behind their protective glass.
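The article does not specify the rig's optics. The standard physical reason a polarizer suppresses glass reflections comes from the Fresnel equations. Light reflected from glass is partially polarized, and at Brewster's angle, tan θ_B = n2 / n1, the p-polarized reflectance drops to zero. The remaining reflection is then purely s-polarized, and a suitably oriented polarizer can block it. The sketch below assumes air-to-glass with n = 1.5 and no absorption.

```python
import math


def fresnel_rs_rp(theta_i_deg, n1=1.0, n2=1.5):
    """Power reflectances (R_s, R_p) at a planar, non-absorbing interface."""
    ti = math.radians(theta_i_deg)
    st = n1 / n2 * math.sin(ti)  # Snell's law: sin(theta_t)
    if st >= 1.0:
        return 1.0, 1.0  # total internal reflection
    tt = math.asin(st)
    ci, ct = math.cos(ti), math.cos(tt)
    rs = (n1 * ci - n2 * ct) / (n1 * ci + n2 * ct)  # s (perpendicular) amplitude
    rp = (n2 * ci - n1 * ct) / (n2 * ci + n1 * ct)  # p (parallel) amplitude
    return rs * rs, rp * rp


def brewster_angle_deg(n1=1.0, n2=1.5):
    """Incidence angle at which p-polarized light is not reflected at all."""
    return math.degrees(math.atan2(n2, n1))
```

For glass, Brewster's angle is about 56.3°. At that angle R_p is essentially zero, while R_s = 25/169 ≈ 0.148. At normal incidence both reflectances are 0.04, which is why the best suppression comes from viewing the glass obliquely.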
Cultural relics in glass display cabinet
Cultural relic reconstruction results
The Volcano Engine Multimedia Laboratory's object reconstruction technology also covers accurate pose estimation, realistic restoration of complex materials (diffuse reflection, specular reflection, translucency), and reconstruction of fine surfaces. These capabilities were used in the "Travel Notes on Ancient Books" project to produce high-fidelity 1:1 restorations of precious relics and turn them into digital resources. Audiences can thus "immerse" themselves in the museum, and the collections reach a wider public.
The lab's object reconstruction technology is broadly applicable. Beyond cultural relics, it works on general objects, including ones that traditional methods struggle with, such as very thin blades, and still achieves good results.
Top: props such as knives and wooden sticks; bottom: e-commerce items
2.2 Self-developed scene reconstruction algorithm: higher efficiency, higher accuracy
Scene reconstruction is an important research topic in computer vision and photogrammetry, with important applications in smart cities, virtual reality, digital navigation, and digital heritage protection. Vision-based 3D reconstruction offers high capture efficiency, low capture cost, a high ceiling on accuracy, and applicability to a wide range of scenes. It also avoids the unnecessary damage other scanning equipment may cause to a scene, but it faces many algorithmic challenges. The Volcano Engine Multimedia Laboratory combines AI techniques with the fundamentals of multi-view geometry to build a robust, accurate, and complete visual reconstruction framework. The pipeline has three key steps: image processing, point cloud optimization, and mesh reconstruction.
The lab uses AI techniques to denoise images, apply super-resolution, and extract and match features, overcoming many limitations of traditional methods. SfM and Bundle Adjustment (BA) then recover the sparse geometric structure and the camera parameters from the images. The team also developed a pose estimation algorithm that accepts multi-sensor input (panoramic cameras, multi-camera rigs, RGB-D cameras, lidar, GPS/IMU, and so on) for high-precision, multi-modal, adaptive sparse reconstruction. To handle large-scale data, the team developed block-wise reconstruction and map-merging strategies that run reconstruction in parallel on distributed clusters, significantly improving efficiency.
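The article does not give its BA formulation. The standard cost, assumed here, is the sum of squared reprojection errors: each 3D point is projected through a pinhole camera (rotation R, translation t, intrinsics fx, fy, cx, cy) and compared with the pixel where it was observed. This sketch only evaluates that cost. A production BA minimizes it over all poses and points with Levenberg-Marquardt, typically adding lens distortion and robust loss functions.

```python
def project(R, t, K, X):
    """Project 3D point X with rotation R (3x3 nested lists), translation t,
    and intrinsics K = (fx, fy, cx, cy) using a distortion-free pinhole model."""
    xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]  # world -> camera
    fx, fy, cx, cy = K
    return (fx * xc[0] / xc[2] + cx, fy * xc[1] / xc[2] + cy)


def reprojection_error(cameras, points, observations):
    """Sum of squared reprojection residuals, the cost that bundle adjustment minimizes.
    cameras: list of (R, t, K); points: list of 3D points;
    observations: list of (camera_index, point_index, (u, v))."""
    total = 0.0
    for ci, pi, (u, v) in observations:
        R, t, K = cameras[ci]
        pu, pv = project(R, t, K, points[pi])
        total += (pu - u) ** 2 + (pv - v) ** 2
    return total
```

With an identity pose and `K = (500, 500, 320, 240)`, the point (1, 0, 5) projects to pixel (420, 240). An observation 3 pixels right and 4 pixels down from that contributes 25 to the cost.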
After sparse reconstruction, Multi-View Stereo (MVS) converts the 2D image information into a 3D point cloud. The team developed depth estimation algorithms for monocular, binocular (stereo), and multi-view setups. They use neural networks for dense depth estimation and perform stably under arbitrary disparities and a wide variety of textures. The resulting point cloud is then denoised and completed, and point cloud registration enforces geometric consistency across the scene. Finally, a fusion strategy based on voxel hashing (VoxelHash) and image semantics further filters out noise and produces a smoother, more consistent, and complete scene point cloud.
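The core idea of voxel-hash fusion can be sketched in a few lines, under simplifying assumptions. Points are bucketed by integer voxel coordinates in a hash map (Python's `dict` is itself a hash table). Each occupied voxel is averaged, and voxels with too few supporting points are discarded as noise. The semantic filtering mentioned above, and the GPU-side TSDF block hashing used by VoxelHash-style systems, are not modeled. The function name and `min_points` threshold are hypothetical.

```python
import math
from collections import defaultdict


def voxel_hash_fuse(points, voxel_size, min_points=2):
    """Fuse a point cloud by hashing points into voxels, averaging each voxel,
    and dropping sparsely populated voxels as outliers."""
    buckets = defaultdict(list)
    for p in points:
        key = tuple(math.floor(c / voxel_size) for c in p)  # integer voxel coordinates as hash key
        buckets[key].append(p)
    fused = []
    for key in sorted(buckets):  # deterministic output order
        pts = buckets[key]
        if len(pts) < min_points:
            continue  # too little support: treat as noise
        n = len(pts)
        fused.append(tuple(sum(p[i] for p in pts) / n for i in range(3)))
    return fused
```

Because only occupied voxels are stored, memory grows with the surface area of the scene rather than its bounding volume. This is the main reason hashing is preferred over dense grids for large scenes.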
From the scene point cloud, a mesh is then reconstructed. The lab developed several mesh optimization algorithms for smoothing, denoising, simplification, and hole filling, yielding finer, more complete, higher-quality mesh models. High-precision camera pose estimation and image-quality improvements during image processing, such as super-resolution, combine with self-developed texture mapping algorithms to produce sharper texture maps with fewer seams. The texture repacking algorithm was also optimized for higher texture utilization, less wasted storage, and higher effective texture resolution.
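The lab's mesh optimization algorithms are not described. As an example of the smoothing step only, here is textbook uniform Laplacian smoothing, an assumed stand-in: each vertex moves a fraction `lam` of the way toward the average of its neighbours in the mesh. Repeated application also shrinks the mesh, which variants such as Taubin smoothing are designed to counter.

```python
def laplacian_smooth(vertices, faces, lam=0.5, iterations=1):
    """Uniform Laplacian smoothing of a triangle mesh.
    vertices: list of (x, y, z); faces: list of (a, b, c) vertex indices."""
    neighbours = [set() for _ in vertices]
    for a, b, c in faces:  # build vertex adjacency from the triangle edges
        neighbours[a].update((b, c))
        neighbours[b].update((a, c))
        neighbours[c].update((a, b))
    verts = [tuple(v) for v in vertices]
    for _ in range(iterations):
        new_verts = []
        for i, v in enumerate(verts):
            nb = neighbours[i]
            if not nb:  # isolated vertex: leave it in place
                new_verts.append(v)
                continue
            avg = [sum(verts[j][k] for j in nb) / len(nb) for k in range(3)]
            new_verts.append(tuple(v[k] + lam * (avg[k] - v[k]) for k in range(3)))
        verts = new_verts
    return verts
```

On a square pyramid whose apex is 1 unit above a flat base, one pass with `lam=0.5` lowers the apex to height 0.5. Noisy spikes are damped the same way.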
Traditional image registration algorithm
Volcano Engine Video Cloud Algorithm
Traditional Modeling Algorithm
Volcano Engine Video Cloud Algorithm Modeling Results
Urban Scene Modeling
Volcano Engine Video Cloud Algorithm
Suzhou Yuantong Temple reconstruction results
The Volcano Engine Multimedia Laboratory's object and scene reconstruction technologies can restore relics of different sizes and shapes to scale and with high precision. They bring offline relics online and present them virtually on PICO and Douyin. Users can hold oracle bones in their hands and read the inscriptions clearly, an experience a traditional museum visit cannot offer, and can move beyond spatial limits to stand in and roam the Dunhuang Grottoes. The technology also turns precious offline relics into permanent online digital resources, achieving digital preservation and letting future generations experience the relics in full.
2.3 Self-developed light field video technology: balancing cost and accuracy
To let viewers watch a grand dance performance immersively inside the virtual Dunhuang Grottoes, an experience that goes beyond reality, the Volcano Engine Multimedia Laboratory's self-developed light field video technology reconstructs dynamic people and scenes with high fidelity, at an advanced level for the industry.
Dynamic 3D meshes can represent dynamic people and scenes, but reconstructing a high-quality dynamic mesh whose novel-view renderings look as realistic as photographs is difficult. Having a 3D artist build the scene by hand gives better quality at a much higher labor cost. Reconstructing it automatically with algorithms such as SfM/MVS places requirements on the scene's texture, and the results may contain imprecise geometric detail and texture distortion.
Neural Radiance Field (NeRF) technology uses neural networks for implicit reconstruction and a differentiable rendering model to learn, from existing views, how to render images from new viewpoints, achieving photorealistic image rendering. The differentiable rendering model describes how a 3D model and its textures are rendered into images. Because it is differentiable, 3D geometry and texture can be learned by a neural network under the supervision of images from known viewpoints. The learned geometry can then be re-rendered from an unseen viewpoint to produce a novel-view image.
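The differentiable rendering step in NeRF is usually the volume-rendering quadrature from the original NeRF formulation, shown here for one ray. Sample i has density σ_i, colour c_i, and spacing δ_i. Its weight is T_i·(1 − exp(−σ_i·δ_i)), where the transmittance T_i = exp(−Σ_{j<i} σ_j·δ_j) is the fraction of light that survives to reach it. The sketch assumes the densities and colours have already been queried from a network. Every operation is smooth, so gradients can flow back to that network.

```python
import math


def volume_render(sigmas, colors, deltas):
    """Composite samples along one ray with the NeRF quadrature.
    sigmas: densities; colors: RGB tuples; deltas: distances between adjacent samples.
    Returns (rgb, accumulated_opacity)."""
    rgb = [0.0, 0.0, 0.0]
    transmittance = 1.0  # T_i = exp(-sum_{j<i} sigma_j * delta_j)
    for sigma, color, delta in zip(sigmas, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this ray segment
        weight = transmittance * alpha
        for k in range(3):
            rgb[k] += weight * color[k]
        transmittance *= 1.0 - alpha  # equals multiplying by exp(-sigma * delta)
    return tuple(rgb), 1.0 - transmittance
```

A fully transparent sample contributes nothing, and an opaque sample hides everything behind it. The returned opacity is what allows a rendered foreground to be composited over a separate background.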
The Volcano Engine Multimedia Laboratory combines neural radiance fields with traditional mesh modeling. In practice, the team first reconstructs a coarse geometric outline of the person. An improved NeRF then takes this outline as a prior to guide training, implicitly learns the 3D geometry, and re-renders images from dense novel viewpoints. For dynamic scenes of people, the team applied several optimization strategies during NeRF training to improve novel-view quality. Examples include a hierarchical representation based on hash encoding to speed up training, and streaming training to improve consistency between frames. Finally, video fusion automatically learns the background and relights the foreground, so that foreground performers blend seamlessly into background scenes. The lab's light field video technology also supports NeRF editing and can reconstruct and reproduce complex dynamic scenes.
The light field video technology needs only sparse multi-camera input to generate dense light field data, mainly thanks to deep-learning-based novel view synthesis. Light field video carries far more data than conventional video, so the team compresses it with multi-view aggregation coding to ease transmission and storage. Combined with large-scale live streaming and RTC transmission, this makes both on-demand and live light field video possible.
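The article says only that a "hierarchical representation based on hash encoding" is used. A widely used instance is the multiresolution hash encoding popularized by Instant-NGP, and this sketch assumes that scheme. Each level has a grid resolution growing geometrically from `n_min` to `n_max`. The 8 corners of the voxel containing a point are hashed into a small feature table, and their features are trilinearly interpolated. Tables hold one untrained scalar per entry here, and their size should be a power of two so the hash matches the usual 32-bit version.

```python
import math

PRIMES = (1, 2654435761, 805459861)  # per-dimension primes used by the Instant-NGP hash


def spatial_hash(cell, table_size):
    """XOR of coordinate * prime, reduced modulo the table size."""
    h = 0
    for c, p in zip(cell, PRIMES):
        h ^= c * p
    return h % table_size


def hash_encode(x, tables, n_min=16, n_max=512):
    """Encode a point x in [0, 1]^3 as one interpolated feature per resolution level."""
    levels = len(tables)
    growth = math.exp((math.log(n_max) - math.log(n_min)) / (levels - 1)) if levels > 1 else 1.0
    features = []
    for level, table in enumerate(tables):
        res = math.floor(n_min * growth ** level)  # grid resolution of this level
        pos = [xi * res for xi in x]
        base = [math.floor(p) for p in pos]
        frac = [p - b for p, b in zip(pos, base)]
        value = 0.0
        for corner in range(8):  # trilinear interpolation over the voxel's 8 corners
            offset = [(corner >> d) & 1 for d in range(3)]
            w = 1.0
            for d in range(3):
                w *= frac[d] if offset[d] else 1.0 - frac[d]
            cell = [b + o for b, o in zip(base, offset)]
            value += w * table[spatial_hash(cell, len(table))]
        features.append(value)
    return features
```

In training, the table entries are optimized jointly with a small MLP that decodes the concatenated features. Hash collisions at fine levels are tolerated because gradients from nearby surface points dominate the shared entries, and this compact design is what speeds up training.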
3. Summary and Outlook
As 3D technology continues to mature, the Volcano Engine Multimedia Laboratory's 3D technology has already been deployed in VR, autonomous driving, live video streaming, games, and other scenarios. It will continue to be explored in industry, healthcare, construction and home furnishing, aerospace, and other fields. Volcano Engine hopes to apply object reconstruction, scene reconstruction, and light field video technologies widely to products and projects across industries, serving enterprise customers and bringing users higher-definition, more interactive, and more immersive experiences.
The Volcano Engine Multimedia Laboratory is a research team under ByteDance dedicated to exploring cutting-edge multimedia technologies and to participating in international standardization work. Many of its innovative algorithms and hardware and software solutions are widely used in the multimedia businesses of Douyin, Xigua Video, and other products, and it provides technical services to Volcano Engine's enterprise customers. Since its founding, the lab has had many papers accepted at top international conferences and flagship journals. It has also won several international technical competition championships, industry innovation awards, and best paper awards.