
Introducing TripoSR: Fast 3D Object Generation from Single Images — Stability AI




Comparing TripoSR 3D reconstructions with those from OpenLRM.

Technical Details

Our training data preparation incorporates diverse data rendering techniques that more closely replicate the distribution of images found in the real world, significantly improving the model's ability to generalize. We carefully curated a CC-BY-licensed, higher-quality subset of the Objaverse dataset for the training data. On the model side, we also introduced several technical improvements over the base LRM model, including channel number optimization, mask supervision, and a more efficient crop rendering strategy. You can read the technical report for more details.
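For a concrete sense of what license-based curation can look like, here is a minimal sketch using the publicly available objaverse Python package. The license field and value shown follow the dataset's published annotations; this is illustrative only, not TripoSR's actual curation pipeline, which also applies quality filtering.

```python
# Illustrative sketch: filtering Objaverse for CC-BY objects with the
# `objaverse` package. The "license" annotation field follows the
# dataset's published metadata; treat this as a sketch, not TripoSR's
# actual (quality-filtered) curation pipeline.
import objaverse

uids = objaverse.load_uids()                  # all object IDs (~800k)
annotations = objaverse.load_annotations(uids)  # downloads metadata

# Keep only objects released under a plain CC-BY license.
cc_by_uids = [
    uid for uid, ann in annotations.items()
    if ann.get("license") == "by"
]
print(f"{len(cc_by_uids)} of {len(uids)} objects are CC-BY")
```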

We invite developers, designers, and creators to explore its capabilities, contribute to its evolution, and discover its potential to transform their work and industries.

The code for the TripoSR model is now available on Tripo AI’s GitHub, and the model weights are available on Hugging Face. Please refer to our technical report for more details on the TripoSR model.
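As a rough sketch of what inference looks like with the released code, the snippet below follows the class and function names in the repository's example script at the time of writing; check the current README, as the API may differ, and note that the script's image preprocessing (background removal, resizing) is omitted here for brevity.

```python
# Minimal inference sketch based on the TripoSR repository's example
# script. Names (TSR.from_pretrained, extract_mesh) are taken from the
# released code; verify against the current README before relying on them.
import torch
from PIL import Image
from tsr.system import TSR

device = "cuda" if torch.cuda.is_available() else "cpu"

model = TSR.from_pretrained(
    "stabilityai/TripoSR",        # model weights on Hugging Face
    config_name="config.yaml",
    weight_name="model.ckpt",
)
model.to(device)

image = Image.open("chair.png")               # single RGB(A) source image
scene_codes = model([image], device=device)   # image -> triplane latents
meshes = model.extract_mesh(scene_codes)      # marching cubes over the NeRF
meshes[0].export("chair_mesh.obj")            # trimesh export
```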

To stay updated on our progress, follow us on Twitter, Instagram, LinkedIn, and join our Discord Community.


Stable Diffusion creator Stability AI and AI 3D modeling start-up Tripo AI have released TripoSR, a new open-source AI model for generating 3D models from single source images.

It can produce a “high-quality 3D model … in under a second”, and is “designed to cater to the growing demands of entertainment, gaming, industrial design, and architecture professionals”.

What does TripoSR do?

TripoSR generates a textured 3D mesh from a single source image, automatically generating the geometry and textures not visible in the original view.

The model was trained on a subset of the publicly available Objaverse research data set, using 3D models available under a Creative Commons CC BY license.

How does TripoSR compare to other image-to-3D reconstruction AI models?

In the technical report accompanying the release, Stability AI and Tripo AI compare TripoSR to other open-source 3D reconstruction models like OpenLRM and One-2-3-45.

It outperformed the other models tested on the accuracy of the generated models, and holds its own on processing time.

So what does that mean in practice for 3D artists?

The results look pretty good in Tripo AI’s demo (at the top of this story), although the simpler models in the video from Stability AI’s blog post (above), may be more typical of the output.

Early user tests, such as this video from GamesFromScratch, suggest that the results for hard surface objects could be usable as background models for games, or for AR applications, although the results for organic characters are… interesting.

License and system requirements

The source code for TripoSR is available on GitHub under an MIT license, along with a list of dependencies. The model weights are available on Hugging Face.

At default settings, the model requires about 6GB of VRAM for a single image input, but according to Stability AI, it can be run on machines without GPUs.
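Continuing the inference sketch above, the released code exposes a renderer chunk size that trades peak VRAM for speed, which is one way to fit the model onto smaller GPUs or fall back to CPU; the specific values below are illustrative, not recommendations.

```python
# Continuing the earlier sketch: pick a device and bound peak memory.
# set_chunk_size appears in the repository's example script; the exact
# values here are illustrative only.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"  # CPU works, just slower
model.to(device)

# Smaller chunks lower peak VRAM at some cost in throughput
# (0 disables chunking entirely).
model.renderer.set_chunk_size(8192 if device == "cuda" else 2048)
```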

Read an overview of open-source 3D generation model TripoSR in Stability AI’s blog post



In the realm of 3D generative AI, the boundaries between 3D generation and 3D reconstruction from a small number of views have started to blur. This convergence is propelled by a series of breakthroughs, including the emergence of large-scale public 3D datasets and advancements in generative model architectures.

To circumvent the scarcity of 3D training data, recent research has used 2D diffusion models to generate 3D objects from input photos or text prompts. One example is DreamFusion, which pioneered score distillation sampling (SDS): a 3D model is optimized so that its renders are judged plausible by a 2D diffusion model, effectively using 2D priors for 3D generation. However, these methods are typically slow, owing to their heavy computational and optimization requirements, and their outputs are difficult to control precisely. Feedforward 3D reconstruction models are far more computationally efficient, and several recent methods in this vein have demonstrated the potential for scalable training on diverse 3D datasets. They significantly improve the efficiency and practicality of 3D generation by allowing fast feedforward inference and, potentially, better control over the produced outputs.
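To make the SDS idea concrete, here is a schematic PyTorch sketch of a single optimization step; `render` (a differentiable renderer of the 3D parameters) and `diffusion_eps` (a frozen 2D noise predictor) are hypothetical stand-ins, not a real API.

```python
# Schematic sketch of score distillation sampling (SDS) as introduced by
# DreamFusion. `render` and `diffusion_eps` are hypothetical stand-ins;
# `theta` holds the 3D scene parameters registered in `optimizer`.
import torch

def sds_step(theta, optimizer, render, diffusion_eps, text_emb,
             alphas_cumprod, w=1.0):
    x = render(theta)                       # differentiable render -> image
    t = torch.randint(20, 980, (1,))        # random diffusion timestep
    eps = torch.randn_like(x)               # injected Gaussian noise
    a_t = alphas_cumprod[t].view(-1, 1, 1, 1)
    x_t = a_t.sqrt() * x + (1 - a_t).sqrt() * eps   # noised render
    with torch.no_grad():
        eps_hat = diffusion_eps(x_t, t, text_emb)   # frozen 2D prior
    # SDS gradient: w(t) * (eps_hat - eps) flows into theta through x,
    # skipping the diffusion model's Jacobian (the DreamFusion trick).
    grad = w * (eps_hat - eps)
    optimizer.zero_grad()
    x.backward(gradient=grad)
    optimizer.step()
```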

A new study by Stability AI and Tripo AI presents TripoSR, a feedforward model that can generate a 3D model from a single image in under half a second on an A100 GPU. The team introduces enhancements to data curation and rendering, model design, and training methodology, building on the LRM architecture. Like LRM, TripoSR uses a transformer architecture for 3D reconstruction from a single image: it takes a single RGB photograph of an object and produces a three-dimensional model.

The TripoSR model comprises three main parts:

An image encoder

An image-to-triplane decoder

A neural radiance field (NeRF) based on triplanes

The image encoder is initialized from DINOv1, a pre-trained vision transformer, and plays a crucial role in the pipeline: it converts the RGB image into a set of latent vectors that encode the global and local image features needed to reconstruct the 3D object.

The proposed approach avoids explicit camera parameter conditioning, yielding a more robust and flexible model that can handle a wide range of real-world inputs without relying on accurate camera data. Important design factors include the transformer layer count, the triplane size, the NeRF model details, and the main training settings.
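To make the triplane representation concrete, here is an illustrative sketch of how a 3D point can be queried against three axis-aligned feature planes; the channel counts and the MLP head are placeholders, not TripoSR's actual hyperparameters.

```python
# Illustrative triplane query: project a 3D point onto the XY, XZ and YZ
# planes, bilinearly sample a feature from each, and decode density/color
# with a small MLP. Dimensions are placeholders, not TripoSR's actual
# hyperparameters.
import torch
import torch.nn.functional as F

def query_triplane(points, planes, mlp):
    """points: (N, 3) in [-1, 1]; planes: (3, C, H, W) feature maps."""
    coords = [
        points[:, [0, 1]],  # XY plane
        points[:, [0, 2]],  # XZ plane
        points[:, [1, 2]],  # YZ plane
    ]
    feats = []
    for plane, uv in zip(planes, coords):
        # grid_sample expects sample grids shaped (B, H_out, W_out, 2)
        grid = uv.view(1, -1, 1, 2)
        f = F.grid_sample(plane.unsqueeze(0), grid, align_corners=True)
        feats.append(f.view(plane.shape[0], -1).t())  # (N, C)
    fused = sum(feats)        # sum (or concatenate) the three features
    return mlp(fused)         # -> per-point density + color
```

Summing (or concatenating) the three sampled features and decoding them with a small MLP is the standard triplane-NeRF recipe; TripoSR's exact fusion and decoder details are in the technical report.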

Given the central importance of data, two enhancements to training data collection have been implemented:

Data curation: selecting a subset of the Objaverse dataset distributed under the CC-BY license improved the quality of the training data.

Data rendering: they implemented various data rendering strategies that better mimic the distribution of real-world images, improving the model's generalizability even though it was trained solely on the Objaverse dataset (see the sketch below).
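As a hedged illustration of what "diverse rendering" can mean in practice, the sketch below randomizes camera and lighting per render; the parameter ranges are invented for illustration, and the actual rendering setup is described in the technical report.

```python
# Hedged illustration of "diverse rendering": randomizing camera and
# lighting per render so synthetic images better cover the real-world
# distribution. Ranges are invented for illustration; TripoSR's actual
# rendering setup is described in the technical report.
import random
import math

def sample_render_params():
    return {
        "azimuth":   random.uniform(0.0, 2 * math.pi),  # camera orbit angle
        "elevation": random.uniform(-0.3, 1.2),         # radians above equator
        "fov_deg":   random.uniform(30.0, 50.0),        # vary perspective
        "distance":  random.uniform(1.2, 2.2),          # camera radius
        "light_intensity": random.uniform(0.5, 1.5),    # lighting variety
        "env_rotation":    random.uniform(0.0, 2 * math.pi),
    }
```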

The experiments demonstrate that the TripoSR model outperforms competing open-source solutions both quantitatively and qualitatively. This, along with the availability of the pretrained model, an online interactive demo, and the source code under the MIT license, represents a significant advancement in the fields of artificial intelligence (AI), computer vision (CV), and computer graphics (CG). The team anticipates a transformative impact on these fields by equipping researchers, developers, and artists with these cutting-edge tools for 3D generative AI.

Check out the Paper and GitHub. All credit for this research goes to the researchers of this project. Also, don't forget to follow us on Twitter. Join our Telegram Channel, Discord Channel, and LinkedIn Group.


