Exploring the Capabilities of NeRF in Generating 3D Models

Shun Fang

doi:10.4108/airo.5360

Authors

Shun Fang Peking University

Keywords:

NeRF, GANs, MLP, EG3D, DreamFusion, Magic3D

Abstract

This review paper presents a comprehensive analysis of three cutting-edge techniques in 3D content synthesis: EG3D, DreamFusion, and Magic3D. EG3D, leveraging geometry-aware representations and generative adversarial networks, enables the generation of high-quality 3D shapes. DreamFusion integrates text-to-image diffusion models with neural rendering, opening new horizons for creative expression. Magic3D, on the other hand, extends text-to-image synthesis principles to 3D content creation, synthesizing realistic and detailed models. We delve into the theoretical frameworks, neural network architectures, and loss functions of these techniques, analyzing their experimental results and discussing their strengths, weaknesses, and potential applications. This review serves as a valuable resource for researchers and practitioners, offering insights into the latest advancements and pointing towards future directions for exploration in 3D content synthesis.
