3DALL-E: Integrating Text-to-Image AI in 3D Design Workflows

Vivian Liu, Jo Vermeulen, George Fitzmaurice, Justin Matejka

January 2023 · Proceedings of the 2023 ACM Designing Interactive Systems Conference (DIS)

Abstract

Text-to-image AI are capable of generating novel images for inspiration, but their applications for 3D design workflows and how designers can build 3D models using AI-provided inspiration have not yet been explored. To investigate this, we integrated DALL-E, GPT-3, and CLIP within a CAD software in 3DALL-E, a plugin that generates 2D image inspiration for 3D design. 3DALL-E allows users to construct text and image prompts based on what they are modeling. In a study with 13 designers, we found that designers saw great potential in 3DALL-E within their workflows and could use text-to-image AI to produce reference images, prevent design fixation, and inspire design considerations. We elaborate on prompting patterns observed across 3D modeling tasks and provide measures of prompt complexity observed across participants. From our findings, we discuss how 3DALL-E can merge with existing generative design workflows and propose prompt bibliographies as a form of human-AI design history.


Figures

Figure 1: 3DALL-E integrates a state-of-the-art text-to-image AI (DALL-E) into the 3D CAD software Fusion 360. This plugin generates 2D image inspiration for conceptual CAD and product design workflows

Figure 2: 3DALL-E walkthrough. Step I: Initial state, where users can type their design intentions. Step II: Users are presented with prompt suggestions from GPT-3. Step III: Selected suggestions are r

Figure 3: System design showing the architectures involved in 3DALL-E, which incorporates three large AI models into the workbench of an industry standard CAD software. In the top left panel, we show h

Figure 4: Diagram showing how text highlights were calculated using CLIP with image and text from the prompt suggestions as input. The CLIP logits score was set as the opacity of each prompt suggestion
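
The highlighting idea in Figure 4 can be illustrated with a minimal sketch. The caption says only that a CLIP score sets each suggestion's opacity; the min-max scaling, the `floor` parameter, the function name `scores_to_opacity`, and the example scores below are all assumptions made for illustration, not the authors' implementation.

```python
from typing import Dict


def scores_to_opacity(scores: Dict[str, float], floor: float = 0.2) -> Dict[str, float]:
    """Map raw similarity scores to opacities in [floor, 1.0].

    Assumption: min-max scaling. The source does not say how the CLIP
    logits were normalized before being used as opacity.
    """
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All suggestions score the same, so show them all fully opaque.
        return {text: 1.0 for text in scores}
    span = hi - lo
    return {
        text: floor + (1.0 - floor) * (score - lo) / span
        for text, score in scores.items()
    }


# Hypothetical scores standing in for CLIP image-text logits.
example_scores = {"matte finish": 21.5, "art deco": 24.0, "studio lighting": 19.0}
opacities = scores_to_opacity(example_scores)
```

Keeping a nonzero `floor` is a design choice in this sketch so that the lowest-scoring suggestion stays legible rather than disappearing.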

Figure 5: Examples of 3D designs participants brought in during 𝑇𝑒𝑑𝑖𝑡, which was to edit an existing model.

Figure 6: Examples of 3D designs participants came up with during 𝑇𝑐𝑟𝑒𝑎𝑡𝑒, which was to create a model from scratch.

Figure 7: Count of prompt keywords by source (3DALL-E- or participant-provided) for each participant during 𝑇𝑒𝑑𝑖𝑡 (top) and 𝑇𝑐𝑟𝑒𝑎𝑡𝑒 (bottom). 3DALL-E provides at least half of prompt keywords for 9/13 participants in both tasks.

Figure 8: Distribution of Likert scale responses on NASA-TLX, creativity support index, and workflow-specific questions across all participants for both 𝑇𝑒𝑑𝑖𝑡 and 𝑇𝑐𝑟𝑒𝑎𝑡𝑒. Full questions are in the Appe

Figure 9: Prompting and 3D modeling workflows of design process of three participants (P18, P13, and P1). P18 created a car, P13 created an audio speaker, and P1 edited a robot. Timelines are vertical

Figure 10: Pattern of generation activity for 𝑇𝑒𝑑𝑖𝑡, when participants edited an existing model.

Figure 11: Pattern of generation activity for 𝑇𝑐𝑟𝑒𝑎𝑡𝑒, when participants created a model from scratch.

Figure 12: Prompt complexity measured across participants, where complexity is the count of concepts in each text-only and image+text prompt. Participants span the X-axis, sorted by the count of their most complex prompt. The values are jittered to show multiplicity; many prompts mapped to the same number of concepts. Complexity tended to concentrate between two and six concepts, as seen by the density of prompts within that interval. Each datapoint was colored based on prompt task.

Figure 13: Three DALL-E generations participants (P18, P15, P9) found inspirational from the prompts: “The Dark Knight Rises: the body of a car as a Lego building set top view”, “3D render of a desk lamp

Figure 14: Prompt bibliographies, a design concept we propose for tracking human-AI design history. As prompts become a part of creative workflows, they may be integrated into the design histories alre

BibTeX

@inproceedings{10.1145/3563657.3596098,
  author = {Liu, Vivian and Vermeulen, Jo and Fitzmaurice, George and Matejka, Justin},
  title = {3DALL-E: Integrating Text-to-Image AI in 3D Design Workflows},
  year = {2023},
  isbn = {9781450398930},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3563657.3596098},
  doi = {10.1145/3563657.3596098},
  abstract = {Text-to-image AI are capable of generating novel images for inspiration, but their applications for 3D design workflows and how designers can build 3D models using AI-provided inspiration have not yet been explored. To investigate this, we integrated DALL-E, GPT-3, and CLIP within a CAD software in 3DALL-E, a plugin that generates 2D image inspiration for 3D design. 3DALL-E allows users to construct text and image prompts based on what they are modeling. In a study with 13 designers, we found that designers saw great potential in 3DALL-E within their workflows and could use text-to-image AI to produce reference images, prevent design fixation, and inspire design considerations. We elaborate on prompting patterns observed across 3D modeling tasks and provide measures of prompt complexity observed across participants. From our findings, we discuss how 3DALL-E can merge with existing generative design workflows and propose prompt bibliographies as a form of human-AI design history.},
  booktitle = {Proceedings of the 2023 ACM Designing Interactive Systems Conference},
  pages = {1955–1977},
  numpages = {23},
  keywords = {3D design, 3D modeling, AI applications, CAD, CLIP, DALL-E, GPT-3, co-creativity, creative copilot, creativity support tools, diffusion, ideation, multimodal, prompt engineering, text-to-3D, text-to-image, workflow},
  location = {Pittsburgh, PA, USA},
  series = {DIS '23},
}