
3DALL-E: Integrating Text-to-Image AI in 3D Design Workflows

Vivian Liu, Jo Vermeulen, George Fitzmaurice, Justin Matejka
January 2023 · Proceedings of the 2023 ACM Designing Interactive Systems Conference (DIS)

Abstract

Text-to-image AI are capable of generating novel images for inspiration, but their applications for 3D design workflows and how designers can build 3D models using AI-provided inspiration have not yet been explored. To investigate this, we integrated DALL-E, GPT-3, and CLIP within a CAD software in 3DALL-E, a plugin that generates 2D image inspiration for 3D design. 3DALL-E allows users to construct text and image prompts based on what they are modeling. In a study with 13 designers, we found that designers saw great potential in 3DALL-E within their workflows and could use text-to-image AI to produce reference images, prevent design fixation, and inspire design considerations. We elaborate on prompting patterns observed across 3D modeling tasks and provide measures of prompt complexity observed across participants. From our findings, we discuss how 3DALL-E can merge with existing generative design workflows and propose prompt bibliographies as a form of human-AI design history.

Figures

Figure 1: 3DALL-E integrates a state-of-the-art text-to-image AI (DALL-E) into the 3D CAD software Fusion 360. This plugin generates 2D image inspiration for conceptual CAD and product design workflows
Figure 2: 3DALL-E walkthrough. Step I: Initial state, where users can type their design intentions. Step II: Users are presented with prompt suggestions from GPT-3. Step III: Selected suggestions are r
Figure 3: System design showing the architectures involved in 3DALL-E, which incorporates three large AI models into the workbench of an industry standard CAD software. In the top left panel, we show h
Figure 4: Diagram showing how text highlights were calculated using CLIP with image and text from the prompt suggestions as input. The CLIP logits score was set as the opacity of each prompt suggestion
Figure 5: Examples of 3D designs participants brought in during T_edit, which was to edit an existing model.
Figure 6: Example of 3D designs participants came up with during T_create, which was to create a model from scratch.
Figure 7: Count of prompt keywords by source (3DALL-E- or participant-provided) for each participant during T_edit (top) and T_create (bottom). 3DALL-E provides at least half of prompt keywords for 9/13 participants in both tasks.
Figure 8: Distribution of Likert scale responses on NASA-TLX, creativity support index, and workflow-specific questions across all participants for both T_edit and T_create. Full questions are in the Appe
Figure 9: Prompting and 3D modeling workflows of design process of three participants (P18, P13, and P1). P18 created a car, P13 created an audio speaker, and P1 edited a robot. Timelines are vertical
Figure 10: Pattern of generation activity for T_edit, when participants edited an existing model.
Figure 11: Pattern of generation activity for T_create, when participants created a model from scratch.
Figure 12: Prompt complexity measured across participants, where complexity is the count of concepts in each text-only and image+text prompt. Participants span the X-axis, sorted by the count of their most complex prompt. The values are jittered to show multiplicity; many prompts mapped to the same number of concepts. Complexity tended to concentrate between two to six concepts, as seen by the density of prompts within that interval. Each datapoint was colored based on prompt task.
Figure 13: Three DALL-E generations participants (P18, P15, P9) found inspirational from the prompts: “The Dark Knight Rises: the body of a car as a Lego building set top view”, “3D render of a desk lamp
Figure 14: Prompt bibliographies, a design concept we propose for tracking human-AI design history. As prompts become a part of creative workflows, they may be integrated into the design histories alre
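
The Figure 4 caption describes using a CLIP score as the opacity of each prompt suggestion. The following is a minimal sketch of that idea, not the authors' implementation: the function name `scores_to_opacity`, the min-max scaling, the `floor` parameter, and the example scores are all assumptions made for illustration, and no CLIP model is called here.

```python
def scores_to_opacity(scores, floor=0.2):
    """Min-max scale raw scores into opacities in [floor, 1.0].

    Assumption: the scaling scheme and the floor value are illustrative
    choices; the source only says the score was set as the opacity.
    """
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All suggestions score the same, so show them all fully opaque.
        return [1.0 for _ in scores]
    return [floor + (1.0 - floor) * (s - lo) / (hi - lo) for s in scores]


# Hypothetical scores for three prompt suggestions (made-up numbers).
suggestions = {"wooden": 21.5, "futuristic": 27.0, "minimalist": 24.25}
opacities = dict(zip(suggestions, scores_to_opacity(list(suggestions.values()))))

for text, alpha in opacities.items():
    print(f"{text}: opacity {alpha:.2f}")
```

With this scaling, the highest-scoring suggestion is drawn fully opaque and the lowest stays faintly visible at the floor value rather than disappearing.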

BibTeX

@inproceedings{10.1145/3563657.3596098,
  author = {Liu, Vivian and Vermeulen, Jo and Fitzmaurice, George and Matejka, Justin},
  title = {3DALL-E: Integrating Text-to-Image AI in 3D Design Workflows},
  year = {2023},
  isbn = {9781450398930},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3563657.3596098},
  doi = {10.1145/3563657.3596098},
  abstract = {Text-to-image AI are capable of generating novel images for inspiration, but their applications for 3D design workflows and how designers can build 3D models using AI-provided inspiration have not yet been explored. To investigate this, we integrated DALL-E, GPT-3, and CLIP within a CAD software in 3DALL-E, a plugin that generates 2D image inspiration for 3D design. 3DALL-E allows users to construct text and image prompts based on what they are modeling. In a study with 13 designers, we found that designers saw great potential in 3DALL-E within their workflows and could use text-to-image AI to produce reference images, prevent design fixation, and inspire design considerations. We elaborate on prompting patterns observed across 3D modeling tasks and provide measures of prompt complexity observed across participants. From our findings, we discuss how 3DALL-E can merge with existing generative design workflows and propose prompt bibliographies as a form of human-AI design history.},
  booktitle = {Proceedings of the 2023 ACM Designing Interactive Systems Conference},
  pages = {1955--1977},
  numpages = {23},
  keywords = {3D design, 3D modeling, AI applications, CAD, CLIP, DALL-E, GPT-3, co-creativity, creative copilot, creativity support tools, diffusion, ideation, multimodal, prompt engineering, text-to-3D, text-to-image, workflow},
  location = {Pittsburgh, PA, USA},
  series = {DIS '23},
}