Table of Contents

  1. Contribution
  2. Data Annotation
  3. Text2CAD Transformer
  4. Results
  5. Quantitative Results
  6. Video
  7. Acknowledgement
  8. Citation

Text2CAD: Generating Sequential CAD Designs from Beginner-to-Expert Level Text Prompts


* equal contributions · corresponding author
1 German Research Center for AI (DFKI GmbH) · 2 RPTU · 3 MindGarage · 4 BITS Pilani, Hyderabad

NeurIPS 2024 (Spotlight 🤩)


Text2CAD: Designers can efficiently generate parametric CAD models from text prompts. The prompts can vary from abstract shape descriptions to detailed parametric instructions.

Contribution

We propose Text2CAD, the first AI framework for generating parametric CAD designs from multi-level textual descriptions. Our main contributions are:

  1. A Novel Data Annotation Pipeline that leverages open-source LLMs and VLMs to annotate the DeepCAD dataset with text prompts of varying complexity and parametric detail.
  2. Text2CAD Transformer: An end-to-end, Transformer-based autoregressive architecture for generating the CAD design history from input text prompts.

Data Annotation

Our data annotation pipeline generates multi-level text prompts that describe the construction workflow of a CAD model at varying levels of complexity. We use a two-stage method:

  1. Stage 1: Shape description generation using a VLM (LLaVA-NeXT).
  2. Stage 2: Multi-level textual annotation generation using an LLM (Mixtral-50B).
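
The two stages can be read as a simple composition of two model calls. The sketch below is only an illustration under stated assumptions: `describe_shape` and `write_prompt` are hypothetical stand-ins for the VLM and LLM calls, the inputs (rendered views, a CAD sequence string) and the level names are placeholders, and none of this reflects the authors' actual prompt templates or code.

```python
from typing import Callable, Dict, Sequence


def annotate_cad_model(
    rendered_views: Sequence[str],
    cad_sequence: str,
    describe_shape: Callable[[Sequence[str]], str],
    write_prompt: Callable[[str, str, str], str],
    levels: Sequence[str] = ("beginner", "expert"),
) -> Dict[str, str]:
    """Two-stage annotation: a VLM describes the shape, then an LLM writes one prompt per level."""
    # Stage 1: the VLM (hypothetical callable) produces a shape description.
    shape_description = describe_shape(rendered_views)
    # Stage 2: the LLM (hypothetical callable) writes a text prompt for each level.
    return {
        level: write_prompt(shape_description, cad_sequence, level)
        for level in levels
    }


# Toy stand-ins so the sketch runs without any model.
def fake_vlm(views):
    return f"a shape seen from {len(views)} views"


def fake_llm(description, cad_sequence, level):
    return f"[{level}] Build {description} using: {cad_sequence}"


prompts = annotate_cad_model(
    ["front.png", "top.png"], "sketch circle; extrude", fake_vlm, fake_llm
)
```

Passing the models in as callables keeps the two stages independent, so either one could be swapped out without touching the other.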

Text2CAD Transformer

We developed the Text2CAD Transformer to turn natural language descriptions into 3D CAD models by autoregressively deducing all of their intermediate design steps. The model takes as input a text prompt \(T\) and a CAD subsequence \(\mathbf{C}_{1:t-1}\) of length \(t-1\). The text embedding \(T_{adapt}\) is extracted from \(T\) using a pretrained BERT encoder followed by a trainable adaptive layer. The text embedding \(T_{adapt}\) and the CAD sequence embedding \(F^0_{t-1}\) are then passed through \(\mathbf{L}\) decoder blocks to generate the full CAD sequence autoregressively.
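
The autoregressive loop described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `next_token` is a hypothetical stand-in for the decoder stack, the `START` and `END` token ids are invented for the example, and greedy decoding is an assumption, since the decoding strategy is not stated here.

```python
from typing import Callable, List, Sequence

START, END = 0, 1  # hypothetical special token ids


def generate_cad_sequence(
    text_embedding: Sequence[float],
    next_token: Callable[[Sequence[float], List[int]], int],
    max_len: int = 32,
) -> List[int]:
    """Greedy autoregressive decoding: predict token t from the text and tokens 1..t-1."""
    tokens = [START]
    while len(tokens) < max_len:
        # The decoder conditions on the text embedding (T_adapt) and the prefix (C_{1:t-1}).
        token = next_token(text_embedding, tokens)
        tokens.append(token)
        if token == END:
            break
    return tokens


# Toy stand-in for the decoder: counts upward by two, then emits END.
def toy_decoder(text_embedding, prefix):
    return prefix[-1] + 2 if len(prefix) < 4 else END


sequence = generate_cad_sequence([0.1, 0.2], toy_decoder)
```

Each step appends exactly one token and feeds the grown prefix back in, which is what makes the generation sequential; `max_len` bounds the loop if the end token is never produced.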

Architecture

Visual Results

Quantitative Results

We evaluated the performance of Text2CAD using two strategies.

  1. CAD Sequence Evaluation: We assess the parametric correspondence between the generated CAD sequences and the input texts, using the following metrics:
    • F1 scores for Line, Arc, Circle and Extrusion, computed with the method proposed in CAD-SIGNet.
    • Chamfer Distance (CD), which measures the geometric alignment between the ground-truth CAD models and the models reconstructed by Text2CAD and DeepCAD.
    • Invalidity Ratio (IR), which measures how often the reconstructed CAD models are invalid.
  2. Visual Inspection: We compare the performance of Text2CAD and DeepCAD using GPT-4 and human evaluation.
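
For readers unfamiliar with these metrics, the sketch below shows generic textbook forms of each. It is not the evaluation code behind the reported numbers: the Chamfer distance uses one common convention (mean squared nearest-neighbor distance, summed over both directions), the invalidity ratio is assumed to be a plain fraction, and `f1_score` is only the final formula, leaving out the primitive-matching procedure of CAD-SIGNet.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def chamfer_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Chamfer distance: mean squared nearest-neighbor distance, both directions."""
    def one_way(src, dst):
        # For each source point, take the squared distance to its nearest target point.
        return sum(
            min(sum((p - q) ** 2 for p, q in zip(s, d)) for d in dst) for s in src
        ) / len(src)
    return one_way(a, b) + one_way(b, a)


def invalidity_ratio(is_valid: Sequence[bool]) -> float:
    """Fraction of reconstructed models flagged as invalid."""
    return sum(1 for ok in is_valid if not ok) / len(is_valid)


def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """F1 from match counts; returns 0.0 when there is nothing to score."""
    denom = 2 * true_pos + false_pos + false_neg
    return 2 * true_pos / denom if denom else 0.0


ground_truth = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reconstructed = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
cd = chamfer_distance(ground_truth, reconstructed)
ir = invalidity_ratio([True, False, True, True])
f1 = f1_score(true_pos=8, false_pos=2, false_neg=2)
```

The brute-force nearest-neighbor search is quadratic in the number of points, which is fine for illustration but would normally be replaced by a spatial index for dense point clouds.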


Video

Coming Soon

Acknowledgement

This work was supported in part by the EU Horizon Europe Framework under grant agreement 101135724 (LUMINOUS).

Citation

If you use our dataset, please cite the following works.

@Inproceedings{khan2024textcad,
title={Text2CAD: Generating Sequential {CAD} Designs from Beginner-to-Expert Level Text Prompts},
author={Mohammad Sadil Khan and Sankalp Sinha and Sheikh Talha Uddin and Didier Stricker and Sk Aziz Ali and Muhammad Zeshan Afzal},
booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
year={2024},
url={https://openreview.net/forum?id=5k9XeHIK3L}
}

@Inproceedings{Khan_2024_CVPR,
author = {Khan, Mohammad Sadil and Dupont, Elona and Ali, Sk Aziz and Cherenkova, Kseniya and Kacem, Anis and Aouada, Djamila},
title = {CAD-SIGNet: CAD Language Inference from Point Clouds using Layer-wise Sketch Instance Guided Attention},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2024},
pages = {4713-4722}
}