Text2CAD
Generating Sequential CAD Designs from Beginner-to-Expert Level Text Prompts

Mohammad Sadil Khan 1*†
Sankalp Sinha 1*
Talha Uddin Sheikh 1
Didier Stricker 1

Sk Aziz Ali 2
Muhammad Zeshan Afzal 1

1 DFKI 1,2 RPTU 1 MindGarage 2 BITS Pilani, Hyderabad

NeurIPS 2024 (Spotlight 🤩)

Text2CAD: Designers can efficiently generate parametric CAD models from text prompts. The prompts can vary from abstract shape descriptions to detailed parametric instructions.

Contribution

Our proposed Text2CAD is the first AI framework for generating parametric CAD designs from multi-level textual descriptions. Our main contributions are:

  1. A novel data annotation pipeline that leverages open-source LLMs and VLMs to annotate the DeepCAD dataset with text prompts of varying complexity and parametric detail.
  2. Text2CAD Transformer: an end-to-end, Transformer-based autoregressive architecture for generating the CAD design history from input text prompts.

Data Annotation

Our data annotation pipeline generates multi-level text prompts that describe the construction workflow of a CAD model at varying levels of complexity. We use a two-stage method:

  1. Stage 1: Shape description generation using a VLM (LLaVA-NeXT).
  2. Stage 2: Multi-level textual annotation generation using an LLM (Mixtral-50B).
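
The two stages can be sketched roughly as follows. This is only an illustration of the data flow, not the authors' code: `describe_shape` and `write_prompt` are hypothetical stand-ins for the VLM and LLM calls, and the assumptions that the VLM sees rendered views and that the LLM also receives the CAD sequence are ours, as are the level names.

```python
from typing import Callable, Dict, List, Sequence


def annotate_cad_model(
    rendered_views: Sequence[str],
    cad_sequence: str,
    describe_shape: Callable[[Sequence[str]], str],
    write_prompt: Callable[[str, str, str], str],
    levels: Sequence[str],
) -> Dict[str, str]:
    """Produce one text prompt per level for a single CAD model.

    describe_shape: hypothetical stand-in for the Stage 1 VLM call.
    write_prompt:   hypothetical stand-in for the Stage 2 LLM call.
    """
    # Stage 1: a single shape description shared by all levels.
    shape_description = describe_shape(rendered_views)

    # Stage 2: one annotation per requested level of detail.
    return {
        level: write_prompt(shape_description, cad_sequence, level)
        for level in levels
    }


# Toy stand-ins so the sketch runs without any model.
def fake_vlm(views: Sequence[str]) -> str:
    return f"a shape seen in {len(views)} views"


def fake_llm(description: str, sequence: str, level: str) -> str:
    return f"[{level}] {description}; steps: {sequence}"


annotations = annotate_cad_model(
    rendered_views=["front.png", "top.png"],
    cad_sequence="sketch circle; extrude",
    describe_shape=fake_vlm,
    write_prompt=fake_llm,
    levels=["beginner", "expert"],  # assumed level names
)
```

Passing the two models in as callables keeps the stages independent, so either one could be swapped without touching the other.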

Text2CAD Transformer

Text2CAD Transformer converts natural language descriptions into parametric 3D CAD models by inferring all of the intermediate design steps autoregressively. The model takes as input a text prompt \(T\) and a CAD subsequence \(\mathbf{C}_{1:t-1}\) of length \({t-1}\). The text embedding \(T_{adapt}\) is extracted from \(T\) using a pretrained BERT encoder followed by a trainable adaptive layer. The embedding \(T_{adapt}\) and the CAD sequence embedding \(F^0_{t-1}\) are then passed through \(\mathbf{L}\) decoder blocks to generate the full CAD sequence autoregressively.
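
A minimal sketch of the autoregressive loop described above, assuming greedy decoding (the decoding strategy is not stated here). The function `next_token_scores` is a hypothetical stand-in for the decoder blocks conditioned on the text embedding, and the token ids and `max_len` are illustrative only.

```python
from typing import Callable, List, Sequence


def generate_cad_sequence(
    text_embedding: Sequence[float],
    next_token_scores: Callable[[Sequence[float], List[int]], List[float]],
    start_token: int,
    end_token: int,
    max_len: int = 64,
) -> List[int]:
    """Greedily extend a CAD token sequence until the end token or max_len."""
    sequence = [start_token]
    while len(sequence) < max_len:
        # Score every candidate token given the text and the tokens so far.
        scores = next_token_scores(text_embedding, sequence)
        # Greedy choice (an assumption): take the highest-scoring token.
        next_token = max(range(len(scores)), key=scores.__getitem__)
        sequence.append(next_token)
        if next_token == end_token:
            break
    return sequence


# Toy scorer: always prefers "last token + 1", capped at the end token (4).
def toy_scores(text_embedding: Sequence[float], sequence: List[int]) -> List[float]:
    scores = [0.0] * 5
    scores[min(sequence[-1] + 1, 4)] = 1.0
    return scores


tokens = generate_cad_sequence([0.1, 0.2], toy_scores, start_token=0, end_token=4)
```

Each step conditions on the same text embedding and on the subsequence generated so far, which mirrors the roles of \(T_{adapt}\) and \(\mathbf{C}_{1:t-1}\) in the description.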

Architecture

Visual Results

Quantitative Results

We evaluated the performance of Text2CAD using two strategies.

  1. CAD Sequence Evaluation: We assess the parametric correspondence between the generated CAD sequences and the input texts, using the following metrics:
    • F1 scores for Line, Arc, Circle and Extrusion, computed using the method proposed in CAD-SIGNet.
    • Chamfer Distance (CD), which measures the geometric alignment between the ground-truth CAD models and those reconstructed by Text2CAD and DeepCAD.
    • Invalidity Ratio (IR), which measures how often the reconstructed CAD models are invalid.
  2. Visual Inspection: We compare the performance of Text2CAD and DeepCAD using GPT-4 and human evaluation.
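
For intuition, two of the metrics above can be sketched in plain Python. This is an illustration under stated assumptions, not the evaluation code: the Chamfer variant shown is the symmetric form with squared distances on sampled points (the exact variant and scaling are not given here), and an invalid model is represented as `None`, which is our convention.

```python
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def squared_distance(p: Point, q: Point) -> float:
    return sum((a - b) ** 2 for a, b in zip(p, q))


def chamfer_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Chamfer distance between two point sets (squared form)."""

    def one_sided(src: Sequence[Point], dst: Sequence[Point]) -> float:
        # Mean squared distance from each source point to its nearest target.
        return sum(min(squared_distance(p, q) for q in dst) for p in src) / len(src)

    return one_sided(a, b) + one_sided(b, a)


def invalidity_ratio(models: Sequence[Optional[object]]) -> float:
    """Fraction of models that failed to reconstruct (represented as None)."""
    return sum(m is None for m in models) / len(models)


ground_truth = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
cd = chamfer_distance(ground_truth, predicted)
ir = invalidity_ratio(["model_a", None, "model_b", None])
```

Lower is better for both values: a CD of zero means the two point sets coincide, and an IR of zero means every generated sequence produced a valid model.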

Acknowledgement

This work was supported in part by the EU Horizon Europe Framework under grant agreement 101135724 (LUMINOUS).

Citation

If you use our dataset, please cite the following works.

@Inproceedings{khan2024textcad,
title={Text2CAD: Generating Sequential {CAD} Designs from Beginner-to-Expert Level Text Prompts},
author={Mohammad Sadil Khan and Sankalp Sinha and Sheikh Talha Uddin and Didier Stricker and Sk Aziz Ali and Muhammad Zeshan Afzal},
booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
year={2024},
url={https://openreview.net/forum?id=5k9XeHIK3L}
}

@Inproceedings{Khan_2024_CVPR,
author = {Khan, Mohammad Sadil and Dupont, Elona and Ali, Sk Aziz and Cherenkova, Kseniya and Kacem, Anis and Aouada, Djamila},
title = {CAD-SIGNet: CAD Language Inference from Point Clouds using Layer-wise Sketch Instance Guided Attention},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2024},
pages = {4713-4722}
}