🧑‍💻 Work done during an internship at Huawei.
DreamCAD turns text, images, and point clouds into editable parametric CAD surfaces. It is trained on over 1.3M meshes without any CAD annotations.
CADCap-1M is our companion dataset of 1M+ GPT-5–generated captions, making large-scale text-to-CAD research possible for the first time.
A unified framework combining differentiable surface representation with large-scale multimodal CAD generation.
Multimodal CAD generation from point-level supervision alone, without any CAD annotations. BReps are represented as rational Bézier patches, differentiably tessellated into meshes.
1M+ GPT-5–generated captions across ABC, Automate, CADParser & Fusion360 — the largest CAD captioning dataset for text-to-CAD research.
SOTA across text, image & point modalities on ABC and Objaverse — up to 70% lower Chamfer Distance and >75% user preference over all baselines.
DreamCAD's high-fidelity outputs can serve as a strong geometric prior for CAD topology recovery.
Each CAD model is represented as a set of bicubic rational Bézier patches defined by learnable control points and weights. Adjacent patches share boundary control points to ensure C⁰ continuity — producing connected, watertight, and directly editable surfaces exportable as STEP files via OpenCascade.
Given a set of Bézier patches \(\{S_k\}_{k=1}^K\), each patch is evaluated on a uniform \(r \times r\) grid in the UV domain. A rational Bézier surface of degree \((n, m)\) is defined by control points \(\mathbf{C} = \{c_{ij}\}\) and non-negative weights \(\mathbf{W} = \{w_{ij}\}\) as: $$S(u,v) = \frac{\sum_{i,j} B_i^n(u)\, B_j^m(v)\, w_{ij}\, c_{ij}}{\sum_{i,j} B_i^n(u)\, B_j^m(v)\, w_{ij}}$$ where \(B_i^n(u) = \binom{n}{i} u^i (1-u)^{n-i}\) are Bernstein basis functions and \((u,v) \in [0,1]^2\). For the bicubic case \(n = m = 3\). Adjacent grid points define quadrilateral cells split into triangles to form a locally consistent mesh. Since \(S(u,v)\) is differentiable with respect to both \(\mathbf{C}\) and \(\mathbf{W}\), the entire tessellation supports end-to-end gradient-based optimization via Chamfer Distance loss.
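The evaluation above can be sketched in a few lines of NumPy (an illustrative sketch, not the paper's training code; the function name `eval_rational_bezier` and its grid-resolution argument `r` are our own naming):

```python
import numpy as np
from math import comb

def bernstein(n, i, t):
    """Bernstein basis B_i^n(t) = C(n,i) * t^i * (1-t)^(n-i)."""
    return comb(n, i) * t**i * (1.0 - t)**(n - i)

def eval_rational_bezier(C, W, r=8):
    """Evaluate a rational Bezier patch on a uniform r x r UV grid.

    C: (n+1, m+1, 3) control points; W: (n+1, m+1) non-negative weights.
    Returns an (r, r, 3) array of surface samples S(u, v).
    """
    n, m = C.shape[0] - 1, C.shape[1] - 1
    u = np.linspace(0.0, 1.0, r)
    v = np.linspace(0.0, 1.0, r)
    Bu = np.stack([bernstein(n, i, u) for i in range(n + 1)])  # (n+1, r)
    Bv = np.stack([bernstein(m, j, v) for j in range(m + 1)])  # (m+1, r)
    # Numerator: sum_ij B_i(u) B_j(v) w_ij c_ij; denominator: same without c_ij.
    num = np.einsum('iu,jv,ij,ijk->uvk', Bu, Bv, W, C)
    den = np.einsum('iu,jv,ij->uv', Bu, Bv, W)
    return num / den[..., None]
```

Because every operation here is differentiable in both `C` and `W`, swapping NumPy for an autodiff framework immediately yields the end-to-end gradients the paper relies on.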
DreamCAD adopts a multi-stage pipeline. Sparse voxel representations from input meshes are first encoded into structured latents, then decoded into C⁰-continuous Bézier patches.
The VAE encodes each 3D mesh into structured latents by voxelizing to 32³ resolution and augmenting each active voxel with DINOv2 embeddings from 150 RGB and normal renders, SDF values, and voxel centers. A sparse Transformer encoder produces structured latents \((v_i, z_i)\), which are decoded into bicubic rational Bézier patches. The surface is optimized end-to-end via Chamfer, G1, and Laplacian losses: $$\mathcal{L} = \lambda_{\text{cd}}\,\texttt{CD}(\mathcal{X}_g, \mathcal{X}_d) + \lambda_{g1}\,\texttt{G1}(\mathcal{S}_d) + \lambda_{\text{lp}}\,\texttt{Laplacian}(\mathcal{M}_d) + \lambda_{\text{kl}}\,D_{\text{KL}}$$
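The Chamfer term of the loss can be illustrated with a naive pairwise implementation (a minimal NumPy sketch; the training code would use a batched, accelerated version, and the exact reduction used in the paper, sum vs. mean, is an assumption here):

```python
import numpy as np

def chamfer_distance(X, Y):
    """Symmetric Chamfer Distance between point sets X (N, 3) and Y (M, 3).

    CD(X, Y) = mean_x min_y ||x - y||^2 + mean_y min_x ||y - x||^2.
    Naive O(N*M) pairwise version, for illustration only.
    """
    d2 = np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=-1)  # (N, M)
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()
```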
Directly predicting Bézier control points from latents leads to disconnected or overlapping patches. To enforce C⁰ continuity structurally, DreamCAD initializes patches from sparse voxels using a flood-fill algorithm that removes internal quads, leaving only surface-facing quads. Each surface quad is converted into a bicubic Bézier patch by sampling a \(4 \times 4\) control-point grid via bilinear interpolation. Adjacent patches share boundary control points by construction — guaranteeing seamless, gap-free surfaces before any decoder refinement.
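The bilinear sampling step can be sketched as follows (illustrative; the helper name `quad_to_control_grid` and the corner ordering are our assumptions, but it shows why adjacent quads yield identical boundary control points by construction):

```python
import numpy as np

def quad_to_control_grid(corners):
    """Initialize a 4x4 bicubic control grid from a surface quad.

    corners: (4, 3) array ordered [p00, p01, p10, p11], i.e. the quad's
    corners at (u, v) = (0,0), (0,1), (1,0), (1,1). Control points are
    bilinearly interpolated at parameters {0, 1/3, 2/3, 1}, so two quads
    sharing an edge produce the same boundary control points: C0 continuity
    holds before any decoder refinement.
    """
    p00, p01, p10, p11 = corners
    t = np.array([0.0, 1/3, 2/3, 1.0])
    u, v = np.meshgrid(t, t, indexing='ij')  # each (4, 4)
    grid = ((1 - u)[..., None] * (1 - v)[..., None] * p00
            + (1 - u)[..., None] * v[..., None] * p01
            + u[..., None] * (1 - v)[..., None] * p10
            + u[..., None] * v[..., None] * p11)
    return grid  # (4, 4, 3)
```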
DreamCAD supports text, image, and point cloud inputs through a two-stage flow-matching framework. The first stage generates a coarse voxel grid from the input condition via a lightweight voxel flow model. The second stage predicts fine-grained SLAT features per active voxel, which the pretrained parametric decoder transforms into the final Bézier surface. For text-to-CAD, a LoRA-finetuned Stable Diffusion 3.5 bridges text to the image-to-CAD model, completing the multimodal pipeline in ~30s.
CADCap-1M is built on top of four large-scale CAD repositories — ABC, Automate, CADParser, and Fusion360 — spanning over 1M parametric CAD models across mechanical, industrial, and everyday object categories. For each model, four orthographic views are rendered in Blender and passed to GPT-5 alongside structured metadata extracted directly from the CAD files — including model names, hole counts, and relative dimensions. This metadata-augmented prompting grounds the language model in geometric reality, reducing hallucinations and producing precise, structure-aware captions such as "M3×8 bolt … cylindrical shank … central hex socket. Height is 1.9× width." The result is the largest CAD captioning dataset to date, enabling large-scale text-to-CAD research for the first time.
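Metadata-augmented prompting amounts to folding the extracted CAD metadata into the captioning request; a hypothetical sketch (the field names, wording, and `build_caption_prompt` helper are ours, not the exact prompt used for CADCap-1M):

```python
def build_caption_prompt(metadata, n_views=4):
    """Assemble a metadata-augmented captioning prompt (illustrative only).

    metadata: dict with hypothetical keys such as 'name', 'hole_count',
    and 'rel_dims', mirroring the kinds of fields extracted from CAD files.
    """
    lines = [
        f"You are given {n_views} orthographic renders of a CAD model.",
        f"Model name: {metadata.get('name', 'unknown')}",
        f"Hole count: {metadata.get('hole_count', 'unknown')}",
        f"Relative dimensions (H:W:D): {metadata.get('rel_dims', 'unknown')}",
        "Write a precise, structure-aware caption grounded in this metadata.",
    ]
    return "\n".join(lines)
```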
Caption quality is assessed via GPT-5 evaluation on 5K samples and user studies on 1K samples. Evaluators are shown four rendered views, metadata, and the caption, then rate both geometric and semantic accuracy. Overall, 95.8% (user) and 98.31% (GPT-5) of captions are judged correct — validating the reliability of metadata-augmented prompting.
Explore DreamCAD's multimodal CAD reconstructions across text, image, and point cloud inputs — alongside CADCap-1M's high-quality annotations spanning industrial parts.
Comprehensive benchmarks across point-, image- and text-to-CAD tasks on ABC and Objaverse — demonstrating DreamCAD's significant improvements in geometric fidelity and visual alignment. F1 and IR are scaled by 10², while CD, JSD, and MMD are scaled by 10³.
| Model | F1 ↑ | NC ↑ | CD ↓ | HD ↓ | JSD ↓ | MMD ↓ | IR ↓ |
|---|---|---|---|---|---|---|---|
| DeepCAD | 19.31 | 0.49 | 51.10 | 0.37 | 783.94 | 29.63 | 11.01 |
| CAD-Recode | 75.99 | 0.79 | 3.73 | 0.13 | 271.89 | 2.94 | 15.39 |
| Cadrille | 78.86 | 0.80 | 2.98 | 0.12 | 236.10 | 2.51 | 5.84 |
| DreamCAD (Ours) | 92.12 | 0.94 | 0.93 | 0.06 | 96.13 | 0.84 | 0.00 |
Qualitative comparison of DreamCAD against baselines across point cloud, image, and text-conditioned CAD generation.
While DreamCAD's outputs are editable via control points and weights, they lack complete CAD topology for production-level readiness. However, DreamCAD's high-fidelity geometric reconstruction can serve as a strong prior for topology recovery, which forms the basis of our future research.
Each Bézier patch from DreamCAD's output is represented as 16 control points with corresponding weights, encoded via a Transformer encoder. We finetune Qwen3-4B on 50K samples to convert these patch-based representations into structured NURBS sequences with full semantic topology, following the formulation of NURBGen. The resulting output is a valid BRep exportable as a standard STEP file.
Given DreamCAD's patch output, Qwen3-4B predicts a structured NURBS representation with knot vectors, degrees, poles, and weights — producing a complete, semantically valid BRep topology ready for downstream CAD workflows.
```bibtex
@article{dreamcad,
  title   = {DreamCAD: Scaling Multi-modal CAD Generation using Differentiable Parametric Surfaces},
  author  = {Mohammad Sadil Khan and Muhammad Usama and Rolandos Alexandros Potamias and Didier Stricker and Muhammad Zeshan Afzal and Jiankang Deng and Ismail Elezi},
  journal = {arXiv},
  year    = {2026}
}
```
Questions about DreamCAD, collaboration opportunities, or just want to say hi? Fill out the form and we'll get back to you.