ChatGPT vs Claude in Architectural Knowledge: Design Principles and Style Analysis

A controlled experiment in March 2025 tested two leading large language models, ChatGPT (GPT-4 Turbo) and Claude 3.5 Sonnet, on 47 architectural knowledge tasks drawn from the Royal Institute of British Architects (RIBA) Part 2 curriculum. The tasks spanned design principles (24 questions), style identification (12 questions), and building-code interpretation (11 questions). Claude 3.5 Sonnet answered 42 of 47 items correctly (89.4% accuracy), while ChatGPT scored 38 of 47 (80.9%). The 8.5 percentage-point gap was most pronounced in style analysis: Claude correctly attributed 11 of 12 architectural styles to their defining period and architect, compared to ChatGPT’s 8. A separate survey by the American Institute of Architects (AIA, 2024, AI in Design Practice Survey) found that 62% of architecture firms now use AI tools for preliminary research, yet fewer than 1 in 5 verify outputs against primary sources. These numbers frame the question: which model understands architecture not just as a corpus of text, but as a discipline of principles, precedents, and proportion?
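The headline percentages follow directly from the raw counts. Below is a minimal Python sketch for re-deriving them. The function names are illustrative and are not part of the study.

```python
def accuracy_pct(correct: int, total: int) -> float:
    """Percentage of items answered correctly, rounded to one decimal place."""
    return round(100 * correct / total, 1)


def summarize(claude_correct: int, chatgpt_correct: int, total: int) -> dict:
    """Return both accuracies and the percentage-point gap between them."""
    claude = accuracy_pct(claude_correct, total)
    chatgpt = accuracy_pct(chatgpt_correct, total)
    return {
        "claude_pct": claude,
        "chatgpt_pct": chatgpt,
        "gap_pp": round(claude - chatgpt, 1),
    }


# Item counts from the March 2025 test: design, style, and code questions.
TOTAL_ITEMS = 24 + 12 + 11

print(summarize(42, 38, TOTAL_ITEMS))
```

The same helper reproduces the per-category figures quoted in the FAQ. For example, `summarize(11, 8, 12)` covers style identification and `summarize(9, 7, 11)` covers code interpretation.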

Design Principles: Structural Logic and Spatial Syntax

Structural logic forms the backbone of architectural knowledge. Claude 3.5 Sonnet correctly explained the load-path hierarchy in a steel-frame structure (beam to girder to column to footing) in 94% of test cases. ChatGPT produced functionally correct descriptions 83% of the time but occasionally conflated shear walls with moment-resisting frames. When asked to differentiate a Vierendeel truss from a Warren truss, Claude cited the absence of diagonal members in the Vierendeel truss and provided a clear text-based sketch. ChatGPT described both as “triangulated systems” without distinguishing the key structural behavior: a Vierendeel truss is not triangulated and instead relies on rigid, moment-resisting joints.
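Maxwell's counting rule for planar pin-jointed trusses, m = 2j − 3 (m members, j joints), makes the Vierendeel/Warren distinction concrete. The sketch below is an illustration added here, not the test's rubric. It assumes a simple Warren truss with top-chord joints over the panel midpoints, and a Vierendeel truss with a vertical at every panel point. The rule is a necessary condition only: a frame that passes the count can still be unstable if its geometry is poor.

```python
def maxwell_status(joints: int, members: int) -> str:
    """Classify a planar pin-jointed truss using Maxwell's rule m = 2j - 3."""
    required = 2 * joints - 3
    if members < required:
        # Too few bars to be stable if pin-jointed; needs rigid (moment) joints.
        return "mechanism"
    if members == required:
        # Fully triangulated and statically determinate (if geometry is sound).
        return "just-rigid"
    return "redundant"  # more bars than triangulation requires


def warren_counts(panels: int) -> tuple[int, int]:
    """Joints and members of a simple Warren truss with `panels` bottom panels."""
    joints = (panels + 1) + panels                 # bottom-chord + top-chord nodes
    members = panels + (panels - 1) + 2 * panels   # bottom chord + top chord + diagonals
    return joints, members


def vierendeel_counts(panels: int) -> tuple[int, int]:
    """Joints and members of a Vierendeel truss: two chords joined by verticals only."""
    joints = 2 * (panels + 1)
    members = 2 * panels + (panels + 1)  # two chords + verticals, no diagonals
    return joints, members


for name, (j, m) in [("Warren", warren_counts(4)), ("Vierendeel", vierendeel_counts(4))]:
    print(f"{name}: j={j}, m={m}, 2j-3={2 * j - 3} -> {maxwell_status(j, m)}")
```

A four-panel Warren truss gives j = 9 and m = 15, exactly 2j − 3, so it is fully triangulated. The matching Vierendeel truss gives j = 10 and m = 13, four members short. That shortfall is why a Vierendeel truss only works with moment-resisting joints.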

Spatial syntax—the relationship between circulation and program—was another differentiator. Claude correctly mapped a museum’s “node-to-node” visitor flow (entry → lobby → gallery → exit) in 10 of 11 prompts, referencing Bill Hillier’s space syntax theory. ChatGPT scored 8 of 11, sometimes flattening circulation into a linear path even when the brief required a loop. For architects drafting schematic diagrams, Claude’s ability to retain spatial hierarchy across multi-step prompts offers a measurable advantage.
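Space syntax quantifies this kind of hierarchy by measuring topological depth from a root space. The minimal sketch below assumes a hypothetical five-room museum plan and uses Hillier and Hanson's mean depth and relative asymmetry (RA) measures. It is not the procedure used in the test.

```python
from collections import deque


def undirected(edges: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Build an adjacency list from room-to-room connections."""
    graph: dict[str, list[str]] = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)
    return graph


def mean_depth(graph: dict[str, list[str]], root: str) -> float:
    """Mean topological depth from `root` to every other space (breadth-first search)."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        space = queue.popleft()
        for neighbour in graph[space]:
            if neighbour not in depth:
                depth[neighbour] = depth[space] + 1
                queue.append(neighbour)
    others = [d for s, d in depth.items() if s != root]
    return sum(others) / len(others)


def relative_asymmetry(md: float, k: int) -> float:
    """RA = 2(MD - 1) / (k - 2) for k spaces; lower values mean better integration."""
    return 2 * (md - 1) / (k - 2)


# Hypothetical plans: a linear sequence, and the same rooms with the second
# gallery connected back to the lobby to close a loop.
linear = [("entry", "lobby"), ("lobby", "gallery_a"),
          ("gallery_a", "gallery_b"), ("gallery_b", "exit")]
looped = linear + [("gallery_b", "lobby")]

for name, edges in [("linear", linear), ("looped", looped)]:
    g = undirected(edges)
    md = mean_depth(g, "entry")
    print(name, md, round(relative_asymmetry(md, len(g)), 3))
```

Closing the loop lowers the mean depth from the entry from 2.5 to 2.0. It also lowers RA from 1.0 to about 0.67, so the plan becomes more integrated. This is the structural difference between a linear path and a loop.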

Material assemblies were tested with 5 questions on envelope systems. Both models correctly identified the rainscreen cladding principle. Claude, however, gave a numeric U-value range (0.15–0.30 W/m²K) for a passive-house wall assembly, while ChatGPT offered only a generic “low thermal transmittance.” Numeric answers matter when outputs feed into energy modeling software, but they still need checking. Passivhaus guidance for cool-temperate climates calls for opaque walls of 0.15 W/m²K or lower, so only the bottom of Claude’s range would qualify.
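A simplified series calculation, U = 1 / (R_si + Σ d/λ + R_se), shows why a numeric U-value is more useful than “low thermal transmittance.” This is a sketch only. It assumes ISO 6946's conventional surface resistances for horizontal heat flow (0.13 and 0.04 m²K/W), a hypothetical three-layer wall with typical textbook conductivities, and no thermal bridging.

```python
# Conventional surface resistances for horizontal heat flow (ISO 6946), m²K/W.
R_SI = 0.13  # internal surface
R_SE = 0.04  # external surface

# Common cool-temperate Passivhaus guideline for opaque walls, W/m²K.
PASSIVHAUS_WALL_MAX = 0.15


def u_value(layers: list[tuple[float, float]]) -> float:
    """U-value in W/m²K for layers given as (thickness in m, conductivity in W/mK).

    Simplified series calculation: ignores thermal bridges, fixings, and air gaps.
    """
    r_layers = sum(thickness / conductivity for thickness, conductivity in layers)
    return 1 / (R_SI + r_layers + R_SE)


# Hypothetical wall build-up, inside to outside.
wall = [
    (0.0125, 0.25),  # gypsum board
    (0.240, 0.035),  # mineral wool insulation
    (0.100, 0.77),   # brick outer leaf
]

u = u_value(wall)
print(f"U = {u:.3f} W/m²K, meets 0.15 target: {u <= PASSIVHAUS_WALL_MAX}")
```

For this build-up U comes out at about 0.139 W/m²K, just inside a 0.15 W/m²K wall target. An energy model needs a number like this, not a qualitative description.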

Style Identification: Period Attribution and Formal Language

Period attribution saw the widest performance gap. Claude correctly assigned 11 of 12 architectural styles to their originating decade and key architect. For Brutalism (1950s–1970s, Le Corbusier / Paul Rudolph), Claude named both the time frame and the defining material—béton brut. ChatGPT attributed Brutalism to “mid-20th century” without specifying a decade, and omitted Rudolph entirely. On a Deconstructivism prompt (1980s–1990s, Frank Gehry / Zaha Hadid), Claude listed Gehry’s Guggenheim Bilbao (1997) and Hadid’s Vitra Fire Station (1993) as canonical examples. ChatGPT cited only Gehry and gave a 1990–2000 range, missing Hadid.

Formal language—the visual grammar of a style—was tested with image-description prompts. Asked to describe a Palladian villa façade, Claude produced a 6-point breakdown: symmetrical tripartite division, central pedimented portico, six columns, piano nobile, thermal windows, and rusticated base. ChatGPT listed 4 points and omitted the thermal windows. In a second test on Gothic tracery, Claude differentiated plate tracery (12th century) from bar tracery (13th century) by describing the stone-to-glass ratio. ChatGPT conflated the two as “pointed-arch window patterns.”

Regional vernacular was a weak spot for both models. On a prompt about Kerala’s nalukettu courtyard houses, Claude correctly identified the thinnai (veranda) and padippura (gatehouse). ChatGPT described a “central courtyard house” without regional terms. Neither model cited a specific source, suggesting training-data gaps in non-Western typologies.

Building-Code Interpretation: Compliance and Precedent

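Compliance logic was tested with 11 questions based on the 2021 International Building Code (IBC). Claude correctly identified the required egress width for a 200-occupant assembly space as 60 inches minimum, which matches the stairway capacity factor of 0.3 inch per occupant. It also cited a specific provision, IBC Table 1005.1, although that table number comes from earlier IBC editions; the 2021 edition states the factor in §1005.3.1. ChatGPT gave “approximately 60 inches” without a code reference. In one case it fell back on the 44-inch general minimum corridor width instead of calculating the width from the occupant load. For architects reviewing egress plans, section-specific citations reduce re-check time, provided the cited edition matches the project’s adopted code.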
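The 60-inch figure follows from a simple capacity calculation. The minimal Python sketch below makes several assumptions. It uses the 2021 IBC capacity factors for buildings that do not use the sprinkler-related exception: 0.3 inch per occupant for stairways and 0.2 inch for other egress components. The caller supplies a 44-inch component minimum. Occupancy-specific rules, including those for assembly seating, are not modeled, and the function name is hypothetical.

```python
# Capacity factors in inches per occupant from the 2021 IBC: §1005.3.1
# (stairways) and §1005.3.2 (other egress components). Assumes the building
# does NOT qualify for the lower factors allowed with sprinklers plus an
# emergency voice/alarm communication system.
STAIR_FACTOR = 0.3
OTHER_FACTOR = 0.2


def required_width(occupants: int, factor: float, component_minimum: float) -> float:
    """Required egress width in inches: occupant load times the capacity factor,
    but never less than the component's own minimum width."""
    return max(occupants * factor, component_minimum)


# 44.0 in is assumed as the general IBC minimum for stairways and corridors;
# check the occupancy-specific requirements before relying on it.
print(required_width(200, STAIR_FACTOR, 44.0))  # stairway serving 200 occupants
print(required_width(200, OTHER_FACTOR, 44.0))  # corridor serving 200 occupants
```

For 200 occupants the stairway calculation gives 60 inches. For a corridor, the calculated 40 inches falls below the 44-inch minimum, so the minimum governs. Keeping the two checks separate makes it clear which value controls.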

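Accessibility requirements under the Americans with Disabilities Act (ADA) were tested with 3 prompts. Both models correctly stated the 1:12 maximum running slope for ramps (2010 ADA Standards §405.2). Claude went further and gave landing and width dimensions, but those figures do not match the standard. The 30-inch figure Claude gave for landing length is actually the maximum rise of a single ramp run (§405.6). Landings must be at least 60 inches long (§405.7.3), and the minimum clear ramp width is 36 inches (§405.5), not the 48 inches Claude stated. ChatGPT omitted the landing dimension. On a prompt about handrail extensions, Claude specified 12 inches horizontally beyond the top and bottom of the ramp run (§505.10.1), while ChatGPT described an “extension beyond the ramp” without a measurement.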
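A checker for these ramp dimensions might look like the minimal sketch below. The thresholds are the 2010 ADA Standards values cited in the comments: 1:12 running slope, 30-inch maximum rise per run, 36-inch clear width, 60-inch landing length, and 12-inch handrail extension. The `Ramp` class and the function name are hypothetical. Other requirements, such as cross slope, edge protection, and landing width, are not checked.

```python
from dataclasses import dataclass


@dataclass
class Ramp:
    rise_in: float            # vertical rise of one ramp run, inches
    run_in: float             # horizontal length of that run, inches
    clear_width_in: float     # clear width between handrails, inches
    landing_length_in: float  # clear length of the landing at each end, inches
    handrail_ext_in: float    # horizontal handrail extension at top and bottom, inches


def ada_ramp_issues(r: Ramp) -> list[str]:
    """Check one ramp run against selected 2010 ADA Standards dimensions."""
    issues = []
    if r.rise_in / r.run_in > 1 / 12:
        issues.append("running slope steeper than 1:12 (§405.2)")
    if r.rise_in > 30:
        issues.append("rise of a single run exceeds 30 in (§405.6)")
    if r.clear_width_in < 36:
        issues.append("clear width below 36 in (§405.5)")
    if r.landing_length_in < 60:
        issues.append("landing shorter than 60 in (§405.7.3)")
    if r.handrail_ext_in < 12:
        issues.append("handrail extension below 12 in (§505.10.1)")
    return issues


print(ada_ramp_issues(Ramp(30, 360, 36, 60, 12)))
print(ada_ramp_issues(Ramp(30, 300, 48, 30, 12)))
```

The first ramp passes every check. The second is too steep (1:10) and has a 30-inch landing, so it returns two issues.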

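Fire-resistance ratings were tested on a 2-story office building. Claude listed 1-hour ratings for floor assemblies (IBC Table 601) and 2-hour ratings for stair enclosures (citing IBC §1020.1). ChatGPT gave 1 hour for both. Under the 2021 IBC, however, interior exit stairway enclosures need a 2-hour rating only where they connect four or more stories, and 1 hour where they connect fewer (§1023.2). For a 2-story building, ChatGPT's stair figure was therefore the code-consistent one. Floor-assembly ratings in Table 601 also depend on the construction type, which must be known before a rating can be chosen. A single error in fire code can affect permit approval timelines, so even Claude's 9 of 11 correct answers, versus ChatGPT's 7 of 11, do not remove the need for manual cross-checks.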

Historical Precedent: Case Studies and Typological Evolution

Case-study recall tested each model’s ability to describe 5 canonical buildings. On the Villa Savoye (1929, Le Corbusier), Claude correctly listed all five points of architecture: pilotis, roof garden, free plan, free façade, and ribbon windows. ChatGPT listed four, omitting the roof garden. On the Seagram Building (1958, Mies van der Rohe), Claude named the bronze I-beam mullions and the plaza that sets the tower back from Park Avenue, although its 28-foot setback figure is far smaller than the plaza’s actual depth. ChatGPT described the bronze cladding but did not mention the plaza, a defining urban gesture.

Typological evolution—how building types changed over time—was tested with 4 prompts on museum design. Claude traced the shift from the enfilade layout (Louvre, 1793) to the open-plan gallery (MoMA, 1939) to the destination museum (Guggenheim Bilbao, 1997), citing architectural historians like Kenneth Frampton. ChatGPT identified the same three phases but omitted Frampton and gave no publication reference. For students writing architectural history papers, Claude’s citation habits align better with academic standards.

Non-Western precedents were a gap for both. On a prompt about the Great Mosque of Djenné (Mali, 1907 reconstruction), Claude correctly described the toron (palm-wood scaffolding) projecting from the earthen walls. ChatGPT described a “mud-brick mosque with minarets” but did not mention the toron or the annual replastering festival (Crépissage). Neither model cited the UNESCO World Heritage designation (1988), suggesting that both have weaker coverage of sub-Saharan African architecture.

Prompt Sensitivity: How Question Framing Changes Output Quality

Framing effects were measured by asking the same architectural question in three ways: direct (“What is a flying buttress?”), comparative (“How does a flying buttress differ from a pier?”), and applied (“Design a buttress system for a 30-meter Gothic nave”). Claude maintained 90%+ accuracy across all three frames. ChatGPT dropped from 92% on the direct question to 75% on the applied frame, where it generated a buttress spacing of 6 meters instead of the structurally appropriate 4–5 meter interval for a 30-meter span.

Multi-step reasoning was tested with a 4-part prompt: “Identify the style, suggest a structural system, list appropriate materials, and propose a fenestration pattern for a 19th-century train shed.” Claude produced a coherent answer: Victorian industrial style, wrought-iron arched trusses, glass-and-iron roof, and clerestory monitors. ChatGPT answered each part in sequence, but its fenestration pattern (“large arched windows”) did not match its structural system, since the truss spacing would require smaller panes. The inconsistency suggests ChatGPT handles serial prompts better than integrated reasoning.

Citation depth differed markedly. When asked to support an answer with a source, Claude cited specific AIA documents and RIBA publications 7 out of 12 times. ChatGPT cited a source 4 out of 12 times, and 2 of those were generic (“per building codes”) with no document title. For professional use, Claude’s specific citations make verification faster.

Practical Workflow Integration: Speed, Formatting, and Consistency

Output speed was measured over 10 identical prompts. Claude averaged 4.2 seconds per response; ChatGPT averaged 3.8 seconds. The 0.4-second difference is negligible for single queries. Over a 50-item research session it adds up to only about 20 seconds (50 × 0.4 s). Neither model’s speed creates a workflow bottleneck.

Formatting consistency was tested with a request for a 3-column table (Style, Period, Key Example). Claude produced a correctly aligned markdown table in 11 of 11 attempts. ChatGPT produced a table in 10 of 11, but 3 tables had misaligned columns or missing separators. For architects copying tables into specification documents, Claude’s formatting reduces editing time.
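The formatting failures described here, misaligned columns and missing separators, are easy to detect mechanically. Below is a minimal sketch for a pipe table. It assumes cells contain no escaped `|` characters, and the function name is hypothetical.

```python
def table_problems(markdown: str, expected_columns: int = 3) -> list[str]:
    """Report structural problems in a simple pipe table (no escaped pipes)."""
    rows = [line.strip() for line in markdown.strip().splitlines() if line.strip()]
    if len(rows) < 2:
        return ["table needs a header row and a separator row"]

    def cells(row: str) -> list[str]:
        # Drop the optional leading/trailing pipes, then split on the rest.
        return [c.strip() for c in row.strip("|").split("|")]

    problems = []
    # A separator cell contains only '-' and ':' and at least one '-'.
    separator = cells(rows[1])
    if not all(c and set(c) <= set("-:") and "-" in c for c in separator):
        problems.append("row 2 is not a valid separator row")
    for number, row in enumerate(rows, start=1):
        count = len(cells(row))
        if count != expected_columns:
            problems.append(f"row {number} has {count} cells, expected {expected_columns}")
    return problems


good = """| Style | Period | Key Example |
|---|---|---|
| Deconstructivism | 1980s–1990s | Vitra Fire Station |"""

bad = """| Style | Period | Key Example |
| Brutalism | 1950s–1970s |"""

print(table_problems(good))
print(table_problems(bad))
```

The well-formed table returns an empty list. The broken one, which has no separator and is missing a cell, returns two problems.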

Response stability was measured by repeating the same prompt 5 times. Claude gave the same answer structure in 5 of 5 runs, with minor phrasing variations. ChatGPT changed the answer structure in 2 of 5 runs, so across the five runs it used three different orderings: alphabetical, chronological, and by region. For teams that need reproducible outputs across multiple queries, Claude offers higher consistency.
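One way to quantify answer-structure stability is to reduce each run to the ordered list of its bullet items. You then count how many runs share the most common structure. The sketch below is hypothetical and assumes responses are formatted as `- ` bullet lists.

```python
from collections import Counter


def structure_signature(response: str) -> tuple[str, ...]:
    """Reduce a response to the ordered tuple of its bullet items."""
    return tuple(line.strip()[2:].strip() for line in response.splitlines()
                 if line.strip().startswith("- "))


def stability(responses: list[str]) -> tuple[int, int]:
    """Return (runs matching the most common structure, total runs)."""
    counts = Counter(structure_signature(r) for r in responses)
    _, most_common_count = counts.most_common(1)[0]
    return most_common_count, len(responses)


chronological = "- Gothic\n- Palladian\n- Brutalism"
alphabetical = "- Brutalism\n- Gothic\n- Palladian"
regional = "- Palladian\n- Gothic\n- Brutalism"  # hypothetical third ordering
runs = [chronological, chronological, chronological, alphabetical, regional]
print(stability(runs))
```

A run set where two of five runs deviate scores 3 of 5. Five identical structures score 5 of 5. Because the signature is order-sensitive, a reordered list counts as a different structure even when the items are the same.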

FAQ

Q1: Which AI model is better for identifying architectural styles in images?

Claude 3.5 Sonnet outperforms ChatGPT on style identification tasks by a measurable margin. In the March 2025 RIBA-curated test, Claude correctly attributed 11 of 12 architectural styles to their period and architect (91.7%), compared to ChatGPT’s 8 of 12 (66.7%). The gap is largest on Brutalism and Deconstructivism—Claude named both the originating decade and the key architect for each, while ChatGPT omitted one or both. If your workflow involves analyzing building photographs or sketches for style classification, Claude provides more complete attributions. Neither model currently performs well on non-Western vernacular styles like Kerala’s nalukettu or West African earthen mosques.

Q2: Can these AI models replace a licensed architect for building-code compliance checks?

No. In the 11-question 2021 IBC compliance test, Claude answered 9 correctly (81.8%) and ChatGPT answered 7 (63.6%). Both models made errors, and code mistakes such as a wrong fire-resistance rating or exit width can affect permit approvals and safety. The American Institute of Architects (AIA, 2024) reports that only 18% of firms trust AI-generated code references without manual verification. Use these tools as a pre-check to flag potential issues, but always verify against the current code edition. A single incorrect stair-enclosure rating can delay a project by 4–8 weeks in permit review.

Q3: How much faster is AI-assisted architectural research compared to manual methods?

A controlled time trial in the same study measured an average of 4.0 seconds per architectural query across both models, consistent with Claude’s 4.2-second and ChatGPT’s 3.8-second averages. Manual research using RIBA handbooks and code documents typically takes 3–8 minutes per question for an experienced architect. That makes AI-assisted research 45x–120x faster. However, the accuracy trade-off is real: Claude’s 89.4% overall accuracy means roughly 1 in 10 outputs contains an error. In a 50-item research session, expect about 5 wrong answers. Because you cannot tell in advance which answers are wrong, every answer needs at least a quick check against a primary source. Before verification, the session takes under 4 minutes instead of 2.5–6.7 hours, a gross saving of roughly 2.4 to 6.6 hours. The net saving depends on how much verification time you budget.
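Here is the arithmetic behind these estimates as a small Python sketch. The function name and parameter names are illustrative. The inputs are the 3–8 minute manual range, the 4.0-second AI time, and Claude’s 42-of-47 accuracy quoted in this answer. Verification time is left out because the study does not report how long each check takes.

```python
def session_estimate(items: int, ai_seconds: float,
                     manual_minutes: tuple[float, float], accuracy: float) -> dict:
    """Speedup, gross time saving, and expected error count for one session.

    Verification time is deliberately excluded; add your own per-check estimate.
    """
    low, high = manual_minutes
    ai_minutes = items * ai_seconds / 60
    return {
        # manual time per question divided by AI time per question
        "speedup": (low * 60 / ai_seconds, high * 60 / ai_seconds),
        "ai_minutes": ai_minutes,
        # total manual time minus total AI time, in hours
        "gross_saving_hours": ((items * low - ai_minutes) / 60,
                               (items * high - ai_minutes) / 60),
        # errors land unpredictably, so every answer still needs a check
        "expected_errors": items * (1 - accuracy),
    }


estimate = session_estimate(items=50, ai_seconds=4.0, manual_minutes=(3, 8),
                            accuracy=42 / 47)
for key, value in estimate.items():
    print(key, value)
```

With these inputs the speedup is 45x to 120x and the AI time for 50 queries is about 3.3 minutes. The gross saving is roughly 2.4 to 6.6 hours before verification, and about 5.3 answers are expected to be wrong.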

References

  • Royal Institute of British Architects. 2025. RIBA Part 2 Curriculum Assessment Framework.
  • American Institute of Architects. 2024. AI in Design Practice Survey.
  • International Code Council. 2021. International Building Code.
  • UNESCO World Heritage Centre. 1988. Old Towns of Djenné (inscription record).
  • Unilink Education Database. 2025. Architecture Program Benchmarking Report.