The CS 6120 Course Blog

Facade Layout Compiler

by I-Ting Tsai

Source Code

What was the goal?

The goal of this project was to design and implement a Facade Layout Compiler—a domain-specific compiler that transforms textual descriptions of building facades into verified, renderable outputs. The input language describes a facade as a symbolic grid of architectural elements (windows, doors, chimneys, spacing, and empty cells), and the compiler produces:

Verified layouts: The compiler performs static semantic analysis to enforce architectural constraints such as door placement rules, chimney connectivity, and optional window spacing requirements.

Renderable outputs: For valid programs, the compiler generates both SVG visualizations and structured JSON representations suitable for downstream processing or generative design workflows.

This project directly mirrors a traditional compiler pipeline—lexical analysis, parsing, semantic analysis, and code generation—but operates in a visually interpretable domain. The choice of architectural facades as the target domain was intentional: it provides immediate visual feedback for correctness, making it easier to reason about both the compiler’s behavior and the quality of its output.

Alignment with Proposal

My original proposal outlined three key deliverables:

  1. Custom DSL for facade specification: Implemented with support for row declarations, repeat expressions, compact syntax, comments, and rule declarations
  2. Static rule enforcement: Distinct semantic checks with error/warning classification
  3. Empirical validation through testing: passing a comprehensive test suite

The scope evolved slightly during implementation. More details are described in the sections below.


What did you do?

DSL Design

The facade DSL is designed as a minimal, domain-specific language for describing building facades as discrete 2D grids. Each facade is expressed as a set of numbered row declarations, where each row contains a sequence of symbolic elements:

Example program:

row 1: E E E E E E E E
row 2: E E E E E E E E
row 3: E W S S S S W E
row 4: E W S S S S W E
row 5: E E E D D E E E

This grid-based representation provides a clear correspondence between source-level specification and spatial layout. The choice of single-character symbols was deliberate: it enables the compact syntax feature (described below) while remaining visually scannable.

The row numbers aren’t strictly necessary, since row order could simply be inferred, but they keep the language open to future extensions that refer to specific rows (e.g., changing row heights or other row-level attributes).

Expressiveness and Usability Features

To balance expressiveness with simplicity, the DSL supports several usability-oriented features:

  1. Compact syntax: Symbols may be written without spaces (e.g., EWCWE), allowing concise row specifications. The lexer recognizes sequences of valid symbols and expands them into individual cells during tokenization.

  2. Repeat expressions: Elements can be repeated using SYMBOL*COUNT syntax (e.g., W*5 expands to five window cells). This significantly reduces verbosity for large facades and was particularly useful for stress testing.

  3. Zero-count repeats: The expression E*0 produces zero cells, effectively allowing conditional exclusion. This edge case required careful handling—the parser simply doesn’t add any cells to the AST rather than treating it as an error.

  4. Flexible whitespace and comments: Arbitrary whitespace (spaces, tabs, newlines) is permitted between tokens. Line comments starting with # are ignored by the lexer. This allows users to format facade programs for clarity.

  5. Case insensitivity: Both E and e are valid (normalized to uppercase internally).

  6. Rule declarations: Optional rule declarations (e.g., rule min_window_spacing: 2) parameterize semantic checks. Rules are stored in the IR and consulted during analysis.

  7. Auto-fill behavior: Rows may have different lengths; the compiler automatically pads shorter rows with empty cells to create a rectangular grid. Auto-filled cells are tracked in metadata so users can distinguish compiler-inferred content from explicit specification.

Design Rationale

The DSL intentionally avoids geometric coordinates, numeric dimensions, and continuous values. Instead, the facade is described symbolically: a small set of symbols laid out on a grid captures relationships such as two windows sitting next to each other, without committing to exact dimensions or coordinates.


Front-End: Lexer and Parser

Lexer Implementation

I implemented a hand-written lexer that performs character-level scanning. The lexer maintains source location tracking (line and column) for error reporting.
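To make this concrete, here is a minimal sketch of such a lexer (token shapes and names are illustrative assumptions, not the project’s actual code). It scans character by character, tracks line and column for diagnostics, skips `#` comments, and expands compact symbol runs during tokenization:

```python
SYMBOLS = set("EWSDC")

def lex(source):
    """Return a list of (kind, value, line, col) tokens."""
    tokens, line, col, i = [], 1, 1, 0
    while i < len(source):
        ch = source[i]
        if ch == "\n":
            line, col, i = line + 1, 1, i + 1
        elif ch in " \t":
            col, i = col + 1, i + 1
        elif ch == "#":                       # line comment: skip to end of line
            while i < len(source) and source[i] != "\n":
                i += 1
        elif ch.isdigit():
            start = i
            while i < len(source) and source[i].isdigit():
                i += 1
            tokens.append(("INTEGER", int(source[start:i]), line, col))
            col += i - start
        elif ch.isalpha() or ch == "_":
            start = i
            while i < len(source) and (source[i].isalnum() or source[i] == "_"):
                i += 1
            word = source[start:i]
            if all(c.upper() in SYMBOLS for c in word):
                # compact syntax: "EWCWE" expands to one SYMBOL token per cell,
                # normalized to uppercase (case insensitivity)
                for j, c in enumerate(word):
                    tokens.append(("SYMBOL", c.upper(), line, col + j))
            else:
                tokens.append(("WORD", word, line, col))  # 'row', 'rule', rule names
            col += i - start
        elif ch in ":*":
            tokens.append((ch, ch, line, col))
            col, i = col + 1, i + 1
        else:
            tokens.append(("ERROR", ch, line, col))       # report, keep scanning
            col, i = col + 1, i + 1
    return tokens
```

Note that invalid characters become `ERROR` tokens rather than halting the scan, which is what enables multiple-error reporting later.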

Parser Implementation

The parser is a hand-written top-down parser that consumes the token stream and constructs an AST of rule declarations and row declarations.

Grammar (EBNF):

program     ::= rule_decl* row_decl+
rule_decl   ::= 'rule' IDENTIFIER ':' rule_value+
rule_value  ::= INTEGER | IDENTIFIER
row_decl    ::= 'row' INTEGER ':' cell+
cell        ::= SYMBOL | repeat_expr
repeat_expr ::= SYMBOL '*' INTEGER
SYMBOL      ::= 'E' | 'W' | 'S' | 'D' | 'C'

Note on Grammar Classification: This grammar is regular, and the structure is purely sequential without nesting. The language could equivalently be expressed as a regular expression:

(rule IDENTIFIER : rule_value+)* (row INTEGER : (SYMBOL | SYMBOL '*' INTEGER)+)+
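As a sanity check of the regularity claim, the surface syntax (simplified here: whitespace-separated tokens, uppercase symbols, comments already stripped) can be recognized by a single Python regular expression. This pattern is an illustration, not part of the compiler:

```python
import re

# Building blocks mirroring the EBNF above (simplified sketch).
CELL = r"[EWSDC](\*\d+)?"            # cell ::= SYMBOL | SYMBOL '*' INTEGER
ROW = rf"row\s+\d+\s*:(\s*{CELL})+"  # row_decl
RULE = r"rule\s+\w+\s*:(\s*\w+)+"    # rule_decl (integers also match \w+)
PROGRAM = re.compile(rf"\s*({RULE}\s+)*({ROW}\s*)+$")
```

A full program either matches or it doesn’t; no stack is needed, which is exactly what “regular” means here.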

Note on Parser Classification: The parser follows a top-down parsing strategy with one function per major grammar production (parse(), parse_rule(), parse_row(), parse_cells()). It is not strictly a recursive-descent parser because no function calls itself—the grammar’s repetition operators (* and +) are implemented with iterative while loops rather than recursive calls. This iterative approach is natural for a regular grammar where recursion is unnecessary. The parser uses single-token lookahead via check() to predict which production to apply, making it a predictive parser.
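The structure can be sketched as follows (a simplified reconstruction using the function names mentioned above; the `(kind, value)` token shape is an assumption). Note that repetition is handled with `while` loops and `check()` lookahead, never self-recursion:

```python
class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def check(self, kind, value=None):        # single-token lookahead
        if self.pos >= len(self.tokens):
            return False
        k, v = self.tokens[self.pos]
        return k == kind and (value is None or v == value)

    def expect(self, kind, value=None):
        if not self.check(kind, value):
            raise SyntaxError(f"expected {value or kind} at token {self.pos}")
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def parse(self):                          # program ::= rule_decl* row_decl+
        rules, rows = [], []
        while self.check("WORD", "rule"):
            rules.append(self.parse_rule())
        while self.check("WORD", "row"):
            rows.append(self.parse_row())
        return {"rules": rules, "rows": rows}

    def parse_rule(self):                     # rule_decl ::= 'rule' IDENT ':' value+
        self.expect("WORD", "rule")
        name = self.expect("WORD")[1]
        self.expect(":")
        values = []
        # rule_value+ ; stop before the next 'rule'/'row' keyword
        while self.check("INTEGER") or (
                self.check("WORD") and self.tokens[self.pos][1] not in ("rule", "row")):
            values.append(self.tokens[self.pos][1])
            self.pos += 1
        return (name, values)

    def parse_row(self):                      # row_decl ::= 'row' INTEGER ':' cell+
        self.expect("WORD", "row")
        number = self.expect("INTEGER")[1]
        self.expect(":")
        return (number, self.parse_cells())

    def parse_cells(self):                    # cell ::= SYMBOL | SYMBOL '*' INTEGER
        cells = []
        while self.check("SYMBOL"):
            sym = self.expect("SYMBOL")[1]
            if self.check("*"):
                self.expect("*")
                count = self.expect("INTEGER")[1]
                cells.extend([sym] * count)   # E*0 simply contributes no cells
            else:
                cells.append(sym)
        return cells
```

The `cells.extend([sym] * count)` line is also where the zero-count repeat falls out naturally: multiplying a list by zero yields an empty list.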


Middle-End: Semantic Analysis and Normalization

The semantic analyzer transforms the parsed AST into a normalized grid representation and enforces architectural constraints.

Grid Construction and Normalization

The analyzer builds a 2D grid from row declarations, filling any gaps in row numbering with empty rows and padding shorter rows with E cells to ensure a rectangular grid. Auto-filled cells are tracked in metadata to distinguish compiler-inferred content from explicit specification.
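A minimal sketch of this step (the data shapes are assumptions for illustration, not the project’s API):

```python
def normalize(rows):
    """rows: list of (row_number, [symbols]) pairs from the parser.
    Returns (grid, auto_filled) where auto_filled records compiler-inferred cells."""
    by_number = dict(rows)
    height = max(by_number)
    width = max((len(cells) for cells in by_number.values()), default=0)
    grid, auto_filled = [], []
    for r in range(1, height + 1):
        cells = list(by_number.get(r, []))    # missing row number => fully auto-filled
        for c in range(len(cells), width):
            auto_filled.append((r, c + 1))    # 1-indexed (row, col) metadata
        cells += ["E"] * (width - len(cells))
        grid.append(cells)
    return grid, auto_filled
```

Keeping `auto_filled` separate from the grid itself is what lets later phases (and the user) distinguish inferred `E` cells from explicit ones.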

Architectural Rule Checks

Structural violations (doors, chimneys) produce errors and halt compilation, while stylistic concerns (spacing, symmetry) produce warnings only.
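As an illustration of this error/warning split (the specific door and spacing conditions below are my plausible assumptions, not necessarily the project’s exact rules):

```python
from dataclasses import dataclass

@dataclass
class Diagnostic:
    severity: str   # "error" halts compilation, "warning" does not
    message: str

def run_checks(grid, rules):
    diags = []
    # structural check (assumed rule): a door must rest on the bottom row
    bottom = len(grid) - 1
    for r, row in enumerate(grid):
        for c, cell in enumerate(row):
            if cell == "D" and r != bottom:
                diags.append(Diagnostic("error", f"floating door at row {r + 1}"))
    # stylistic check: optional minimum window spacing from a rule declaration
    min_gap = rules.get("min_window_spacing")
    if min_gap is not None:
        for r, row in enumerate(grid):
            last = None
            for c, cell in enumerate(row):
                if cell == "W":
                    if last is not None and c - last - 1 < min_gap:
                        diags.append(Diagnostic(
                            "warning", f"windows too close in row {r + 1}"))
                    last = c
    return diags
```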


Back-End: Code Generation

For valid programs, the compiler generates two output formats:

JSON Output

The JSON output is designed for programmatic consumption: it captures the normalized grid along with metadata such as auto-filled cells and declared rules. The structured data can be used by downstream tools for further processing, constraint solving, or generative design.

SVG Output

The SVG generator produces a visual representation where each cell is rendered as a colored rectangle with a label. Cells are grouped by type in the SVG DOM, making it easy to style or manipulate specific element types.
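A hand-rolled sketch of such a generator (cell size, colors, and class names are illustrative assumptions):

```python
CELL = 40  # pixels per grid cell (illustrative)
COLORS = {"E": "#eeeeee", "W": "#7ec8e3", "S": "#cccccc",
          "D": "#8b5a2b", "C": "#b22222"}

def to_svg(grid):
    width, height = len(grid[0]) * CELL, len(grid) * CELL
    groups = {sym: [] for sym in COLORS}      # one <g> per element type
    for r, row in enumerate(grid):
        for c, sym in enumerate(row):
            x, y = c * CELL, r * CELL
            groups[sym].append(
                f'<rect x="{x}" y="{y}" width="{CELL}" height="{CELL}" '
                f'fill="{COLORS[sym]}" stroke="black"/>')
    body = "\n".join(
        f'<g class="{sym}">\n' + "\n".join(rects) + "\n</g>"
        for sym, rects in groups.items() if rects)
    return (f'<svg xmlns="http://www.w3.org/2000/svg" '
            f'width="{width}" height="{height}">\n{body}\n</svg>')
```

Grouping rectangles by symbol in `<g>` elements is what makes it easy to restyle, say, all windows with a single CSS rule.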

PNG Rasterization

For environments that don’t support SVG, the compiler can produce PNG output via CairoSVG.

Example Facades:

Version 1:

row 1: E E E E E E E E C E E 
row 2: S S S S S S S S C S S 
row 3: S S W W S S S W W S S 
row 4: S S W W S S S W W S S 
row 5: S S S S S S S S S S S 
row 6: S S D D S S W W W W S
row 7: S S D D S S W W W W S 

Facade 1

Version 2:

row 1: E W S E*2 W W E S W E
row 2: W E W S E E S W E*3
row 3: E S E W*2 S E E W S E
row 4: S W S S E W E S E W E
row 5: E E W E S*2 W E E S W
row 6: W S E E W E S W S E E
row 7: D*2 E D E E D E D*2 E

Facade 2

Version 3:

row 1: E C E S E S E C E
row 2: E C E W E W E C 
row 3: E C E W E W E C 
row 4: E C E W W W E C E
row 5: E C E W E W E C E
row 6: E C E W E W E C 
row 7: E C E S S S E C E
row 8: E E E S S S 
row 9: E D D E E E D D E

Facade 3

Version 4:

row 1: E E C E*2 
row 2: W W C C W W W W W W W
row 3: W S S C*2 S S S W S W
row 4: W S S S C C S W*2 S W
row 5: E S*4 C C S S S 
row 6: E D D S*3 C S D D 

Facade 4

Version 5:

row 1: E*1 C*1 E*4 C*1 E*4 c*1 e*1
row 2: E*1 C*1 S*1 W*3 C*1 W*3 S*1 c*1 E*1
row 3: E*1 C*1 S*1 W*1 S*1 W*1 C*1 W*1 S*1 W*1 S*1 C*1 E*1
row 4: E*1 C*1 S*1 W*1 S*1 W*1 C*1 W*1 S*1 W*1 S*1 C*1 e*1
row 5: E*1 C*1 S*1 W*3 C*1 W*3 S*1 C*1 E*1
row 6: e*0 E*1 S*11 E*1
row 7: E*0 E*1 W*2 S*3 W*1 S*3 W*2 E*1
row 8: E*0 D*2 S*1 E*1 S*1 D*3 S*1 e*1 s*1 d*2

Facade 5


What were the hardest parts to get right?

1. Error Classification

I classified errors by compiler phase: lexical errors for invalid characters (e.g., @, emoji), syntax errors for malformed constructs (e.g., missing colons), and semantic errors for valid syntax with invalid meaning (e.g., floating doors, unknown symbols like X).

2. Chimney Connectivity Analysis

Initially, I tried a simple positional check (e.g., chimney cells must be on the top row). This failed for common patterns where the chimney shifts sideways from row to row while remaining connected:

row 1: C E E
row 2: C C S
row 3: S C C
row 4: S S C

This chimney is valid. It connects to the top through a diagonal path. The flood-fill approach correctly handles this by treating chimneys as connected components rather than isolated cells.

A subtle bug emerged with multiple chimney components: the algorithm must check each component independently, not just verify that some chimney reaches the roof.
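My reconstruction of the final check as a sketch (assuming 8-connectivity so diagonal runs count, and taking row 1 as the roof): flood-fill each chimney component, then require every component, not just some component, to touch the top row.

```python
from collections import deque

def chimney_components_ok(grid):
    seen = set()
    for r0, row in enumerate(grid):
        for c0, cell in enumerate(row):
            if cell != "C" or (r0, c0) in seen:
                continue
            # flood-fill one component (8-connectivity allows diagonal paths)
            component, queue = [], deque([(r0, c0)])
            seen.add((r0, c0))
            while queue:
                r, c = queue.popleft()
                component.append((r, c))
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = r + dr, c + dc
                        if (0 <= nr < len(grid) and 0 <= nc < len(grid[nr])
                                and grid[nr][nc] == "C" and (nr, nc) not in seen):
                            seen.add((nr, nc))
                            queue.append((nr, nc))
            if not any(r == 0 for r, _ in component):
                return False          # this component never reaches the roof
    return True
```

The per-component `return False` inside the outer loop is exactly the fix for the subtle bug: a valid component elsewhere in the grid cannot mask a floating one.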

3. Normalization Without Hiding Bugs

Auto-filling rows and padding widths improved usability but introduced a risk: it could mask user errors. If a user forgets a cell, the compiler silently adds an empty one, potentially hiding bugs.

I addressed this by tracking auto-filled cells in metadata and surfacing them in the compiler’s output. This makes compiler intervention visible without treating it as an error.

4. Error Recovery and Multiple Error Reporting

I implemented basic error recovery so the compiler can report multiple errors in a single pass: the lexer reports invalid characters but continues scanning, and the parser skips to the next row after encountering a syntax error.
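A sketch of this row-level panic-mode recovery (simplified token shapes and a deliberately minimal row parser; not the project’s actual code): on a syntax error, record the diagnostic and skip tokens until the next `row` keyword so later rows are still parsed.

```python
def parse_one_row(tokens, pos):
    # minimal row parser for the sketch: 'row' INTEGER ':' SYMBOL*
    if pos + 2 >= len(tokens) or tokens[pos] != ("WORD", "row"):
        raise SyntaxError(f"expected 'row' declaration at token {pos}")
    kind, number = tokens[pos + 1]
    if kind != "INTEGER":
        raise SyntaxError(f"expected row number at token {pos + 1}")
    if tokens[pos + 2] != (":", ":"):
        raise SyntaxError(f"missing ':' in row {number}")
    pos += 3
    cells = []
    while pos < len(tokens) and tokens[pos][0] == "SYMBOL":
        cells.append(tokens[pos][1])
        pos += 1
    return (number, cells), pos

def parse_rows(tokens):
    rows, errors, pos = [], [], 0
    while pos < len(tokens):
        try:
            row, pos = parse_one_row(tokens, pos)
            rows.append(row)
        except SyntaxError as err:
            errors.append(str(err))
            pos += 1
            while pos < len(tokens) and tokens[pos] != ("WORD", "row"):
                pos += 1          # panic-mode skip to the next row declaration
    return rows, errors
```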

5. Comprehensive Test Coverage

The most time-consuming aspect was ensuring comprehensive test coverage. My initial tests covered the obvious paths but missed edge cases such as zero-count repeats, unusual whitespace, and duplicate row numbers.

I used an LLM as a test ideation tool: after implementing a feature, I asked it to propose adversarial inputs and edge cases. I then determined the expected behavior myself and added these cases to my test suite.


Were you successful?

Quantitative Evaluation

The full test-run printout is available in test.txt.

Test Suite Composition

The test suite comprises 146 facade programs organized into 21 categories:

| Category | Tests | Description |
| --- | --- | --- |
| Basic | 5 | Simple valid inputs |
| Syntax | 11 | Syntax variations (compact, repeat, case) |
| Comments | 4 | Comment handling |
| AutoFill | 5 | Auto-fill behavior |
| RowNum | 5 | Row number handling |
| Rules | 4 | Rule declarations |
| Door | 8 | Door structural validation |
| Chimney | 8 | Chimney connectivity validation |
| WinSpace | 4 | Window spacing rule |
| BadChar | 18 | Invalid character detection |
| BadSymbol | 6 | Invalid symbol detection |
| SyntaxErr | 18 | Syntax error handling |
| Edge | 10 | Edge cases |
| Complex | 4 | Complex valid facades |
| Boundary | 4 | Boundary conditions |
| WS | 5 | Whitespace edge cases |
| BadSymbol2 | 6 | Additional invalid symbol tests |
| Stress | 5 | Stress tests (large grids, many rows) |
| Recovery | 3 | Error recovery |
| Combo | 4 | Feature combinations |
| ZeroRepeat | 9 | Zero-count repeat expressions |

All 146 tests pass.

Stress Test Results

The compiler handles extreme inputs, such as large grids and many rows, without issues.

Additional Experiment

I conducted an experiment using an image generation model (Nano Banana) to produce photorealistic facade images from compiled layouts. The workflow:

  1. Compile facade DSL → JSON + SVG (+ PNG)
  2. Construct prompt from grid structure and style description
  3. Generate image using multimodal model

Text prompt format: Generate a [style] facade of a [low-rise/mid-rise/high-rise] building based on this image, where W = window, D = door, S = spacing, E = empty, and C = chimney.
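The prompt-construction step can be sketched as simple string templating (the field names and rise thresholds below are my assumptions for illustration):

```python
STYLE_PROMPT = ("Generate a {style} facade of a {rise} building based on this "
                "image, where W = window, D = door, S = spacing, E = empty, "
                "and C = chimney.")

def build_prompt(style, n_rows):
    # crude rise classification for the sketch; thresholds are arbitrary
    rise = "low-rise" if n_rows <= 4 else "mid-rise" if n_rows <= 10 else "high-rise"
    return STYLE_PROMPT.format(style=style, rise=rise)
```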

The results were correct most of the time. The model understood the general layout but sometimes struggled with precise spatial positioning. This suggests the compiled output could serve as input for generative design, though current models may need additional guidance.

Facade Example 1:

PNG output

Modernist-style

Biophilic-style

Postmodern-style

Facade Example 2:

PNG output

Bauhaus-style

Neo-Futurist-style

Post-and-Beam Modern-style

Qualitative Validation

I produced five facade examples (JSON, SVG, PNG) together with image-generation visualizations; all are available in the example folder.


Conclusion

This project shows that the classical compiler pipeline—lexing, parsing, semantic analysis, and code generation—maps cleanly onto domain-specific languages, even in nontraditional domains such as architectural layout. Treating facades as programs rather than data made it possible to enforce structural invariants that would be difficult to express or validate in other formats.

Building the compiler highlighted several practical lessons: DSL design requires careful trade-offs between expressiveness and complexity; high-quality error messages are essential for usability; and comprehensive testing is challenging in the absence of established benchmarks, requiring systematic, feature-driven test design. The phased architecture proved especially valuable for debugging and extension, as clear phase boundaries made errors easy to localize and new features straightforward to add.

Overall, this experience reinforced the power of compiler techniques as a general problem-solving framework. The resulting facade compiler provides a solid foundation for future extensions, such as richer architectural constraints, integration with generative design systems, or interactive feedback during editing.

It was a great semester, even though I was personally juggling a lot. Coming from a very different background, I learned a lot from this course and developed a better understanding of what a compiler is and how it works. Thanks to everyone for the great online and in-class discussions, and especially to Adrian and Kei for the guidance and help throughout the semester.