Jin-Kook Lee, Youngjin Yoo, Seung Hyun Cha, Generative early architectural visualizations: incorporating architect’s style-trained models, Journal of Computational Design and Engineering, Volume 11, Issue 5, October 2024, Pages 40–59, https://doi.org/10.1093/jcde/qwae065
Abstract
This study introduces a novel approach to architectural visualization using generative artificial intelligence (AI), particularly emphasizing text-to-image technology, to markedly improve the visualization process from the initial design phase within the architecture, engineering, and construction industry. By creating more than 10 000 images incorporating an architect’s personal style and characteristics into a residential house model, the effectiveness of base AI models was evaluated. Furthermore, various architectural styles were integrated to enhance the visualization process. This method involved additional training for styles with low similarity rates, which required extensive data preparation and their integration into the base AI model. Demonstrated to be effective across multiple scenarios, this technique markedly enhances the efficiency and speed of producing architectural visualization images. Highlighting the vast potential of AI in design visualization, our study emphasizes the technology’s shift toward facilitating more user-centered and personalized design applications.

Generative artificial intelligence (AI) adeptly identifies diverse architectural styles and features.
We explored the effectiveness of AI models in architectural visualization.
We established datasets and introduced enhanced training for unique architectural nuances.
We streamlined prompts, data preparations, and training methods for varied architectural designs.
Our findings showcase 20 unique architects’ styles, seamlessly integrating them with the “AI renderer”.
1. Introduction
The field of architecture has long recognized the pivotal role of visualization in conveying design concepts, ideas, and spatial arrangements (Greenberg, 1974). Visualization bridges the gap between abstract concepts and tangible representations, allowing architects to translate their creative visions into visual forms that can be easily comprehended by various audiences (Chen, 2004). By creating realistic or conceptual visualizations, architects can explore design alternatives, evaluate spatial qualities, and make informed decisions while designing (Kunze et al., 2012). Consequently, architectural visualization aids in effective decision-making for architects, designers, clients, and stakeholders (Akin, 1978).
Visualization techniques have advanced to ensure uninterrupted and efficient design progress. The contemporary approach involves a sequence of steps: three-dimensional (3D) modeling upon design finalization, GPU rendering, and post-editing (Fig. 1a; Yildirim & Yavuz, 2012). These advancements allow architects to create high-fidelity visualizations, offering a comprehensive understanding of unbuilt structures (Koutamanis, 2000). Despite considerable improvements in speed and quality, visualization remains complex and time-consuming. It is often replaced by using general reference images for ideation or deferred until later stages for presentation (Bouchlaghem et al., 2005).

Overview of the research. (a) Conventional visualization approach and (b) proposed approach. Refer to Fig. 2 for the details of (b).
The emergence of artificial intelligence (AI) and machine learning (ML)-based image generation models enables rapid image creation from textual descriptions (Ramesh et al., 2021; Saharia et al., 2022b). Implementing such technology in architecture fosters innovative solutions and accelerates the design process. However, publicly available pretrained image generation AI models are usually trained on common objects and cannot be directly applied to specialized fields such as architecture. Existing research mainly focuses on reviewing the potential of these pretrained AI models and image editing techniques (Ploennigs & Berger, 2023). Therefore, further research is needed to tailor and enhance the performance of these models using domain-specific architectural images for their application.
This study investigates the generation of architectural visualizations, including reference images and initial renderings, through AI-based image generation methods (Fig. 1b). Furthermore, it assesses the performance of the default AI model with respect to architects’ styles and features (Section 3). Based on these findings, the design styles and features of various architects are defined (Section 4). Section 5 addresses styles with lower similarity rates through additional training. Finally, Section 6 demonstrates the practical applications of these visualization methods across diverse scenarios.
2. Background
2.1. Development of architectural visualization techniques
The evolution of architectural visualization techniques has paralleled technological advancements. Traditionally, architects relied on hand-drawn sketches, paintings, and physical models to communicate their design concepts (Al-Kodmany, 2001; Atilola et al., 2016). While expressive, these methods were limited in scale, accuracy, and time efficiency. The advent of computer-aided design (CAD), followed by building information modeling (BIM) and rendering engines, introduced photorealistic rendering techniques, marking a pivotal shift in architectural visualization (Fonseca et al., 2013; Xu et al., 2023).
2D CAD drawings enabled precise and modifiable digital representations (Chiu, 1995). The development of 3D modeling enabled a deeper understanding of spatial relationships and volumetric compositions (Eastman, 1999; Hong et al., 2022; Ma et al., 2023; Xu et al., 2016; Yan et al., 2011). In combination with rendering engines, these tools created photorealistic renderings, capturing intricate material textures and replicating lighting conditions (David et al., 2022; Li et al., 2017). Photorealistic visualization technologies now extend beyond the limitations of flat screens, immersing stakeholders in virtual environments and allowing interactive experience with designs (Han & Leite, 2022; Korkut & Surer, 2023; Lee et al., 2023b; Ma et al., 2023).
Advanced visualization tools, such as CAD, BIM, and rendering engines, have led to precise and authentic visualizations, resulting in considerable time efficiency, cost reductions, streamlined editing, efficient data storage, and innovative alternatives (Azhar, 2011; Chen et al., 2023; Lee et al., 2023a). However, creating highly realistic visualizations still requires meticulous modeling, high-end computer specifications, significant time, and specialized skills (Azhar, 2011; Fonseca et al., 2017). As a result, accessibility to these advanced visualization technologies remains somewhat constrained.
Ongoing research aims to refine and create more user-friendly visualization tools. Automation is emerging as a promising solution, simplifying and expediting the visualization process. Leveraging ML and AI, architects can automate various aspects of visualization, allowing more time for in-depth design exploration and decision-making (Castro et al., 2021; Lee et al., 2024; Ploennigs & Berger, 2023; Qian et al., 2023).
2.2. ML and generative artificial intelligence
The evolution from ML to generative artificial intelligence (Gen AI) marks a transformative journey in the field of AI. Initially, ML focused on enabling computers to learn patterns and make predictions from data (Bishop, 2006; Janiesch et al., 2021). AI now processes diverse data types, including text, images, videos, and audio; identifies objects; analyzes patterns; and provides predictions across various domains, including architecture (Kakooee & Dillenburger, 2024; Katsigiannis et al., 2023; Lee et al., 2012; Mathew et al., 2021; Park & Cha, 2023; Song et al., 2020a; Wei et al., 2022). In the field of architecture, many researchers have dedicated significant effort to processing images and videos related to design and construction sites (Park & Hyun, 2022; Qian et al., 2023; Zhang et al., 2022). They have focused on developing technologies for object recognition within these visual materials and, based on these advancements, propose methods to identify various elements, from design styles to construction site rule violations, aiming to manage related data more effectively. These technologies are becoming more streamlined and sophisticated, requiring less manual engineering, as their development accelerates at an unprecedented pace (LeCun et al., 2015). AI now extends beyond data analysis and classification into the generation of new data based on learned content (Kim et al., 2024).
Gen AI focuses on creating data that mirrors specific input datasets. In 2014, generative adversarial networks (GANs) were introduced, marking a paradigm shift in Gen AI. Goodfellow et al. (2014) pioneered GANs, generating realistic synthetic data through a competitive interplay between a generator and a discriminator. In architecture, GANs have been used to differentiate and classify interior design styles based on learned data (Kim et al., 2019). Cho et al. (2020) transformed hand-drawn architectural blueprints into vectorized drawings. Kikuchi et al. (2022) visualized future scenarios by editing existing buildings from videos. Rahbar et al. (2022) generated architectural layouts for particular topological conditions and geometrical constraints. These studies have advanced research in generative imagery, including image generation and editing (Goetschalckx et al., 2019; Karras et al., 2019).
In 2015, Sohl-Dickstein et al. (2015) proposed the concept of diffusion models (DMs), combining generative models with natural language. DMs use a hierarchy of denoising autoencoders for high-quality image synthesis through a reversed diffusion process (Ho et al., 2020; Song et al., 2020b). Notably, DMs avoid issues such as mode collapse, training instabilities, and vast parameter counts observed in GANs and variational autoencoders (Rombach et al., 2022). They apply to diverse tasks, including text-to-image (txt2img) generation, which creates images from text prompts; image-to-image (img2img) generation, which modifies existing images based on a text prompt; as well as inpainting, outpainting, up-scaling, and stroke-based synthesis (Kawar et al., 2022; Kim et al., 2023; Li et al., 2017; Lugmayr et al., 2022; Meng et al., 2021; Saharia et al., 2022a). From GANs to DMs, ML-based Gen AI redefines the boundaries of AI creativity (Oppenlaender, 2022) and holds immense potential across industries, including architecture.
2.3. Potential of image generation AI for architectural visualization
In 2020, large language models (LLMs) emerged as transformative tools in natural language processing. Notably, OpenAI’s GPT-3 employs transformer architectures to comprehend and generate human-like text (OpenAI & Pilipiszyn, 2021). This evolution has positively impacted AI models that generate images from text (txt2img generation models; Ploennigs & Berger, 2023), as well as DMs. Representative image generation AI platforms built on LLMs include Midjourney (Midjourney Inc., 2022), DALL·E2 (OpenAI, 2022), and Stable Diffusion (SD; Stability AI, 2022).
Midjourney (Midjourney Inc., 2022) employs transformers to generate detailed images based on textual descriptions (Oppenlaender, 2022). DALL·E2 (OpenAI, 2022) introduces an encoder–decoder architecture capable of producing images from textual descriptions containing novel combinations of objects and concepts (Ramesh et al., 2021). SD (Stability AI, 2022) is a recently proposed txt2img model that utilizes a latent diffusion process to create images from text, allowing gradual refinement of images (Rombach et al., 2022). Each platform provides the ability to obtain high-quality images through txt2img and img2img approaches, with the potential to achieve desired results through meticulous prompt engineering.
Currently, various studies have explored Gen AI, particularly image generation AI, in creative fields like the arts. Oppenlaender (2022) investigated AI-based image generation and editing, viewing AI as a tool to extend human creativity. In architecture, Ploennigs and Berger (2023) conducted case studies on various image generation platforms to explore their potential for architectural visualization. Jo et al. (2024) introduced a method for rendering building façades using regional styles. However, these studies either focused on reviewing platforms’ capabilities and limitations (Ploennigs & Berger, 2023) or were limited to a single regional style and façade (Jo et al., 2024). Hence, there is a need for further research on the diverse applications of Gen AI’s visualization technology.
AI-assisted visualization in architecture can automate labor-intensive tasks, transforming abstract concepts into vivid representations. Regardless of design stages or material readiness, this approach can expedite the visualization process and accelerate design cycles. Consequently, architects will get more time to focus on crucial design decisions and facilitate effective communication with clients, collaborators, and the public (Epstein et al., 2023; Oppenlaender, 2022; Ploennigs & Berger, 2023). Therefore, our purpose is to explore and amplify the potential of integrating Gen AI into the visualization process to enhance the architectural design process. For this purpose, this study, focusing on architects’ styles, systematically trains architectural expertise to fine-tune the performance of existing models and demonstrates various applications in design process based on these enhancements (Fig. 2).

Early architectural visualization by architect’s style and feature using Gen AI.
Due to the varying strengths and weaknesses of each platform, users must select the platform that aligns with their desired outcomes and functionalities. In this study, SD (Stability AI, 2022) is identified as highly suitable for AI-aided architectural visualization generation, due to its learning capacity, stability during training, controlled generation process, and consistent production of high-quality images from textual prompts. The following sections will explore the practical implementation and implications of the SD model within architectural design.
3. Image Generation Based on Architect’s Style and Feature
3.1. Image generation with Gen AI
There are two primary approaches to image generation in SD: txt2img and img2img. The former generates images solely from textual prompts, while the latter takes both a textual prompt and a seed image as input, modifying the seed image based on the prompt. The txt2img approach is therefore useful for flexible idea visualization unconstrained by form, while img2img is beneficial for tailored, continuous idea development. Depending on the nature of each approach, they can be utilized for various tasks such as generating reference images and rendering images. These two image generation approaches can be formally defined as follows:
The “|$generate()$|” function in an image generation AI model (M) creates an image (|$Im{g_{\rm G}}$|) based on the generation parameters (|$Para{m_{\rm G}}$|) and text prompts (|${P_{\rm t}}$|) that describe the target. When using the “|$generate()$|” function with a seed image (|${\rm{}}Im{g_{\rm S}}$|) in the img2img approach, the |${\rm{}}Im{g_{\rm S}}$| is processed according to the processing parameters (|$Para{m_{\rm P}}$|), allowing M to generate |$Im{g_{\rm G}}$| based on both |${\rm{}}Im{g_{\rm S}}$| and |${P_{\rm t}}$|. |${P_{\rm t}}$| is derived from the desired target image (|$Im{g_{\rm t}}$|), through the process “|$getprompts()$|”, serving as a textual representation of |$Im{g_{\rm t}}$|. As a result, |$Im{g_{\rm G}}$| demonstrates a resemblance to |$Im{g_{\rm t}}$| and predominantly belongs to a group sharing similarities with the target:
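In symbolic form, following the definitions above, the txt2img and img2img modes can be expressed as:

$$Img_{\rm G} = generate(M,\ Param_{\rm G},\ P_{\rm t}), \qquad P_{\rm t} = getprompts(Img_{\rm t})$$

$$Img_{\rm G} = generate(M,\ Param_{\rm G},\ Param_{\rm P},\ P_{\rm t},\ Img_{\rm S})$$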
|$Para{m_{\rm G}}$| comprises four components necessary for defining image generation. These essential elements include |$resolution$|, which determines the image dimensions in pixels; |$sampling{\rm{\,\,}}\textit{method}$|, which refers to the type of technique used to extract samples from the latent space; |$sampling{\rm{\,\,}}\textit{steps}$|, which determine the number of intermediate stages between the initial and final states during the diffusion process, substantially impacting the level of detail in results; and |$CFG{\rm{\,\,}}\textit{scale}$| (classifier-free guidance scale), which indicates the level of autonomy or reliance on predefined classifiers in the AI model. |$Para{m_{\rm P}}$| comprises three components, with the |$processor$| referring to the type of methods used to recognize |${\rm{}}Im{g_{\rm S}}$|. Depending on the |$processor$|, the method of detecting |${\rm{}}Im{g_{\rm S}}$| varies, such as detecting boundaries based on image contrast or detecting shapes based on image depth or distance. |$control{\rm{\,\,}}\textit{weight}$| indicates how closely the detected shape of the seed image will be adhered to, representing the degree of allowance for change, while |$control{\rm{\,\,}}\textit{mode}$| indicates whether the prompt or the seed image is given more priority.
|${P_{\rm t}}$| is composed of two types of prompts: scene description prompt (|$\mathrm{ SDP}$|) and resolution quality prompt (|$\mathrm{ RQP}$|). While it is possible to generate images by providing only scene and context descriptions, the probability of obtaining the desired image and quality may be low. Therefore, it is necessary to employ prompt engineering to describe the |$Im{g_{\rm t}}$| systematically and precisely, as illustrated in Table 1. The |$\mathrm{ SDP}$| encompasses not only main description but also the graphic style and composition of resultant images. The |$\mathrm{ RQP}$| pertains to prompts related to the image’s resolution quality, allowing users to achieve the desired image quality. Lastly, to prevent errors and dissimilar image results, it is crucial to utilize negative prompts to exclude keywords that should be avoided or do not align with the |$Im{g_{\rm t}}$|.
| Type | Content | Positive prompt example | Negative prompt example |
|---|---|---|---|
| SDP | Main description (about scene and context) | A house with Mondrian’s color palette, located in a forest, a cat sitting in a chair, kids running around the house, etc. | Dogs, department, tower, cars, located in a city, at night, etc. |
| SDP | Graphic style | Professional photograph, photorealistic rendering, etc. | Watercolor painting, oil painting, drawing, sketch, cartoonish, etc. |
| SDP | Composition (angle, lighting, etc.) | Full shot, deep depth of field, high-key lighting, natural lighting, two-point perspective, etc. | Bird’s-eye view, isometric, portrait, cropped view, etc. |
| RQP | Resolution | Realistic shadows, enhanced-detail, v-ray rendering, full HD, masterpiece, highly detailed, high quality, 8k, etc. | Low quality, too much noise, normal quality, watermark, blurry textured, blurry, noise, faint, text, etc. |
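To make the structure of |${P_{\rm t}}$| concrete, the components in Table 1 can be concatenated into a positive and a negative prompt string before generation. The snippet below is a minimal illustration using keywords from Table 1; the variable names are illustrative and not part of the study’s implementation.

```python
# Assemble P_t from SDP (scene description) and RQP (resolution quality) components.
sdp_positive = [
    "a house with Mondrian's color palette, located in a forest",               # main description
    "professional photograph, photorealistic rendering",                        # graphic style
    "full shot, deep depth of field, natural lighting, two-point perspective",  # composition
]
rqp_positive = ["realistic shadows, v-ray rendering, highly detailed, high quality, 8k"]

negative = [
    "located in a city, at night",                  # unwanted scene content
    "watercolor painting, sketch, cartoonish",      # unwanted graphic styles
    "bird's-eye view, isometric, cropped view",     # unwanted compositions
    "low quality, watermark, blurry, noise, text",  # quality issues to exclude
]

positive_prompt = ", ".join(sdp_positive + rqp_positive)
negative_prompt = ", ".join(negative)
print(positive_prompt)
print(negative_prompt)
```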
3.2. Image generation test for architects’ styles
An intensive image generation test was conducted to evaluate the performance of the SD model, specifically for architectural visualization reflecting architects’ design styles and features. The primary objective of the test was to assess the extent to which the pretrained model recognizes architects’ styles. The test focused primarily on txt2img because it is relatively unconstrained, generating images from the same model as img2img but without a seed image, which allowed us to assess the optimum performance of the pretrained model. Therefore, all the images for the test were generated based on Equation (1), with the target scenes set as residential houses reflecting various architects’ design styles. Each architect’s style was treated as an independent variable, and we randomly selected 20 architects and applied their styles.
By providing detailed prompts, images that closely resemble the target scenes can be generated. However, to accurately discern whether the SD default model recognizes specific design styles and features used by real-world architects, certain style-related words were intentionally omitted. Consequently, for |${P_{\rm t}}$|, the main description prompt was given as “architect name-inspired residential house” to ensure a more precise comparison between the independent variables. To facilitate this comparison, the “photorealistic rendering prompt set” was used: a collection of positive and negative prompts specifically designed to generate high-resolution images with a photorealistic rendering style. Its positive prompt comprises professional photograph, photorealistic rendering, realistic, enhance-detail, v-ray rendering, full HD, masterpiece, highly detailed, high quality, 8k, two-point perspective, exterior view, full shot, deep depth of field, f/22, high-key lighting, natural lighting, and realistic shadows; its negative prompt comprises low quality, bad proportion, awkward shadows, unrealistic lighting, pixelated textures, too much noise, unrealistic reflections, normal quality, watermark, bad perspective, confusing details, blurry textured, blurry, noise, cloudy, faint, and text. This set contains style-related keywords commonly employed in architectural visualization, such as photorealistic rendering for the graphic style and the two-point perspective view for the composition. All the images were generated with the SD default model on a local PC (equipped with an NVIDIA RTX series GPU and 16 GB of RAM) at a resolution of 1024 × 512 pixels. More than 10 000 images were generated in this test (Table 2).
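The study does not publish its generation code; the following sketch shows how an equivalent txt2img call could be issued with the open-source diffusers library, using the photorealistic rendering prompt set and the 1024 × 512 resolution described above. The checkpoint identifier, sampling steps, and CFG scale are illustrative assumptions rather than the study’s exact settings.

```python
import torch
from diffusers import StableDiffusionPipeline

# Assumed checkpoint ID for an SD v1.5 base model (the paper runs a local "pruned v1.5" checkpoint).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = (
    "Louis Kahn-inspired residential house, professional photograph, "
    "photorealistic rendering, realistic, v-ray rendering, full HD, masterpiece, "
    "highly detailed, 8k, two-point perspective, exterior view, full shot, "
    "deep depth of field, high-key lighting, natural lighting, realistic shadows"
)
negative_prompt = (
    "low quality, bad proportion, awkward shadows, unrealistic lighting, "
    "too much noise, watermark, bad perspective, blurry, text"
)

image = pipe(
    prompt,
    negative_prompt=negative_prompt,
    width=1024, height=512,     # resolution used in the test
    num_inference_steps=30,     # sampling steps (illustrative value)
    guidance_scale=7.0,         # CFG scale (illustrative value)
).images[0]
image.save("kahn_inspired_house.png")
```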


3.3. Demand of additional training
There are various methods and tools available to evaluate the fidelity of the generated images (|$Im{g_{\rm G}}$|) to their input text (|${P_{\rm t}}$|) in order to assess the performance of a default model (M). The human preference classifier (Wu et al., 2023a, b) and the CLIP score (Hessel et al., 2022) are representative evaluation metrics for assessing the human preference score in txt2img synthesis. The first approach can measure the extent of misalignment with human preferences by identifying instances such as floating pillars, awkwardly positioned furniture, or discrepancies in appearance. The second approach measures the similarity between text prompts and images by assessing whether the images contain characteristic elements of an architect’s style. Additionally, qualitative methods such as surveys and observations can be used for assessment. This study focuses on introducing a novel visualization method while recognizing the subjective nature of style assessment. Therefore, the evaluation was conducted both quantitatively and qualitatively, based on prior research on each architect’s styles and features.
The quantitative evaluation of the performance of M was conducted to assess the |$Similarity$| between target images (|$Im{g_{\rm t}}$|) and those generated via AI (|$Im{g_{\rm G}}$|) based on the CLIP score. The |$Similarity$| is calculated by dividing the CLIP score (|$Score$|) of each |$Im{g_{\rm G}}$| by the average CLIP score (|$Average{\rm{\,\,}}\textit{Score}$|) of the actual project images of each architect, designated as the targets in this study (Equation 8). If the |$Similarity( {Im{g_{\rm G}}} )$| reaches the target similarity (|$Tsm$|), the |$Im{g_{\rm G}}$| is classified into the target group (Equation 9). This evaluation was performed on a randomly selected sample of 100 |$Im{g_{\rm G}}$| for each architect, with the |$Tsm$| set at 90%:
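With $Score(\cdot)$ denoting the CLIP score, this evaluation can be expressed as:

$$Similarity(Img_{\rm G}) = \frac{Score(Img_{\rm G})}{Average\ Score(Img_{\rm t})} \times 100\%$$

$$Img_{\rm G} \in \text{target group} \quad \text{if} \quad Similarity(Img_{\rm G}) \ge Tsm$$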
The generated results were also qualitatively evaluated based on three criteria regarding how well they reflected the |${P_{\rm t}}$|. With respect to the main description prompt, the evaluation focused on (i) style fidelity, determining how accurately the design characteristics of specific architects were represented, and (ii) domain fidelity, assessing whether the distinctive features of a particular building type, in this case, a residential house, were accurately reflected. Additionally, the photorealistic rendering prompt set, used across all the tests, was examined for (iii) image quality, assessing how closely the graphic style, composition, and resolution matched the desired output.
The results of these tests showed that the current SD model mostly achieved high domain fidelity and image quality. However, variations in style fidelity were observed between different architects, regardless of their prominence in the field. Figure 3 illustrates the proportion of images, among 100 sample images per architect, that showed a |$Similarity$| of ≥90%. According to Fig. 3, for eight of the architects, a majority of the |$Im{g_{\rm G}}$| showed a |$Similarity$| of less than 90% compared with the |$Average{\rm{\,\,}}\textit{Score}$| of the actual project images of each architect. For these architects, the generated images exhibited generic Western-style residential houses with relatively lower image quality and detail (Table 2). To address the limited recognition of certain architects’ styles, additional training of the existing image generation model is required. Hence, we conducted additional training by defining these architects’ design styles and features.

Pretrained model’s performance for each architect’s style (percentage of |$Im{g_{\rm G}}$| belonging to |$Im{g_{\rm t}}$| within each sample).
4. Definition of Architects’ Styles
4.1. Operational definition of architects’ styles
According to Schapiro (1961), style comprises constant forms, elements, qualities, and expressions. These characteristics are used to distinguish differences between periods, groups, or individual designers (Ackerman, 1963; Chan, 1992; Crook, 1987; Smithies, 1981). Chan (1994) defined style as the set of common features present in artifacts, introducing a taxonomic approach to defining architectural styles in his study. Various qualitative and quantitative methods, including Chan’s, have been employed to define design styles in diverse fields (Huang et al., 2016). In industrial design, Hyun et al. (2015) quantified car styles, adapting Chan’s methodology.
Building on these studies, this research aims to define an architect’s style based on established concepts and use it to train and generate images. According to Chan (1994), style is composed of physical forms, patterns, or distinct characteristics. A style can be quantified by measuring the similarity between projects based on the repetition of common features across projects. A higher frequency of features contributes to a more coherent and strongly recognizable style, though certain features are more effective than others.
Drawing from the aforementioned concepts, in this section, an architect’s style can be defined as follows:
Within the architect’s style (|${S_{\rm A}}$|), various visual features exist (Moussavi, 2015). However, this study places emphasis on the form (|$form$|), materiality (|$materiality$|), and structure (|$structure$|) features. The |$form$| feature pertains to formal characteristics, such as whether the geometry is predominantly curved or straight (Ching, 2023); the |$materiality$| feature denotes a visually prominent aspect, encompassing the primary materials employed (Hartoonian, 2016); and the |$structure$| feature encompasses the connectivity of interior and exterior spaces, based on systems such as framing and load-bearing systems (Sandaker et al., 2022). Each style exhibits distinct degrees and measurements of its features. When a weight (W) is applied to a style, W influences each feature.
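In symbolic terms, the style and its weighted application can be summarized as:

$$S_{\rm A} = \{\, form,\ materiality,\ structure \,\}$$

$$W \cdot S_{\rm A} = \{\, W \cdot form,\ W \cdot materiality,\ W \cdot structure \,\}$$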
Similar to Chan’s (1994) research, we focused on interpreting each architect’s style through features rather than the substance of styles. While the degree and measurement of style are not extensively covered in this study, we can still control the intensity of a style, which is proportional to each defined element, as shown in Equation (11). This phenomenon is visually illustrated in Fig. 4, specifically with respect to form.

Txt2img generation applying the Zaha Hadid style with different weights (W).
4.2. Fusion of architects’ styles
In this section, we employed the SD model to implement design fusions, integrating various architectural styles. The aim was to observe how each feature of a style, as defined in the previous section, influences other styles. To conduct style fusion, involving merging, extracting, and adjusting weight (W), we used the |$\mathrm{ SDP}$| established in Table 3. The supplementary |${P_{\rm t}}$|, photorealistic rendering prompt set, and |$Para{m_{\rm G}}$| used for image generation were the same as used in previous tests. Based on the fusion results, we observed how each feature of a style and its associated weight influence and interact with other styles.
| SDP | Positive prompt | Negative prompt |
|---|---|---|
| Merge (A + B) | Architect A and Architect B-inspired residential house | None |
| Extract (A − B) | Architect A-inspired residential house | Architect B’s design features |
| Weight (W) | Utilize (parentheses) for the words and place a colon and a number between 0 and 2 next to them, with 1 representing 100% effectiveness. | — |
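Applying the rules in Table 3, style fusion reduces to prompt construction. The strings below are illustrative examples of the merge, extract, and weight operations; the architect pair mirrors the Kahn–Gaudi example discussed later in this section.

```python
# Merge (A + B): both styles appear in the positive prompt.
merge_positive = "Louis Kahn and Antoni Gaudi-inspired residential house"

# Extract (A - B): style A in the positive prompt, style B's features in the negative prompt.
extract_positive = "Louis Kahn-inspired residential house"
extract_negative = "Antoni Gaudi's design features"

# Weight (W): wrap the words in parentheses and append ":number" (0-2, with 1 = 100% effectiveness).
weighted_positive = "(Louis Kahn-inspired:1.3) and (Antoni Gaudi-inspired:0.7) residential house"
```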
Through style fusion using image generation AI, we were able to identify the rough rules that govern how each feature interacts with others. When merging two or more styles, the features are combined in a visually harmonious way. Adjusting W to prioritize one style over the other resulted in the distinct traits of that style being prominently reflected. However, the feature extraction between styles worked only when there were similarities or overlapping features between them; otherwise, there was no visual impact.
As depicted in Fig. 5, the architectural styles of Louis Kahn and Antoni Gaudi are contrasting: Gaudi’s style showcases curvilinear shapes, employs a variety of colors and mosaics, and features a more closed structure, whereas Kahn’s design style emphasizes rectilinear shapes and predominantly employs concrete, resulting in an overall monochromatic appearance with more open elements such as cloisters.

In the fusion of these contrasting styles, as shown in the lower section of Fig. 5, characteristics of both styles blend together depending on the value of W. While most results reflected Kahn’s monochromatic materiality and structure system, a greater application of Gaudi’s style highlighted one of his key features: organic and curvilinear forms. Conversely, when Kahn’s style was more pronounced, the curvature was restrained, leading to a more subdued expression. Through design fusions, it was observed that architects’ styles can be proportionally applied and can be visually distinguished. This process can also help achieve new design styles where the overall design features of both styles are harmoniously combined, by adjusting weights associated with either.
5. Additional Training of the Model for Architectural Visualization
5.1. Additional training of existing model
This study focuses on conducting additional training, particularly using the low-rank adaptation (LoRA) method (Hu et al., 2021), to generate images that belong to |$Im{g_{\rm t}}$|. The LoRA method reparameterizes the weight matrices used for updates by focusing on specific targets rather than updating all of the model’s weights. This approach is advantageous as it reduces computational costs and memory usage, while also remaining effective with smaller datasets. The resulting LoRA model is compact and can be efficiently swapped and utilized across multiple base models.
If the majority of generated images (|$Im{g_{\rm G}}$|) do not belong to the target image (|$Im{g_{\rm t}}$|) group, the existing model (M) needs to be replaced with an alternative model (|$M{\rm{^{\prime}}}$|). In this study, the performance of M is assessed by calculating the |$Similarity$| of |$Im{g_{\rm G}}$| to |$Im{g_{\rm t}}$| based on CLIP scores. Accordingly, if, among the number (n) of randomly selected |$Im{g_{\rm G}}$|, the proportion whose |$Similarity$| exceeds the target similarity (|$Tsm$|) fails to reach the majority criterion (|$\mu $|), M needs to be replaced with |$M{\rm{^{\prime}}}$|, as described in Equation (12). For all architects, |$Tsm$| is uniformly set at 90%, and |$\mu $| is set at 70% to evaluate the accuracy of the model’s output along with the consistency and stability of the results. To improve accuracy and stability, |$M{\rm{^{\prime}}}$| can be either substituted or upgraded through additional training.
The trained model for the target (|${M_{\rm t}}$|) can be generated with the “|$train( \,\, )$|” function, using the base model (M), hyperparameters (|$Hyperparam$|) to control the training process, and a training dataset specific to the target (|${D_{\rm t}}$|):
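Symbolically:

$$M_{\rm t} = train(M,\ Hyperparam,\ D_{\rm t})$$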
Among these, |$Hyperparam$| significantly influences the model’s learning process and the subsequent performance of |${M_{\rm t}}$|. These |$Hyperparam$| involve diverse and extensive settings, with many detailed parameters. However, in this study, we focused on three key hyperparameters: train batch size (|$B{S_{\rm t}}$|), epochs (|$epoch$|), and learning rate (|$\alpha $|). |$B{S_{\rm t}}$| refers to the number of samples processed together in each training iteration; |$epoch$| represents the number of complete passes over all datasets during training; and |$\alpha $| determines the learning step size between iterations, controlling the training speed and the rate at which error and loss decrease. These |$Hyperparam$| play a crucial role in shaping the training process and ultimately impact the effectiveness of |${M_{\rm t}}$|:
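That is:

$$Hyperparam = \{\, BS_{\rm t},\ epoch,\ \alpha \,\}$$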
To accurately generate the target images, a systematic additional training method was proposed, as illustrated in Fig. 6, based on the previous definitions. The additional training process consists of two steps: (i) dataset preparation, which involves data collection, preprocessing, and keyword extraction, and (ii) model training, in which the prepared dataset is added to the base model using predefined hyperparameters. Through this process, we obtained a trained LoRA model that has learned the target characteristics.

By applying this model (|${M_{\rm t}}$|) to the existing image generation function, images that closely resemble |$Im{g_{\rm t}}$| with a higher similarity than before were obtained. When using |${M_{\rm t}}$|, it is necessary to input the application weight (W), a value between 0 and 1, where 0 represents 0% and 1 represents 100%.
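In a diffusers-based workflow (an assumption; the study does not name its runtime), attaching a style-trained LoRA file and scaling its contribution by W might look like the sketch below. The file name and scale value are illustrative.

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Attach the style-trained LoRA (M_t); the file name is hypothetical.
pipe.load_lora_weights("loras", weight_name="louis_kahn_style.safetensors")

image = pipe(
    "Louis Kahn style residential house, photorealistic rendering, highly detailed, 8k",
    negative_prompt="low quality, watermark, blurry, text",
    width=1024, height=512,
    cross_attention_kwargs={"scale": 0.8},  # application weight W = 0.8 (80% of the trained style)
).images[0]
image.save("kahn_lora_house.png")
```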
5.2. Data preparation for additional training
Few-shot learning requires high-quality training data with consistent content. For additional training using the LoRA method (Hu et al., 2021), a dataset (|${D_{\rm t}}$|) containing image data (|$Im{g_{\rm D}}$|) and corresponding annotation text data (|$Tx{t_{\rm D}}$|) is essential. |${D_{\rm t}}$| for additional training is defined as follows:
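In set form:

$$D_{\rm t} = \{\, ( Img_{\rm D},\ Txt_{\rm D} ) \,\}, \qquad Txt_{\rm D} = getannotation(Img_{\rm D})$$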
To ensure high-quality image data and content consistency between them, careful selection of images representing the target is crucial. The |$Im{g_{\rm D}}$| should align with the main description prompt, the desired composition, and the desired image quality. It is also important to avoid images that include excessive information, as it might interfere with the training process. Preprocessing steps such as image resizing and cropping help eliminate unnecessary content beforehand.
The text data, denoted as |$Tx{t_{\rm D}}$|, is always trained in conjunction with the corresponding |$Im{g_{\rm D}}$|. |$Tx{t_{\rm D}}$| is extracted from |$Im{g_{\rm D}}$| using the “getannotation()” operator, describing the target content and characteristics present within the |$Im{g_{\rm D}}$|. To ensure a successful and efficient learning process, it is crucial that the |$Tx{t_{\rm D}}$| accurately and clearly describes the |$Im{g_{\rm D}}$|, based on three components, as specified in Table 4. They include the representative name (N); the annotation of specific features (|$SF$|), which covers the three features of style defined in Section 4.1; and the annotation of general features (|$GF$|). Including abstract content in the |$Tx{t_{\rm D}}$| can be beneficial; however, it is essential to include objective information that visually distinguishes and supports such intangible aspects.
| Component | Description | Example for architect’s design style |
|---|---|---|
| Representative N | A pronoun or a word that activates the trained model. This component is essential. | Architect’s name, artist’s name, interior design style name, etc. |
| Annotation of |$SF$| | Specific tangible and abstract features that distinguish the target from others. These annotations may repeat throughout the training dataset. | Form, materiality, structure, architectural components, idea, theory, movement (e.g., modernism), emotion, etc. |
| Annotation of |$GF$| | Description of visual features about both the target and its context that do not belong to |$SF$|. These annotations are general, can vary, and may not repeat. | Secondary materiality, a place where the project seems to be located, hour, presence of vegetation, etc. |
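As a concrete illustration of the annotation structure in Table 4 (using keywords similar to the SANAA example in Table 5), a single |$Tx{t_{\rm D}}$| file paired with one training image might be written as follows; the wording and file names are illustrative rather than actual entries from the study’s dataset.

```python
# Illustrative Txt_D caption for one Img_D file (e.g., sanaa_001.txt paired with sanaa_001.png).
annotation = (
    "SANAA style, "                                            # representative name N
    "minimalist, transparency, glass walls, white color, "     # specific features SF
    "fine steel columns, thin ceilings, curved shape, "
    "trees in the background, grass on the ground, sunny day"  # general features GF
)
with open("sanaa_001.txt", "w") as f:
    f.write(annotation)
```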
5.3. Additional training for architects’ styles and features
Additional training was conducted focusing on architects whose styles showed low or no similarity in the image generation test described in Section 3.2. Few-shot learning was implemented using the previously defined additional training method, and the performance of the default model (M) was compared with that of the trained model (|${M_{\rm t}}$|) by generating images with each model. The images were generated using the parameters (|$Para{m_{\rm G}}$|) and prompts (|${P_{\rm t}}$|) described in Section 3.2 and Equation (16).
For certain architects, M generated images (|$Im{g_{\rm G}}$|) with an average |$Similarity$| of less than 90% compared with the target images (|$Im{g_{\rm t}}$|). Thus, when |${M_{\rm t}}$| was not applied, the specific features of those styles were not represented. However, when the trained model was used, these features were correctly displayed in proportion to the weights (W) assigned. As shown in Fig. 7, the average |$Similarity$| between |$Im{g_{\rm G}}$| and |$Im{g_{\rm t}}$|, as well as the proportion of images with a |$Similarity$| of ≥90%, increased significantly after using |${M_{\rm t}}$|. In the case of Louis Kahn, the |$Similarity$| level with the target improved by approximately 16% when |${M_{\rm t}}$| was applied at 100%, and the proportion of images belonging to the target group increased by approximately 3.4 times. After the additional training, the model was able to generate images applying architects’ styles and features, as well as combining them for style fusion.

Txt2img generation test results with different weights of the Louis Kahn style-trained model.
6. Demonstration
6.1. Overview of demonstration
Throughout this research, we observed that image generation AI can rapidly produce high-quality architectural visualization based solely on textual prompts. When applied in architecture, this technology allows architects to effortlessly generate design reference images and visualizations from the very initial stages of the design process. This section demonstrates the practical application of image generation AI, particularly SD, with various architects’ styles, focusing on different types of residential building visualization.
First, additional training expands the image generation model’s capabilities and stylistic spectrum, extending it to architects’ styles that the existing model may not recognize. Using txt2img generation, users can generate exterior visualization images of buildings with architects’ styles and features from text, building a reference database. This approach can be used to obtain architectural visualization images with a single architect’s style applied to building exteriors, and it enables the combination or extraction of different styles to create new alternatives.
By applying img2img generation to massing models produced during the initial design phases, we could instantly generate rendering images from various viewpoints. A user-friendly interface was also demonstrated, allowing users to apply this img2img technology more conveniently beyond text-based outputs.
6.2. Additional training and architect’s style and feature model files
The implementation of design styles and features of various architects in image generation AI is described in this section. With additional training, the image generation AI allows users to easily obtain desired images according to their needs, even with a small dataset. This additional training was demonstrated based on Equation (14), targeting architects with low similarity rates in the default model (M). Figure 8 presents the steps of additional training for the selected architects: (i) data preparation, including preprocessing and keyword extraction, and (ii) additional training with the dataset (|${D_{\rm t}}$|). By following this procedure, the additional training aimed to enhance the model’s ability to generate users’ desired images that accurately reflect each architect’s distinctive features and characteristics.

Additional training process (example of SANAA style). Developed from Fig. 6.
As shown in Fig. 8, the image data (|$Im{g_{\rm D}}$|) includes photographs of the projects from reputable sources, such as the architects’ official websites and globally recognized architecture media platforms [e.g., Archdaily (2008), DIVISARE (1998), and Dezeen (2006)]. We aimed to include every project undertaken by the architects. To ensure high-quality training images, we selected representative photographs for each project based on two criteria: (i) the entire facade of the architectural structure is visible, and (ii) the photograph uses a two- or one-point perspective. Additionally, we preprocessed the collected images to optimize the training process, resizing the images and cropping out unnecessary elements in the surroundings that could potentially interfere with learning the target architect’s style and features.
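The resizing and cropping step described above can be scripted; the snippet below is a minimal sketch using Pillow, where the square 512-pixel target size is an assumption typical of SD v1.5 training data rather than a value reported in the paper, and the folder names are illustrative.

```python
from pathlib import Path
from PIL import Image

SIZE = 512  # assumed training resolution; not stated explicitly in the paper
src, dst = Path("raw_images"), Path("dataset")
dst.mkdir(exist_ok=True)

for path in src.glob("*.jpg"):
    img = Image.open(path).convert("RGB")
    # Center-crop to a square to trim distracting surroundings, then resize.
    w, h = img.size
    s = min(w, h)
    img = img.crop(((w - s) // 2, (h - s) // 2, (w + s) // 2, (h + s) // 2))
    img = img.resize((SIZE, SIZE), Image.LANCZOS)
    img.save(dst / f"{path.stem}.png")
```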
Existing interviews with architects and experts, as well as precedent research on the architects’ styles and projects, were used to construct the text data (|$Tx{t_{\rm D}}$|) for each image. Each |$Tx{t_{\rm D}}$| for training the architects’ styles consisted of three categories of annotation (Table 5), based on Table 4 from Section 5.2. First, we appended “style” to the target architect’s name (e.g., SANAA style, Louis Kahn style, etc.) as the representative label for this additional training. Based on prior research and interviews, frequently used keywords related to the architect’s style, including its form, materiality, and structure, were selected. Additionally, objectively observable visual features, such as secondary materials, weather, and the surrounding environment, were included. The generated |$Tx{t_{\rm D}}$| files were saved with the same names as the corresponding |$Im{g_{\rm D}}$| files and trained together as one |${D_{\rm t}}$|.
| Component | Reference source | Used image annotations |
|---|---|---|
| Representative N | SANAA (1995) | SANAA style |
| Annotation of |$SF$| | The extensive use of uniform skins is evident. White, homogeneous surfaces are often used. The use of poured concrete and other uniform materialities can be assimilated to white. Repeating densely small steel columns or other structural elements, they transform the void into a porous solid (Vandenbulcke, 2012). | Minimalist, simplicity, elegance, sensitivity, transparency, translucency, openness, homogeneous, monolith, horizons, glass walls, curved shape, white color, fine steel columns (pilotis), thin ceilings, repetition, etc. |
| Annotation of |$GF$| | Based on observation | Bush, trees in the background, grass in the ground, in the park, sunny days, etc. |
The additional training was conducted with eight architects who exhibited considerably low similarity rates in the preliminary image generation test: SANAA, Renzo Piano, I.M. Pei, Le Corbusier, Shigeru Ban, Tadao Ando, Luis Barragan, and Louis Kahn. A relatively small |${D_{\rm t}}$| was created for each architect based on the aforementioned process, depending on the number of their real projects. The prepared |${D_{\rm t}}$| were added to the M, the pruned v1.5 checkpoint, using the DreamBooth LoRA approach. The |$Hyperparam$| used for training included a train batch size of 1, 100 epochs, and a learning rate of 0.0001. The training duration ranged from 25 to 40 min on a local PC, proportional to the size of |${D_{\rm t}}$|. Consequently, a single trained model (|${M_{\rm t}}$|) file with the safetensors extension was generated for each architect.
When the |${M_{\rm t}}$| file was applied to the M (Equation 16), it generated architectural exterior images that closely resembled the design styles of the architects, unlike when using the M alone. As demonstrated by the training example for the SANAA style in Table 6, it was possible to accurately depict the design features of architects and implement their specific styles by training on a dataset of around 165 images within a short period. A total of eight |${M_{\rm t}}$| files were constructed, each implementing the style of one of the eight architects. All |${M_{\rm t}}$| files, added to the M, could generate high-quality images comparable with the output images shown in Section 6.3.


The performance of the |${M_{\rm t}}$| was evaluated by calculating the |$Similarity$| of generated images (|$Im{g_{\rm G}}$|) to their target. Figure 9 presents the |$Similarity$|-based performance evaluation results of |${M_{\rm t}}$| and M for the architects with low performance as described in Section 3.3. According to Fig. 9, the proportion of images with a similarity of ≥90% increased by ∼5 times on average after using |${M_{\rm t}}$|, despite being generated with the same prompts and parameters. These results suggest that, under the same conditions, |${M_{\rm t}}$| can generate images that more accurately and effectively reflect the prompts, thereby enhancing the visualization process and quality.

Additionally, a survey was conducted to qualitatively measure and validate the performance of |${M_{\rm t}}$| using the human evaluation score (HES). Each question in the survey presented one image generated by |${M_{\rm t}}$| and one by M, using the same prompts and parameters. Participants were asked to choose, of the two images, the |$Im{g_{\rm G}}$| that better matched the provided description of the architect’s style (Fig. 10). The HES for each model was calculated as the percentage of times the image generated by that model (|${{\mathit{ Img}}}_{{{\rm{G}}_{{i}}}}^{{M}}$|) was chosen out of the total number of responses [the number of participants (N) multiplied by the number of questions (Q)], as shown in Equation (22):
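With $chosen(\cdot)$ as an indicator of whether a given image was selected (the function name is introduced here only for readability), the score can be written as:

$$HES(M) = \frac{\sum_{i=1}^{N \times Q} chosen\!\left( Img_{{\rm G}_i}^{M} \right)}{N \times Q} \times 100\%$$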

A survey consisting of 80 multiple-choice questions (10 questions per architect) was conducted with 21 professionals in architectural design, focusing on the eight architects’ styles for which the models were additionally trained. The results of the survey, including the individual HES for each architect and the total HES, are presented in Table 7. All eight |${M_{\rm t}}$| demonstrated higher HES compared with the M. On average, ∼94.88% of the images generated based on |${M_{\rm t}}$| (|${{Img}}_{\rm{G}}^{{{{M}}_{\rm t}}}$|) were selected as better reflecting the styles of each architect. This indicates that |${M_{\rm t}}$| can capture the nuances and subtleties of architectural styles from the perspective of human perception and judgment.
| Category | I.M. Pei | Luis Barragan | Le Corbusier | Louis Kahn | Renzo Piano | Shigeru Ban | SANAA | Tadao Ando | Total |
|---|---|---|---|---|---|---|---|---|---|
| HES(M) | 3.81 | 1.43 | 3.33 | 7.62 | 8.57 | 5.71 | 3.33 | 7.14 | 5.12 |
| HES(M_t) | 96.19 | 98.57 | 96.67 | 92.38 | 91.43 | 94.29 | 96.67 | 92.86 | 94.88 |
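To make Equation (22) concrete, the following short sketch computes HES from a list of pairwise choices; the tallies are hypothetical (chosen to be consistent with the I.M. Pei column of Table 7, with 21 participants × 10 questions) and only the arithmetic follows the definition above.

```python
# Worked illustration of Equation (22): HES as the percentage of pairwise choices won.
# `responses` is a hypothetical record of which model's image each participant chose
# on each question; only the arithmetic mirrors the definition in the text.
def human_evaluation_score(responses, model_name):
    """HES(model) = (# times the model's image was chosen) / (N * Q) * 100."""
    total = len(responses)                 # N participants x Q questions entries
    chosen = sum(1 for choice in responses if choice == model_name)
    return 100.0 * chosen / total

# Hypothetical tallies consistent with the I.M. Pei column: 21 x 10 = 210 responses.
responses = ["M_t"] * 202 + ["M"] * 8
print(human_evaluation_score(responses, "M_t"))   # ~96.19
print(human_evaluation_score(responses, "M"))     # ~3.81
```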
6.3. Architects’ design styled image generation
This section outlines the acquisition of creative reference images that reflect architects' design styles. Using the image generation approach proposed in this research, users can generate the desired exterior images of buildings reflecting specific architects' styles within a short timeframe. Furthermore, image generation AI allows users to create and obtain a wider range of architectural image references by merging or extracting two or more architects' styles.
Twenty internationally known architects were selected for this study, including recipients of the Pritzker Prize, often referred to as the Nobel Prize of architecture, and architects who have had a significant influence on design. We generated architectural visualizations applying their styles based on Equations (1) and (16). The pruned v1.5 checkpoint was used as the base M, and for architects with lower similarity, a trained model (|${M_{\rm t}}$|) from the previous section was additionally employed. The detailed text prompts (|${P_{\rm t}}$|) and generation parameters (|$Para{m_{\rm G}}$|) used for each architect's style and their fusion are specified in Tables 8 and 9. For style fusion, |${P_{\rm t}}$| was composed following the rules outlined in Section 4.2, as summarized in Table 3. Except for the content prompts, all other conditions were kept the same to compare the results of applying a single style and multiple styles. Each image took 5 seconds on average to generate in the local PC environment.
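The exact prompt-composition rules are given in Section 4.2 and Table 3 and are not restated here; the sketch below is a hypothetical illustration of how single, merged, and extracted style prompts might be assembled, and may differ from the authors' rules.

```python
# Hypothetical illustration of composing text prompts (P_t) for single, merged, and
# "extracted" styles; the paper's actual composition rules are given in Section 4.2
# and Table 3 and may differ from this sketch.
BASE_CONTENT = "exterior of a two-storey residential house, photorealistic rendering"

def single_style(architect: str) -> str:
    return f"{BASE_CONTENT}, in the style of {architect}"

def merged_styles(a: str, b: str) -> str:
    # Fuse two architects' styles in one prompt.
    return f"{BASE_CONTENT}, in the style of {a} and {b}"

def extracted_style(keep: str, remove_trait: str) -> tuple[str, str]:
    # Keep one style while suppressing a shared trait via the negative prompt.
    positive = f"{BASE_CONTENT}, in the style of {keep}"
    negative = remove_trait                 # e.g., "exposed concrete"
    return positive, negative

print(merged_styles("Zaha Hadid", "Shigeru Ban"))
print(extracted_style("Louis Kahn", "exposed concrete"))
```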




Tables 8 and 9 provide evidence that the majority of these images accurately capture the architectural characteristics and elements associated with each architect's style. Even in cases where an architect had no prior experience with residential projects, the images maintained the scale and programmatic characteristics of residential buildings. When distinct styles were merged, various features, such as form, materiality, and structure, were moderated, diminished, or enhanced. When Zaha Hadid's style was merged with Shigeru Ban's, Zaha Hadid's curvilinear form was moderated while Shigeru Ban's wooden grid shell was added and emphasized. When one style was extracted from the other, the characteristic common to the two architects, concrete as a material, was removed, resulting in a residential house image with an entirely different metallic material; Louis Kahn's distinctive form was retained, as it does not overlap with Tadao Ando's style.
Visualizations of residential building exteriors were generated for all 20 individual styles, producing high-quality images that effectively captured each architect's characteristics through |${M_{\rm t}}$|. Furthermore, numerous style fusions were implemented, ultimately generating ∼11 fused architectural styles by combining the styles of nine architects. These generated outputs (|$Im{g_{\rm G}}$|) can offer a range of ideas and inspirations even in the initial phases of architectural design, facilitating rapid and effective communication throughout the design process.
6.4. AIBIM-Design: AI-assisted rendering tool
In this section, we introduce AIBIM-Design, a user-friendly interface that models building masses and generates images from these masses using the img2img method. With AIBIM-Design's main interface, depicted in Fig. 11, users can automatically model building mass alternatives according to their needs, such as the floor area ratio, building regulations, number of floors, and total area, extracted from the input site plan. Additionally, users can manually modify the model later or even draw and model the blueprint themselves from the start. Once the model is complete, the img2img rendering interface within the same platform allows users to create high-quality visualization images.
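The mass-generation logic of AIBIM-Design is not published in this paper; the toy sketch below only illustrates the kind of arithmetic such a feature involves, assuming a simplified relation between site area, floor area ratio (FAR), and building coverage ratio.

```python
# Toy illustration (not AIBIM-Design's implementation) of deriving simple mass
# alternatives from site area, floor area ratio (FAR), and building coverage ratio.
def mass_alternatives(site_area_m2: float, far: float, coverage_ratio: float,
                      floor_height_m: float = 3.0):
    """Yield (footprint, floors, height) options that respect FAR and coverage limits."""
    max_total_floor_area = site_area_m2 * far        # FAR caps the total floor area
    max_footprint = site_area_m2 * coverage_ratio    # coverage ratio caps the footprint
    for floors in range(1, 6):
        footprint = min(max_footprint, max_total_floor_area / floors)
        yield {"floors": floors,
               "footprint_m2": round(footprint, 1),
               "height_m": floors * floor_height_m,
               "total_floor_area_m2": round(footprint * floors, 1)}

# Example: a 300 m2 site with FAR 1.5 and 60% building coverage.
for option in mass_alternatives(site_area_m2=300, far=1.5, coverage_ratio=0.6):
    print(option)
```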

Main interface of AIBIM-Design. (1) Drawing area with view control bar; (2) design and drawing tool palette; (3) properties palette; and (4) spatial information browser.
The img2img rendering interface, shown in Fig. 12, enables real-time manipulation and exploration of various perspectives of the 3D model. Once users choose their desired view, they can easily generate visualizations from that perspective in seconds. This is achieved by selecting preferred architects’ styles and providing the details through prompts and parameters in the corresponding section. Users can experiment with alternative images multiple times until they attain the desired outcome. Once they achieve an image that closely aligns with their target, they can save the original image file.

Img2img rendering interface in AIBIM-Design. (1) Trained model (|${M_{\rm t}}$|) options and prompt (|${P_{\rm t}}$|) input box; (2) parameter (|$Para{m_{\rm G}}$| and |$Para{m_{\rm P}}$|) setting bar; (3) 3D model linked-seed image selection area; and (4) output preview area.
The scenarios below describe the visualization of an architectural mass model using the img2img method, specifically through the AIBIM-Design renderer tool. Users input an image of the mass model as the seed image, along with textual prompts (|${P_{\rm t}}$|) describing their requirements and preferences, and the visualizations are then rendered. With this technique, architects can easily generate images, accelerating decision-making and communication.
Image generation was conducted based on Equations (2) and (17) using three seed images (|$Im{g_{\rm S}}$|) with different perspectives, as shown in Fig. 13, by applying the individual style of each architect and by combining different architects' styles. The results (Tables 10 and 11) revealed that, although the predetermined building volume limited how fully the styles could be implemented, elements such as materials, structure, openings, and colors were successfully applied and reflected the corresponding styles. Using this interface with the img2img approach, architects can access more concrete design alternatives during the initial phase, helping them make informed decisions and refine their designs efficiently.
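For illustration only, the following sketch shows an img2img generation from a mass-model seed image (|$Im{g_{\rm S}}$|), assuming the diffusers img2img pipeline; the model paths, prompt, strength, and other parameters are placeholders rather than the |$Para{m_{\rm G}}$| values reported in Tables 10 and 11.

```python
# Minimal sketch of img2img generation from a mass-model seed image (Img_S), assuming the
# diffusers library; paths, prompt, and parameters are placeholders, not the values used
# for Tables 10 and 11.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "models/pruned-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("output/tadao_ando_lora")   # optional architect-style LoRA (M_t)

seed_image = Image.open("mass_model_frontal.png").convert("RGB").resize((512, 512))

image = pipe(
    prompt="residential house exterior in the style of Tadao Ando, exposed concrete",
    image=seed_image,
    strength=0.6,            # how far the output may depart from the seed mass model
    guidance_scale=7.5,
    num_inference_steps=30,
).images[0]
image.save("rendered_alternative.png")
```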
Seed images for img2img generation. (1) Frontal perspective; (2) angular perspective; and (3) isometric view. [Captured from 3D model linked-seed image selection area of Img2img rendering interface in AIBIM-Design (part (3) of Fig. 12)].




There are several limitations to this demonstration. It was conducted within the scope of specific building types and styles, showcasing 20 architects' styles applied to residential buildings. The AI-generated images exhibit high-quality design alternatives for visualization; however, not all generated images belong to the target category, and in some cases, alternative designs that are practically infeasible for construction and realization may be produced. During the additional training of the architects' styles, the training data were extracted using information that was as objective as possible, and the majority of each architect's projects were considered. Despite these efforts, biases may remain in the additional training and in the stylistic fidelity of the generated images.
7. Conclusions
Visualization serves as a conduit for effective decision-making and communication. However, the process of visualization is difficult; it involves multiple intricate and sophisticated tasks. Driven by its significance and inherent complexities, this paper introduced a novel approach and a tool that leverages AI to create visual representations based on textual input. This approach involved additional training for styles with initially lower similarity rates, which required intensive data preparation and integration into the AI model. This technique has proven effective across multiple scenarios, significantly enhancing the efficiency and speed of architectural visualization image production. In this study, over 10 000 images were generated incorporating an architect’s personal style and characteristics into residential house models, to assess the base AI model’s effectiveness. The study highlights the vast potential of AI in design visualization, emphasizing a shift towards facilitating more user-centered and personalized design applications.
This research demonstrates how generative AI can transform the architectural visualization process, making it more efficient and responsive to individual styles. The developed additional training process ensures that the AI model can effectively learn and replicate specific architectural styles, improving the relevance and quality of the generated images. This approach allows for a broader range of visual representations, providing architects with powerful tools to explore and communicate their design ideas more effectively.
While our study shows promising results, it has limitations. The generated outputs are raster graphics images and do not include actual materials such as 3D model files. Not all generated images necessarily belong to the target category, and some designs may be impractical for construction. Additionally, biases in the supplementary training data may affect the fidelity of the generated images in terms of style.
Future research should focus on developing specialized training models based on more diverse and detailed variations in the training data for the enhancement of the model’s efficacy. Additionally, exploring other visualization forms by combining different AI models can lead to more systematic and multi-modal alternatives and representations, contributing to a more integrated and efficient design process.
Conflict of interest statement
The authors state that they do not have any known financial interests or personal relationships that could have influenced the findings of the study.
Author Contributions
Jin-Kook Lee (Conceptualization, Methodology, Visualization, Software, Project administration, Writing—original draft, Writing—review & editing), Youngjin Yoo (Investigation, Methodology, Visualization, Data curation, Software, Writing—original draft, Writing—review & editing), and Seung Hyun Cha (Visualization, Software, Investigation, Writing—review & editing).
Acknowledgments
This work was supported in 2024 by the Korea Agency for Infrastructure Technology Advancement (KAIA) grant funded by the Ministry of Land, Infrastructure and Transport (Grant No. RS-2021-KA163269). This work was also supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (Grant No. NRF-2022R1A2C1093310).
Data Availability
Most of the training data and/or models used in this study can be provided by the first or corresponding author (PI) upon reasonable request, along with the links in the references and the technical resource section. Additionally, the paper includes references to the archives and links provided by the PI. (Contact author: [email protected])