Table 2. Summary of advantages and disadvantages of different data fusion strategies.

	Early fusion	Intermediate fusion	Late fusion
Description	Features from all modalities are merged with no distinction of which features come from which modality		Every modality is processed separately by its own sub-model and the individual outcomes are combined to get a single prediction
Pros	Use of cross-modality correlations and interactions; lower computational complexity than other fusion strategies because the fusion occurs at the input level	Effectively balances the use of cross-modality and within-modality correlations and interactions, optimizing the number of parameters required; moderate to high computational complexity, depending on the complexity of the fusion mechanism employed; robustness to missing modalities; flexibility to choose the level at which specific modalities are fused; may capture interactions between modalities more efficiently than early fusion while avoiding excessively high-dimensional input spaces; more robust than early fusion to noisy or incomplete data because fusion occurs at intermediate layers	Easy computational implementation; relatively lower computational complexity; more robust to noisy or incomplete data than early fusion because each modality is processed independently
Cons	High computational cost due to a high number of connections; risk of learning spurious cross-modality correlations; high number of parameters and neural connections; less robust to noisy or incomplete data because modalities are combined directly at the input level	Risk of loss of information from cross-modality correlations	Loss of information from potential interactions and cross-modality correlations; may require more training time than early fusion because each modality is processed separately
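
The two strategies that Table 2 describes explicitly, early and late fusion, can be contrasted in a minimal sketch. Everything named here is an assumption made for illustration and does not come from the source: the two synthetic modalities, their sizes, the random weights, the sigmoid linear scorer standing in for a trained sub-model, and the use of a plain average as the late-fusion combiner.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical modalities measured on the same 6 samples (sizes are illustrative).
x_a = rng.normal(size=(6, 4))  # modality A, 4 features
x_b = rng.normal(size=(6, 3))  # modality B, 3 features


def linear_scores(x, w):
    """Stand-in for a trained sub-model: a linear layer followed by a sigmoid."""
    return 1.0 / (1.0 + np.exp(-(x @ w)))


def early_fusion(x_a, x_b, w_joint):
    """Merge the features of all modalities first, then apply one model."""
    merged = np.concatenate([x_a, x_b], axis=1)  # shape (6, 7)
    return linear_scores(merged, w_joint)


def late_fusion(x_a, x_b, w_a, w_b):
    """Score each modality with its own sub-model, then average the outcomes."""
    return 0.5 * (linear_scores(x_a, w_a) + linear_scores(x_b, w_b))


# Random weights play the role of fitted parameters (assumption).
w_joint = rng.normal(size=7)
w_a, w_b = rng.normal(size=4), rng.normal(size=3)

p_early = early_fusion(x_a, x_b, w_joint)
p_late = late_fusion(x_a, x_b, w_a, w_b)
```

In the early-fusion path the single weight vector spans the features of both modalities, which is what lets a model pick up cross-modality interactions. In the late-fusion path the modalities only meet at the averaging step, so those interactions are not available to either sub-model.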
