The model is a spectral super-resolution network with an Encoder-Transformer-Decoder architecture, organized into three information streams: an RGB main branch, an auxiliary hyperspectral (HSI) prior branch, and a feature fusion and reconstruction branch. The network takes two inputs: 1) a low-dimensional RGB image as the main input, and 2) an auxiliary hyperspectral image that is spatially misaligned with the RGB image and is used only to provide spectral prior information.

The **RGB Encoder** consists of several convolutional layers and residual blocks that extract low- and mid-level spatial features from the RGB image, and outputs an intermediate feature tensor while preserving high spatial resolution.

The auxiliary **HSI Prior Encoder** branch first extracts features from the auxiliary hyperspectral image, then uses a CP-based Low-Rank Decomposition module to decompose the three-dimensional hyperspectral features into a set of one-dimensional spectral basis vectors representing the global spectral prior. This branch retains no spatial location information and outputs only a low-rank spectral representation.

The spectral prior is then fed into multiple **Adaptive Low-Rank Projection Layers**. Each projection layer maps the low-rank spectral basis vectors into a feature space matching the RGB feature channels and modulates the RGB features through attention weights, achieving feature-level spectral guidance rather than pixel-level fusion.

The Transformer/Attention module sits in the middle of the network and models long-range dependencies over the fused features. It can include multi-dimensional self-attention mechanisms acting on both the spatial and the spectral/channel dimensions to strengthen global context modeling.
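The prior-extraction and projection steps above can be sketched in NumPy. This is a minimal illustration, not the paper's implementation: a truncated SVD of the matricized HSI feature tensor stands in for the CP decomposition, and all shapes, weights, and function names (`extract_spectral_basis`, `adaptive_lowrank_projection`, `W_proj`) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def extract_spectral_basis(hsi_feat, rank):
    """Matricize (C_hsi, H, W) HSI features along the spectral mode and
    keep the top-`rank` left singular vectors as 1-D spectral basis
    vectors (a matricized stand-in for the CP decomposition).
    Note: all spatial information is discarded here."""
    C = hsi_feat.shape[0]
    U, _, _ = np.linalg.svd(hsi_feat.reshape(C, -1), full_matrices=False)
    return U[:, :rank].T                        # (rank, C_hsi)

def adaptive_lowrank_projection(rgb_feat, basis, W_proj):
    """Project the spectral basis vectors into the RGB feature channel
    space, then modulate each pixel's RGB feature by an attention-weighted
    sum of the projected bases -- feature-level spectral guidance that
    needs no pixel-level alignment with the auxiliary HSI."""
    C_rgb, H, W = rgb_feat.shape
    proj = basis @ W_proj                       # (rank, C_rgb)
    feats = rgb_feat.reshape(C_rgb, -1).T       # (HW, C_rgb)
    attn = softmax(feats @ proj.T / np.sqrt(C_rgb), axis=-1)  # (HW, rank)
    modulated = feats * (1.0 + attn @ proj)     # residual-style modulation
    return modulated.T.reshape(C_rgb, H, W)

# Toy shapes (illustrative only): 31-band HSI prior, 64-channel RGB features.
rng = np.random.default_rng(0)
hsi_feat = rng.standard_normal((31, 8, 8))      # low-res, misaligned HSI features
rgb_feat = rng.standard_normal((64, 16, 16))    # high-res RGB backbone features
basis = extract_spectral_basis(hsi_feat, rank=4)
W_proj = rng.standard_normal((31, 64)) * 0.1    # learned in practice
out = adaptive_lowrank_projection(rgb_feat, basis, W_proj)
print(out.shape)                                # (64, 16, 16)
```

Because the prior enters only as a set of channel-space basis vectors, the spatial resolutions of the two inputs never need to match.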
The **Decoder** consists of several convolutional layers or feed-forward networks (FFNs) and progressively maps the fused features to the hyperspectral image space, finally outputting a high-resolution hyperspectral image. The decoding stage can include residual connections that add shallow RGB features or an input mapping directly to the output, stabilizing training.

The key properties of the network are: 1) the auxiliary HSI never undergoes spatial alignment and contributes only a global spectral prior via low-rank decomposition; 2) the RGB features carry the spatial structure modeling; 3) the spectral prior is injected into the backbone through adaptive low-rank projection and attention; and 4) spectral super-resolution reconstruction is thus achieved without spatial registration.
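The decoder's residual skip can be illustrated with a minimal sketch, assuming a channel-wise linear map (equivalent to a 1x1 convolution) in place of the full convolutional/FFN stack; all names and shapes here (`decode`, `W_out`, `W_skip`, 31 output bands) are illustrative, not taken from the source.

```python
import numpy as np

def decode(fused_feat, W_out, rgb_shallow, W_skip):
    """Minimal decoder sketch: a channel-wise linear map from fused
    features to the output band count, plus a residual mapping of
    shallow RGB features added directly to the output to stabilize
    training."""
    C_f, H, W = fused_feat.shape
    out = (W_out @ fused_feat.reshape(C_f, -1)).reshape(-1, H, W)
    skip = (W_skip @ rgb_shallow.reshape(rgb_shallow.shape[0], -1)).reshape(-1, H, W)
    return out + skip

rng = np.random.default_rng(1)
fused = rng.standard_normal((64, 16, 16))        # fused backbone features
rgb_shallow = rng.standard_normal((3, 16, 16))   # shallow RGB skip (here: the input itself)
W_out = rng.standard_normal((31, 64)) * 0.1      # 64 channels -> 31 HSI bands
W_skip = rng.standard_normal((31, 3)) * 0.1      # skip mapping to the same band space
hsi_out = decode(fused, W_out, rgb_shallow, W_skip)
print(hsi_out.shape)                             # (31, 16, 16)
```

The skip path gives the output a direct, shallow route from the RGB input, so early in training the network can fall back on a simple RGB-to-HSI mapping while the deeper fused path is still learning.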