Snapshot hyperspectral cameras are cheaper and faster than imagers based on pushbroom or whiskbroom spatial scanning, but their output typically maps different spectral bands to different spatial locations in a mosaic pattern. A demosaicing process is therefore required to generate the desired hyperspectral image with full spatial and spectral resolution. However, many existing demosaicing algorithms suffer from common artifacts such as periodic striping or other forms of noise. To ameliorate these issues, a hyperspectral demosaicing framework is proposed that couples a preliminary demosaicing network with a separate multi-stage progressive denoising network, with both networks employing transformer and attention mechanisms. A multi-term loss function allows supervised training to monitor the performance of not only the preliminary demosaicing but also the denoising at each stage. An extensive collection of experimental results demonstrates that the proposed approach produces demosaiced images with fewer visual artifacts and improved performance on several quantitative measures compared with other state-of-the-art demosaicing methods from the recent literature.
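
As a rough illustration of the two ideas above, the following Python sketch simulates mosaic sampling of a full-resolution cube and combines one error term for the preliminary estimate with one term per denoising stage. It is not the proposed networks or their actual loss: the function names, the periodic `pattern` x `pattern` band layout, the use of mean absolute error, and the per-stage weights are all assumptions made for illustration.

```python
def mosaic_sample(cube, pattern=4):
    """Simulate a snapshot mosaic: each pixel keeps exactly one band.

    cube[b][y][x] is a full-resolution cube with pattern * pattern bands.
    Assumed layout (hypothetical): pixel (y, x) records only band
    (y % pattern) * pattern + (x % pattern).
    """
    if len(cube) != pattern * pattern:
        raise ValueError("cube must have pattern * pattern bands")
    height, width = len(cube[0]), len(cube[0][0])
    return [
        [cube[(y % pattern) * pattern + (x % pattern)][y][x] for x in range(width)]
        for y in range(height)
    ]


def mean_abs_error(pred, target):
    """Mean absolute error between two equally shaped cubes (nested lists)."""
    total, count = 0.0, 0
    for pred_band, target_band in zip(pred, target):
        for pred_row, target_row in zip(pred_band, target_band):
            for p, t in zip(pred_row, target_row):
                total += abs(p - t)
                count += 1
    return total / count


def multi_term_loss(prelim, stage_outputs, target, stage_weights=None):
    """One error term for the preliminary estimate plus one per denoising stage.

    The equal default weights and the choice of mean absolute error are
    assumptions; the abstract does not specify either.
    """
    if stage_weights is None:
        stage_weights = [1.0] * len(stage_outputs)
    loss = mean_abs_error(prelim, target)
    for weight, output in zip(stage_weights, stage_outputs):
        loss += weight * mean_abs_error(output, target)
    return loss
```

In a supervised setting, `mosaic_sample` would turn a ground-truth cube into the network input, and `multi_term_loss` would compare the preliminary estimate and every intermediate denoised estimate against that same ground truth, so that each stage receives its own training signal.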