fMRI image reconstruction.
link to final writeup
link to inspiration paper
link to github
Kyle Jung: kjung7
Anoop Reddi: areddi
Alexander Halpin: ahalpin
Introduction: What problem are you trying to solve and why? We are trying to reconstruct images from fMRI data by implementing a diffusion model. We aim to improve upon the image quality of current fMRI image reconstruction methods, which use a GAN architecture. Our interest in the project stems from our interest in diffusion models and their application to relevant problems. This is a generative modeling problem: we are trying to reconstruct an image from a latent-space representation.
Related Work: Are you aware of any, or is there any prior work that you drew on to do your project? Images and percepts are thought to be encoded in the brain in hierarchical representations. Although there have been several attempts to visualize mental contents, attempts to use multiple hierarchical representations have been unsatisfactory. The inspiration paper presents a novel method that uses a GAN for reconstructing visual images from the brain. By analyzing human fMRI patterns, the authors produced reconstructions that were shown to be similar to the natural and artificial images processed by the brain. This demonstrates that the method can combine multiple hierarchical representations to reconstruct a person's subjective percepts, allowing us to better understand what humans are "seeing" inside their brains.
Data: What data are you using (if any)? We are using fMRI data provided by the authors of the original paper that we drew inspiration from. The fMRI scans were collected from experiments conducted on 3 subjects across multiple sessions at the Kokoro Research Center, Kyoto University. Scans were recorded while the subjects viewed the images, as well as afterwards when they were asked to visualize the images from memory. The data consists of fMRI recordings from the following experiments:
- 1200 training natural images × 5 repetitions
- 50 test natural images × 24 repetitions
- 40 artificial shape images × 20 repetitions
- 10 alphabet letters × 12 repetitions
The original paper also includes features pre-extracted from the fMRI data. The features were extracted by a VGG19 model trained on ImageNet. They were extracted from each layer of the VGG19 model and kept separate to mimic the brain's hierarchical feature extraction in the training of the generative model.
Methodology: Our novel contribution to the experiment is a variational autoencoder for extracting features from the fMRI data and encoding them into a compressed latent space. This latent representation of the fMRI data will then be used to train a latent diffusion model. The current methodology uses a GAN to generate the images, trained by comparing features extracted from the fMRI data against features of the previously generated iteration of the image. That method uses the same VGG19 model (trained on ImageNet) to extract features from both the fMRI data and the generated image. We believe this does not account for unique features that may be present in the fMRI data, and that it can be improved upon with our encoding approach. Diffusion models also offer possible advantages over GANs, including better diversity across the distribution and easier training (https://arxiv.org/pdf/2105.05233.pdf).
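Because the diffusion step happens in the VAE's latent space, the forward (noising) half of the process can be sketched independently of any network. Below is a generic DDPM-style forward process with a linear beta schedule in NumPy; the latent size, schedule endpoints, and number of steps are illustrative assumptions, not values from our repository.

```python
import numpy as np

# Forward diffusion q(x_t | x_0): progressively noise a latent vector
# under a linear beta schedule (DDPM-style). Parameters are illustrative.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def q_sample(x0, t, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise

rng = np.random.default_rng(0)
z0 = rng.standard_normal(512)     # stand-in for a VAE latent of an fMRI scan
z_mid = q_sample(z0, 500, rng)    # partially noised latent
z_end = q_sample(z0, T - 1, rng)  # nearly pure Gaussian noise
```

The reverse process (a learned denoiser conditioned on the fMRI latent) is the expensive part to train, which is what made this direction costly for us.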
Metrics: What experiments do you plan to run? We first plan to analyze the data and results of the paper. Then, we plan to implement a diffusion model on the same data to create potentially higher-quality reconstructions. For most of our assignments, we have looked at the accuracy of the model. Does the notion of “accuracy” apply for your project, or is some other metric more appropriate? The notion of accuracy does not entirely apply to our project. The research group behind the GAN architecture assessed their model with both an objective and a subjective method. The objective method compares the Pearson pixel correlation of the reconstructed image with the original image and with a randomly selected image. Similarly, the subjective method involved 13 human raters who were asked to select the image most similar to the reconstruction, where the two choices were the original image and a randomly selected image. If you are doing something new, explain how you will assess your model’s performance. We plan to qualitatively compare the images reconstructed by our diffusion model against those in the paper. What are your base, target, and stretch goals? Our base goal is to provide an in-depth analysis of the paper and propose improvements to their implementation of the presented GAN model. Our target goal is to get a diffusion model working for our application. Our stretch goal is to beat the paper’s GAN model with our implementation of the diffusion model.
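The objective metric described above amounts to a 2-way identification test: a reconstruction "wins" a trial if its pixel-wise Pearson correlation with the true stimulus exceeds its correlation with a randomly drawn distractor. This is a sketch of our reading of that procedure, not the paper's code; the image sizes and noise level are made up for the example.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation between two flattened images."""
    a, b = a.ravel(), b.ravel()
    return float(np.corrcoef(a, b)[0, 1])

def two_way_identification(recon, original, distractor):
    """True if the reconstruction correlates more strongly with the
    original stimulus than with a randomly selected distractor image."""
    return pearson(recon, original) > pearson(recon, distractor)

rng = np.random.default_rng(0)
original = rng.standard_normal((64, 64))
distractor = rng.standard_normal((64, 64))
recon = original + 0.5 * rng.standard_normal((64, 64))  # noisy "reconstruction"

print(two_way_identification(recon, original, distractor))
```

Averaging this win rate over many test images and distractor draws gives a chance level of 50%, which makes the objective and subjective (human-rater) scores directly comparable.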
Ethics: What broader societal issues are relevant to your chosen problem space? The broader societal issue most relevant to brain-scan reconstruction is privacy. The ability to scan, analyze, and extract information from someone without consent raises multiple privacy concerns and creates an opening for malicious actors. Additionally, there are ethical concerns about governments applying this type of technology to intelligence-gathering activities for policing and national security, such as custodial questioning and interrogation. Who are the major “stakeholders” in this problem, and what are the consequences of mistakes made by your algorithm? The current stakeholders are members of the medical research community who are trying to understand how the brain works. Mistakes by these groups may result in ineffective development of treatments for disease and an incomplete understanding of the brain. However, future stakeholders may include governments and companies. Algorithmic mistakes in government applications may result in unfair application of the law, while mistakes in commercial applications may reinforce unfair economic distributions. What is your dataset? Are there any concerns about how it was collected, or labeled? Is it representative? There are no current concerns with this small-scale collection of this type of data for research purposes. However, if these types of models are scaled up, the genetic, social, and economic backgrounds of participants may limit how effectively such models analyze brain activity for certain populations in the future.
Division of labor: Briefly outline who will be responsible for which part(s) of the project.
- Alexander Halpin: constructing a feature extractor/encoder for transforming the fMRI data into the latent space used to train the generative model.
- Anoop Reddi: reimplementation of code from the original experiment and data wrangling.
- Kyle Jung: exploration, implementation, and training of the diffusion model.
CHECK IN 2
For this check-in, we also require you to write up a reflection including the following:
Introduction: We were interested in diffusion models and in processing fMRI data, interests that stem from generative modeling and brain signals. We were trying to reconstruct images from fMRI data collected in a series of experiments: scientists asked participants to look at an image, then recorded data from each participant’s visual cortex and labeled it according to the image they saw. The subsequent computational work attempted to reconstruct those images using a GAN, but most of the reconstructions were blurry and hard to distinguish from a random image. We hoped to improve on that implementation using a latent diffusion model conditioned on fMRI data. This task proved to be much more ambitious and computationally expensive than we anticipated, so we simplified the reconstruction part of the project. We are now pursuing the less computationally expensive task of generating captions from fMRI data, and we are interested in how the rich latent space translates to word labels.
Challenges: What has been the hardest part of the project you’ve encountered so far? The most difficult part of our project so far has been working with the fMRI data. The first component of our architecture is an autoencoder that projects the fMRI data into a lower-dimensional space. Training this autoencoder is challenging because it is difficult to know whether the data is being reconstructed in a meaningful way: the original fMRI data looks very noisy to a human, so an accurate reconstruction is not immediately recognizable the way an image would be. Implementing a diffusion model has also been challenging, as it requires more computational resources and data than we have.
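The autoencoder component can be sketched as a small VAE over voxel vectors. This is a minimal illustration under assumed sizes (the real voxel count depends on the ROI selection for each subject, and the KL weight is a tunable guess), not our actual architecture. Because reconstructions can't be judged by eye, the scalar loss below, together with voxel-wise correlation on held-out scans, is the kind of signal we monitor instead.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FMRIVAE(nn.Module):
    """Minimal VAE for voxel vectors. All sizes are illustrative."""

    def __init__(self, n_voxels=4096, latent_dim=512):
        super().__init__()
        self.enc = nn.Linear(n_voxels, 1024)
        self.mu = nn.Linear(1024, latent_dim)
        self.logvar = nn.Linear(1024, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 1024),
                                 nn.ReLU(),
                                 nn.Linear(1024, n_voxels))

    def forward(self, x):
        h = F.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterize
        return self.dec(z), mu, logvar

def elbo_loss(recon, x, mu, logvar):
    # Reconstruction term plus KL divergence to a standard normal prior.
    mse = F.mse_loss(recon, x, reduction="mean")
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return mse + 1e-3 * kld  # KL weight is an assumption to tune

model = FMRIVAE()
x = torch.randn(8, 4096)  # stand-in batch of voxel patterns
recon, mu, logvar = model(x)
loss = elbo_loss(recon, x, mu, logvar)
print(recon.shape, float(loss))
```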
Insights: Are there any concrete results you can show at this point? We have trained a VAE on the fMRI data. This latent projection will be used to condition our reconstruction model. Plan: Are you on track with your project? What do you need to dedicate more time to? We are making progress. It would be nice to have a full prototype running at this point, but there have been many little hurdles to jump over. What are you thinking of changing, if anything? We initially wanted to condition a diffusion model to generate images from fMRI data. However, diffusion models have proven too computationally expensive for our project, so we decided to pivot to a transformer-based model.
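One way the transformer-based pivot could look: a caption decoder that cross-attends to the VAE latent, projected to a single "memory" token. Everything here is a hypothetical sketch; the class name, vocabulary size, and model dimensions are assumptions, not our final design.

```python
import torch
import torch.nn as nn

class FMRICaptioner(nn.Module):
    """Sketch: transformer decoder conditioned on an fMRI latent via
    cross-attention. All names and sizes are illustrative assumptions."""

    def __init__(self, vocab_size=10000, latent_dim=512, d_model=256, max_len=32):
        super().__init__()
        self.latent_proj = nn.Linear(latent_dim, d_model)
        self.tok_embed = nn.Embedding(vocab_size, d_model)
        self.pos_embed = nn.Embedding(max_len, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, latent, tokens):
        memory = self.latent_proj(latent).unsqueeze(1)  # (B, 1, d_model) memory
        seq_len = tokens.size(1)
        pos = torch.arange(seq_len, device=tokens.device)
        tgt = self.tok_embed(tokens) + self.pos_embed(pos)
        # Causal mask so each position only attends to earlier tokens.
        mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
        out = self.decoder(tgt, memory, tgt_mask=mask)
        return self.head(out)  # (B, seq_len, vocab_size) next-token logits

model = FMRICaptioner()
latent = torch.randn(4, 512)               # VAE-encoded fMRI scans
tokens = torch.randint(0, 10000, (4, 16))  # shifted caption tokens
logits = model(latent, tokens)
print(logits.shape)
```

Training would then be standard teacher-forced cross-entropy over caption tokens, which is far cheaper than learning a full image-space or latent-space denoiser.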