This study introduces mind captioning, a generative decoding method that produces descriptive text mirroring the semantic information represented in the brain. The method combines feature decoding analysis of brain activity, using semantic features computed by a deep language model, with a novel text optimization procedure that iteratively updates candidate descriptions to align their features with the decoded features. This approach produces meaningful linguistic expressions of mental content, both during visual perception and during internal recall, without relying on conventional language-processing regions. It offers a path toward non-verbal, thought-based brain-to-text communication for individuals with language production difficulties.
Examples of evolved descriptions of viewed content during the optimization process.
Abstract
A central challenge in neuroscience is decoding brain activity to uncover mental content comprising multiple components and their interactions. Despite progress in decoding language-related information from human brain activity, generating comprehensive descriptions of complex mental content associated with structured visual semantics remains challenging. We present a method that generates descriptive text mirroring brain representations via semantic features computed by a deep language model. Constructing linear decoding models to translate brain activity induced by videos into semantic features of corresponding captions, we optimized candidate descriptions by aligning their features with brain-decoded features through word replacement and interpolation. This process yielded well-structured descriptions that accurately capture viewed content, even without relying on the canonical language network. The method also generalized to verbalize recalled content, functioning as an interpretive interface between mental representations and text, and simultaneously demonstrating the potential for non-verbal thought-based brain-to-text communication, which could provide an alternative communication pathway for individuals with language expression difficulties, such as aphasia.
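To make the decoding step concrete, the sketch below fits a linear decoder from voxel patterns to caption features and applies it to held-out brain activity. This is a minimal illustration under stated assumptions, not the published code: the file names, array shapes, and the use of scikit-learn's Ridge with a fixed regularization strength are assumptions made for the example.

```python
# Minimal sketch of the feature-decoding step (illustrative, not the
# authors' implementation). File names and shapes are hypothetical.
import numpy as np
from sklearn.linear_model import Ridge

# X_*: brain activity, shape (n_samples, n_voxels)
# Y_train: deep-language-model features of the training videos' captions,
#          shape (n_samples, n_feature_dims)
X_train = np.load("fmri_train.npy")
Y_train = np.load("caption_features_train.npy")
X_test = np.load("fmri_test.npy")

# A linear decoder maps voxel patterns into the semantic feature space.
decoder = Ridge(alpha=1.0)
decoder.fit(X_train, Y_train)

# Decoded features for new brain activity; these become the targets that
# the text-optimization stage tries to match.
decoded_features = decoder.predict(X_test)
```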
Results
Viewed content captions (all subjects)
Recalled content captions (all subjects)
Our method generates descriptions not only of viewed content but also of recalled content from brain activity, enabling the verbalization of internally imagined experiences.
Q&A
Q1: What is Mind Captioning?
A1: Mind Captioning is a generative decoding method that translates brain activity into descriptive text. It uses deep language models to compute semantic features from candidate texts and then iteratively refines these descriptions to align them with the semantic features decoded from the brain.
Q2: How does Mind Captioning work?
A2: The method decodes (translates) brain activity patterns related to visual stimuli or internal recall into semantic features, and then refines text descriptions by aligning their semantic features with those decoded from the brain. A key aspect of this process is a generative optimization algorithm that allows flexible text generation without requiring additional model training.
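As an illustration of the optimization stage, the sketch below greedily edits a candidate sentence so that its language-model features move closer to the brain-decoded features. It is a simplified sketch, not the published algorithm: `lm_features` (a function embedding a sentence with a deep language model) and `propose_candidates` (a function proposing word replacements or interpolations, e.g., from a masked language model) are hypothetical helpers, and cosine similarity with greedy acceptance stands in for the actual scoring and search procedure.

```python
# Illustrative hill-climbing loop for aligning a description's features
# with brain-decoded features. Helper functions are hypothetical.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def optimize_description(decoded_features, lm_features, propose_candidates,
                         init_text="something is happening", n_iters=100):
    """Iteratively edit a candidate sentence so that its language-model
    features become more similar to the features decoded from the brain."""
    best_text = init_text
    best_score = cosine(lm_features(best_text), decoded_features)
    for _ in range(n_iters):
        improved = False
        # Score every proposed edit (word replacement/interpolation) and
        # keep the one that best matches the decoded features.
        for candidate in propose_candidates(best_text):
            score = cosine(lm_features(candidate), decoded_features)
            if score > best_score:
                best_text, best_score, improved = candidate, score, True
        if not improved:
            break  # no edit improved the alignment; stop early
    return best_text, best_score
```

Consistent with the point above about not needing additional model training, the sketch treats both feature extraction and candidate proposal as black-box functions built from pretrained language models.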
Q3: What did you find using this method?
A3: We found that Mind Captioning can generate accurate text for both perceived and recalled content, suggesting the potential for non-verbal, thought-based communication.
Q4: Is this a language decoding or reconstruction system?
A4: No, it’s not language decoding. Our method identifies the linguistic description that best matches non-verbal brain activity, positioning it as an interpretive interface rather than a language-decoding system. Several observations support this view: (1) descriptions are generated without involving the language network, (2) although the output is in English, all participants were non-native English speakers, and (3) the decoders were trained using non-linguistic visual stimuli.
Q5: Can I easily test this using my own brain activity? Is there any risk of my brain activity being read?
A5: At present, testing requires specialized equipment (fMRI) and many hours of cooperative participation. While there is no immediate risk of brain activity being read without consent in everyday circumstances, the ability to decode recalled content into text raises important privacy concerns, and the technology should be used with caution, particularly regarding the decoding of private thoughts. The method is not yet at a stage where it can be easily applied in everyday situations or used for personal testing.
Q6: Does this reflect the subject's subjective experience?
A6: To some extent, yes. The generated text tends to align with each subject’s own perception of the viewed content, as suggested by its similarity to the captions that participants rated as more consistent with their experience. However, further validation, such as direct self-assessment, is needed to determine how closely the generated descriptions truly correspond to what each subject actually perceived.
Q7: Can we expect further improvements in accuracy?
A7: Yes, further improvements are expected. The study suggests that using language models whose semantic representations are more closely aligned with the brain will likely improve accuracy, as shown by validation analyses in the paper.
Q8: What are the potential applications?
A8: A key application is facilitating communication for individuals with language production difficulties. Because the method does not rely on language areas in the brain, it may assist people whose language production is impaired (e.g., by aphasia or ALS) by providing an alternative communication path. It could also help verbalize internally recalled content, offering insights into mental states.
Q9: Can this be applied to other sensory modalities, like sound or touch?
A9: Yes, this method would be adaptable to other sensory modalities. As long as there is a labeled brain activity dataset for the modality, the method can be applied to sensory information like sound or touch, without needing separate models or large curated databases for each type.
Q10: Does this help in understanding the brain?
A10: Yes, the method helps uncover how structured visual semantic information is represented in the brain. However, model and data biases could influence the results, and further tests under contrasting conditions are needed to validate whether the brain truly represents compositional information.
Q11: Could this method be applied to dream decoding?
A11: While the method shows potential for dream decoding, further research is needed. This study focused on voluntary, recall-based imagery, so additional testing would be required to determine its applicability to spontaneous imagery, such as dreams or mind-wandering.
BibTeX Citation
@article{Horikawa2025sciadv,
  author  = {Tomoyasu Horikawa},
  title   = {Mind captioning: Evolving descriptive text of mental content from human brain activity},
  journal = {Science Advances},
  volume  = {11},
  number  = {45},
  pages   = {eadw1464},
  year    = {2025},
  doi     = {10.1126/sciadv.adw1464},
  url     = {https://www.science.org/doi/10.1126/sciadv.adw1464}
}