12 Comments

Really interesting post. One thing I worry about is that these models give us the illusion of progress in neuroscience without leading to a mechanistic understanding of the brain. I'll admit it's a personal professional bias of mine; I guess I'm one of those hypothesis-driven-minded scientists. In a way, I wonder whether the criticisms leveled years ago against the BBP wouldn't apply here too.

author

BBP: Bayesian Brain?

I think, to the contrary, the work on mechanistic interpretability of large-scale models, which is far less bottlenecked by tools, will tell us a lot about how to do causal mechanism inference in neuro. I have a draft post on the mechanism issue based on this paper by Grace Lindsay and David Bau, which I will publish later this week or next.

https://www.sciencedirect.com/science/article/pii/S1389041723000906


Blue Brain Project, sorry.

author

Ah! Re: BBP, you have to test your models early and often and set up clear kill criteria for projects. I've discussed this earlier: https://www.neuroai.science/p/connectomics-behavioural-cloning


Which path do you find more promising: the basic model, or the algorithm based on neural manifold geometry? And what's the future direction for neural foundation models?


I did not see anything additional done in the Zhang Universal Translator paper that Azabou didn't already do in the POYO paper, other than billing it as a foundation model. Do you see anything new done by Zhang?

author

The POYO paper relies exclusively on a supervised task: predicting motor output. The Zhang paper uses an unsupervised task, masked prediction. Unsupervised training is a lot easier to scale, even across different types of experiments.
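
To make that concrete, here's a minimal sketch of a masked-prediction objective on binned spike counts. This is my own illustration of the general recipe, with made-up layer sizes and names, not the code from either paper:

```python
# Minimal sketch of masked prediction on binned spike counts.
# Illustration of the general recipe only; sizes and names are made up.
import torch
import torch.nn as nn

class MaskedPredictor(nn.Module):
    def __init__(self, n_neurons: int, d_model: int = 128):
        super().__init__()
        self.embed = nn.Linear(n_neurons, d_model)    # one token per time bin
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.readout = nn.Linear(d_model, n_neurons)  # predicts log firing rates

    def forward(self, x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # x: (batch, time_bins, n_neurons); mask: (batch, time_bins), bool
        x = x.masked_fill(mask.unsqueeze(-1), 0.0)    # hide the masked bins
        return self.readout(self.encoder(self.embed(x)))

def masked_poisson_loss(model: MaskedPredictor, x, mask):
    # score reconstruction only on the bins the model never saw
    log_rates = model(x, mask)
    nll = nn.PoissonNLLLoss(log_input=True, reduction="none")(log_rates, x)
    return nll[mask].mean()
```

No behavioral labels appear anywhere in the loss, which is why this objective scales across experiments that recorded very different behaviors.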


I should've been more explicit. Didn't NDT2 already do all the masking, and POYO the decoding across multiple tasks (both at Georgia Tech)? It seems like this is the NDT2 task + the POYO architecture in many ways. Would like to hear your thoughts.

author

Actually, the Zhang paper does *not* use an NDT-2 backbone, but rather NDT-1-stitch, which uses a simple per-session linear projection onto the latent space and one token per time bin. The patching scheme from NDT-2 does not lend itself to easy alignment across sessions, and Zhang et al. have ablations showing that it doesn't work well in their setup.
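
In code, the stitching pattern looks roughly like this; a hedged sketch of the idea with invented names and sizes, not the authors' implementation:

```python
# Rough sketch of NDT-1-stitch-style per-session stitching:
# each session gets its own linear map into a shared latent space,
# with one token per time bin. Illustrative only, not the authors' code.
import torch
import torch.nn as nn

class StitchedEncoder(nn.Module):
    def __init__(self, neurons_per_session: dict[str, int], d_model: int = 128):
        super().__init__()
        # per-session input projections (the "stitch" layers)
        self.stitch_in = nn.ModuleDict(
            {sid: nn.Linear(n, d_model) for sid, n in neurons_per_session.items()}
        )
        # shared transformer trunk, reused across every session
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, num_layers=4)
        # per-session readouts back to that session's neuron count
        self.stitch_out = nn.ModuleDict(
            {sid: nn.Linear(d_model, n) for sid, n in neurons_per_session.items()}
        )

    def forward(self, x: torch.Tensor, session_id: str) -> torch.Tensor:
        # x: (batch, time_bins, n_neurons_for_this_session)
        tokens = self.stitch_in[session_id](x)  # one token per time bin
        return self.stitch_out[session_id](self.trunk(tokens))
```

Only the stitch layers are session-specific; everything the model learns about dynamics lives in the shared trunk, which is what makes alignment across sessions cheap.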

The Zhang paper doesn't use many of the architectural elements from POYO; one reason is that POYO is not easy to adapt to self-supervised training, at least not efficiently, without resorting to binning, and avoiding binning is kind of the point of POYO. The Zhang paper does two things above and beyond prior art: it introduces a multi-task pretraining scheme, and it trains on a larger dataset than prior work.
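
For intuition on the binning point, compare the two tokenizations. POYO emits one token per spike event rather than one per time bin, which preserves spike timing but leaves no natural grid of "slots" to mask and reconstruct. A sketch (my paraphrase of the tokenization contrast, not either paper's code):

```python
# Contrast: binned tokens vs. POYO-style spike-event tokens.
# My paraphrase of the tokenization contrast, not either paper's code.
import torch

def bin_spikes(spike_times, unit_ids, n_units: int, bin_size: float, t_max: float):
    # Classic binning: a (time_bins, n_units) count matrix,
    # one token per time bin. Timing within a bin is discarded.
    n_bins = int(t_max / bin_size)
    counts = torch.zeros(n_bins, n_units)
    bins = (spike_times / bin_size).long().clamp(max=n_bins - 1)
    counts.index_put_((bins, unit_ids), torch.ones(len(bins)), accumulate=True)
    return counts

def spike_event_tokens(spike_times, unit_ids):
    # POYO-style: one token per spike (unit identity + continuous time),
    # so no temporal resolution is lost -- but masked reconstruction has
    # no fixed set of positions to hide and fill back in.
    return torch.stack([spike_times, unit_ids.float()], dim=-1)
```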

"Both at Georgia Tech" is a bit confusing. There's two independent Georgia Tech PIs at play here: Eva Dyer's group, responsible for the POYO paper and the Zhang paper (plus authors from Columbia and Mila); and Chethan Pandarinath's group, responsible for the NDT work.


Thank you, Patrick, for the details. They cleared up several misunderstandings I had. Please keep up these excellent posts.


I can provide a few additional thoughts on this. The Universal Translator paper develops a novel multi-masking scheme for neural populations that allows for predicting population-level, neuron-level, and brain-region-level neural activity. As demonstrated in the paper, this multi-masking objective (MtM) outperforms the simpler masking schemes used in NDT1 and NDT2 on the proposed metrics. It also leads to improved performance scaling with more training animals. Compared to POYO and the NDT models, the Universal Translator paper is the only work that trains models on a high diversity of brain regions and tries to make inferences about how these regions might interact.
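
Roughly, the multi-masking objective alternates between masking modes during training, something like the sketch below. This is shorthand to convey the idea; the mode names and shapes are illustrative, not the paper's actual code:

```python
# MtM-style multi-masking, sketched: each training step samples one
# masking mode, so the model must reconstruct activity hidden along
# time, along neurons, or along whole brain regions.
# Mode names and shapes are my paraphrase, not the paper's code.
import torch

def sample_mask(x: torch.Tensor, regions: torch.Tensor, mode: str,
                p: float = 0.3) -> torch.Tensor:
    # x: (batch, time_bins, n_neurons); regions: (n_neurons,) region labels
    b, t, n = x.shape
    if mode == "temporal":   # hide random time bins across the population
        return (torch.rand(b, t, 1) < p).expand(b, t, n)
    if mode == "neuron":     # hide random neurons across all time bins
        return (torch.rand(b, 1, n) < p).expand(b, t, n)
    if mode == "region":     # hide every neuron in one sampled brain region
        rid = regions[torch.randint(n, (1,))]  # region of a random neuron
        return (regions == rid).view(1, 1, n).expand(b, t, n)
    raise ValueError(f"unknown masking mode: {mode}")

# Each training step: pick a mode at random, mask, reconstruct, e.g.
#   mode = random.choice(["temporal", "neuron", "region"])
#   loss = reconstruction_loss(model, batch, sample_mask(batch, regions, mode))
```

Because one mode hides an entire region, the model is pushed to predict a region's activity from the others, which is what supports the cross-region inferences mentioned above.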

An additional note: the goal of this paper is to propose a path towards building a "foundation model" that can answer questions about brain-wide activity at any scale (neurons, regions, populations). I wouldn't call the current iteration a foundation model. :-)


I appreciate the comments on your work, Cole, especially pointing out the novelty of training on multi-region data, not just multi-animal data from a common region.
