Neuroscientists should definitely get into interpretability research!

@makoconstruct.bsky.social

For a while we've been saying that neuroscientists should consider getting into artificial neural network (ANN) interpretability research. It's kind of important right now: it lets us understand why problems occur and how to work towards fixing them, and it gives us another angle on a model's underlying beliefs or intentions so that we can train models to stop knowingly lying. We also expect that most neuroscientists would enjoy interpretability research, since ANNs have this fun quality where you can get a complete record of the state of every neuron (think of the studies this makes possible! It's impossible to track the state of the synapses of even one biological neuron; if you've been doing research successfully under limitations like that, you must be incredibly strong at this point, and you could make so much progress here.)
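
If "a complete record of the state of every neuron" sounds abstract, here's a minimal sketch in plain numpy, using a made-up toy network, of what it means in practice: every activation is just an array you can save and study, no probes or electrodes required. This isn't any particular library's interpretability API, just an illustration of the point.

```python
import numpy as np

# A toy two-layer network with made-up weights, just to illustrate that
# every intermediate activation is an ordinary array you can keep and study.
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(8, 16)), rng.normal(size=(16, 4))

def forward(x, trace):
    h = np.maximum(0.0, x @ W1)         # ReLU hidden layer
    trace.append(("hidden", h.copy()))  # full state of every hidden "neuron"
    y = h @ W2
    trace.append(("output", y.copy()))
    return y

trace = []
forward(rng.normal(size=(8,)), trace)
for name, acts in trace:
    print(name, acts.shape)             # the entire record, captured for free
```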

But I think the case is even stronger than that. To develop a deep understanding of the brain, we probably have to at least be able to solve ANN interpretability:

Both ANNs and the brain consist of a relatively simple recurring many-to-many component that somehow adds up to a self-organizing learning substrate, for reasons that no one really deeply understands yet. We know that no one understands it because no one is good at predicting what ANNs will be able to learn as we scale them. No one knew ANNs would work particularly well until we could just scale them up and find out (my impression is that the apparent abundance of AI researchers today who say they anticipated the success of deep learning is mainly survivorship bias: they guessed right, the many others who guessed wrong are now brushed aside and ignored, and I have not come across an attempt at a formal explanation). Biological neurons are probably using at least some variation of the same unexplained emergent self-organization process as part of whatever they're doing. And ANNs are much simpler than biological neurons (basically just attention, weighted sums, and ReLU (x → max(0, x))), so if we can't understand how an ANN learns, we should be pretty confident that there's also some core principle about the brain that we're still missing.
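
For concreteness, here's a rough numpy sketch of that recurring component: one simplified transformer-style block. The dimensions are arbitrary, and I've left out layer norm, biases, residual connections, and multi-head splitting to keep the core visible; the point is just that attention, weighted sums, and ReLU really are most of what's in there.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def transformer_block(X, Wq, Wk, Wv, W1, W2):
    """One simplified block: attention, then weighted sums through a ReLU MLP.
    X is (tokens, d). Layer norm, biases, residuals, and multi-head splits
    are omitted so the recurring component stays visible."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[-1]))  # attention weights
    attended = A @ V                             # weighted sum over tokens
    H = np.maximum(0.0, attended @ W1)           # ReLU: max(0, x)
    return H @ W2                                # another weighted sum

d = 8
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
W1, W2 = rng.normal(size=(d, 4 * d)), rng.normal(size=(4 * d, d))
out = transformer_block(rng.normal(size=(5, d)), Wq, Wk, Wv, W1, W2)
print(out.shape)  # (5, 8): same shape as the input, ready for the next block
```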

So start with ANNs! It's going to be much easier to start there, and they're also pretty important right now!

If I've gotten you curious, ANN interpretability researcher Chris Olah did a really energizing interview about this stuff that you should probably start with.
