
What I'm Trying to Figure Out

I work on two things that look unrelated at first glance: analyzing neural data with graph neural networks, and exploring how AI models represent concepts internally.

They’re not separate interests. They depend on each other.

Using AI to understand the brain

Brains are networks. Neurons fire, synchronize, form patterns — and somewhere in that activity, seizures happen. We can build a graph from EEG data, train a GNN to classify seizures, and get high accuracy. But accuracy alone doesn’t help clinicians. The model is a black box, and clinical adoption requires knowing why a prediction was made.
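As a rough sketch of that first step, here is one common way to turn multichannel EEG into a graph: treat electrodes as nodes and connect pairs whose signals correlate strongly. Everything here (channel names, signal values, the threshold) is a toy illustration, not the actual pipeline.

```python
import math

def correlation(x, y):
    """Pearson correlation between two equal-length signals."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def eeg_to_graph(channels, threshold=0.5):
    """Connect channel pairs whose signals correlate above `threshold`.

    `channels` maps electrode names to sample lists; the result is an
    edge list that a graph classifier could take as input.
    """
    names = list(channels)
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(correlation(channels[a], channels[b])) >= threshold:
                edges.append((a, b))
    return edges

# Toy data: two channels moving in sync, one roughly independent.
signals = {
    "Fp1": [0.1, 0.5, 0.9, 0.4, 0.2],
    "Fp2": [0.2, 0.6, 1.0, 0.5, 0.3],  # tracks Fp1 closely
    "O1":  [0.9, 0.1, 0.8, 0.2, 0.7],
}
print(eeg_to_graph(signals))  # → [('Fp1', 'Fp2')]
```

In practice the edge function would be a frequency-band coherence or similar connectivity measure rather than raw correlation, but the shape of the problem is the same: signals in, graph out.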

This is where explainable AI comes in, but not all explanations are equal. I'm particularly interested in counterfactual explanations: "what would need to change for the prediction to flip?" This is fundamentally different from a saliency map or a feature-importance score. A counterfactual is actionable: a clinician can reason about what it means in a way that a heatmap doesn't afford.
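The counterfactual question can be made concrete with a toy sketch: search for the smallest set of edges whose removal flips a classifier's prediction. Both the stand-in `predict` function and the graph below are made up for illustration; real counterfactual explainers use gradients or search heuristics rather than brute-force enumeration.

```python
from itertools import combinations

def predict(edges, max_edges=3):
    """Toy stand-in for a trained graph classifier: flags a graph as
    'seizure' once it is densely connected. Purely illustrative."""
    return "seizure" if len(edges) > max_edges else "normal"

def counterfactual_edges(edges, classifier):
    """Smallest set of edges whose removal flips the prediction.

    Tries removal sets smallest-first, so the first flip found is a
    minimal counterfactual for this toy classifier.
    """
    original = classifier(edges)
    for k in range(1, len(edges) + 1):
        for removed in combinations(edges, k):
            kept = [e for e in edges if e not in removed]
            if classifier(kept) != original:
                return list(removed)
    return None

graph = [("Fp1", "Fp2"), ("Fp2", "C3"), ("C3", "C4"),
         ("C4", "O1"), ("O1", "Fp1")]
print(predict(graph))                        # → seizure (5 edges)
print(counterfactual_edges(graph, predict))  # two removals flip it
```

The output is a concrete, inspectable claim: "had these connections been absent, the model would not have called this a seizure." Whether those connections mean anything neuroscientifically is, as noted below, the hard part.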

In practice, though, it’s harder than it sounds. You can generate counterfactual graphs, but interpreting what those changes mean neuroscientifically — that’s where the real difficulty begins.

Understanding AI to use it better

So you’re using a complex system (a GNN) to analyze another complex system (a brain). But if you don’t understand what the GNN is doing internally, how much can you trust what it tells you about the brain?

This is why I also work on representation analysis. Language models and neural networks map inputs into high-dimensional spaces, and the structure of those spaces reflects something about what the model has learned. With Trendscape, we build neighbor graphs in embedding spaces and trace paths between concepts — exploring not the concepts themselves, but the space that holds them. How are things arranged? What connects them?
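A minimal sketch of the neighbor-graph idea (not Trendscape itself): embed concepts as vectors, link each concept to its nearest neighbors by cosine similarity, then find a shortest chain of neighbors between two concepts. The concept names and vectors below are made-up toy values.

```python
import math
from collections import deque

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def neighbor_graph(embeddings, k=2):
    """Link each concept to its k most similar concepts."""
    graph = {}
    for name, vec in embeddings.items():
        others = sorted(
            (o for o in embeddings if o != name),
            key=lambda o: cosine(vec, embeddings[o]),
            reverse=True,
        )
        graph[name] = others[:k]
    return graph

def concept_path(graph, start, goal):
    """Breadth-first search: shortest chain of neighboring concepts."""
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

# Toy 2-D "embeddings"; real ones would come from a language model.
embeddings = {
    "cat":    [1.0, 0.0],
    "pet":    [0.8, 0.6],
    "family": [0.3, 0.95],
    "home":   [0.0, 1.0],
}
graph = neighbor_graph(embeddings)
print(concept_path(graph, "cat", "home"))  # → ['cat', 'family', 'home']
```

The path itself is the artifact of interest: which concepts a model's space routes you through on the way from one idea to another.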

The answer isn’t a number. It’s a visualization that a human looks at and finds meaning in — or doesn’t.

Back and forth

The two directions feed each other. To learn something about the brain, I need AI tools I can trust. To trust those tools, I need to understand what’s happening inside them. Neither side is the foundation — I’m moving between them, using each to make progress on the other.

What I still don't know

These questions don’t stay inside computer science. What is a concept? When we say a model “represents” something, what does that mean — understanding, or statistical regularity? And counterfactual reasoning itself is philosophically contested. There are serious objections to counterfactual accounts of causation, and I can’t hand-wave those away because my model produces outputs that look useful.

I’m an engineer working at the edge of philosophy, neuroscience, and cognitive science. I have a lot more to learn — about causation, about what concepts really are, about whether “does AI understand?” is even a well-formed question. But I think these are questions worth sitting with, and I’d rather keep building while staying honest about what I don’t know.

Junya G. Honda
Author
Master’s student in Computer Science and Engineering at Toyohashi University of Technology (Uehara Lab). Interested in how AI can bring new insights to neuroscience — analyzing brain networks as graphs with GNNs, and using counterfactual explanations to turn model predictions into actionable insights for clinical use. Also exploring how to extract and visualize the knowledge that AI models acquire internally.