What I really struggled to understand in the original Mamba paper was the math (hard to figure out the dimensions of the variables involved). It's interesting to see the history of SSMs, though I'm still on the lookout for a simple explanation of the layer-by-layer math involved. It'd be great to see an article on that!