Discussion about this post


The part about DeepSeek V3.2 competing with frontier models at a fraction of the cost is really interesting. Their distillation approach and sparse attention mechanism seem to be the key here, rather than just throwing more compute at the problem. Makes you wonder whether that specialist RL training technique could become a standard pattern for efficient open models.
