Tower 1
Reasoner Tower
A vision-language model (VLM) with an autoregressive architecture. It interprets multimodal observations (images, videos, and text) and understands motion, object interactions, and physical context. This is the "brain" that reasons about the world before any generation happens.
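To make "autoregressive" concrete, here is a minimal sketch of greedy autoregressive decoding, in which each new token is conditioned on the observation plus everything produced so far. It is a toy illustration only, not the Reasoner Tower's actual implementation: `generate`, `toy_next_token`, the `EOS` id, and the integer "tokens" standing in for encoded image, video, and text inputs are all hypothetical names introduced for this example.

```python
from typing import Callable, List, Sequence

EOS = 0  # hypothetical end-of-sequence token id (an assumption for this sketch)


def generate(
    context: Sequence[int],
    next_token: Callable[[Sequence[int]], int],
    max_new_tokens: int = 16,
) -> List[int]:
    """Greedy autoregressive decoding.

    Each step feeds the full sequence (context plus all tokens produced
    so far) back into the model to choose the next token.
    """
    tokens = list(context)
    produced: List[int] = []
    for _ in range(max_new_tokens):
        tok = next_token(tokens)
        if tok == EOS:
            break  # the model signalled that its output is complete
        tokens.append(tok)
        produced.append(tok)
    return produced


def toy_next_token(tokens: Sequence[int]) -> int:
    """Stand-in for a real model: counts down from the last token to EOS."""
    return max(tokens[-1] - 1, EOS)


# Hypothetical multimodal context: ids standing in for image, video, and text tokens.
observation = [101, 102, 7, 3]
print(generate(observation, toy_next_token))  # prints [2, 1]
```

In a real VLM, `next_token` would be a forward pass over embedded visual and text tokens followed by a sampling or argmax step; the loop structure is the part this sketch is meant to show.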