Essays
Research and essays on agent evaluation, RL infrastructure and simulations.
Understanding Virality with Vision-Language Models
24 December 2025
We present a rubric-based Vision-Language Model framework for evaluating short-form edutainment content. Our system extracts audiovisual features without supervision and clusters them into interpretable factors that predict viewer engagement better than conventional metrics.
Metaphi Simhub
8 October 2025
Large language models have demonstrated superhuman capabilities in discrete, well-defined coding tasks, but a fundamental limitation in how they are trained and evaluated hampers their progression into truly agentic, collaborative software engineering partners. We introduce Metaphi Simhub, a platform that addresses this limitation through interactive simulation environments.
From Static Benchmarks to Dynamic Worlds
20 September 2025
The current paradigm of LLM evaluation suffers from a complete lack of interactivity: there is no "user" in the evaluation loop. We explore how principles from autonomous vehicle simulation can transform the training of code-generation agents.