Essays

Research and essays on agent evaluation, RL infrastructure and simulations.

Understanding Virality with Vision-Language Models

24 December 2025

We present a rubric-based Vision-Language Model framework for evaluating short-form edutainment content. Our system extracts audiovisual features without supervision and clusters them into interpretable factors that predict viewer engagement better than conventional metrics.


Metaphi Simhub

8 October 2025

Large language models have demonstrated superhuman capabilities in discrete, well-defined coding tasks, but their progression into truly agentic, collaborative software engineering partners is hampered by a fundamental limitation in how they are trained and evaluated. We introduce Metaphi Simhub, a platform designed to address this limitation through interactive simulation environments.


From Static Benchmarks to Dynamic Worlds

20 September 2025

The current paradigm of LLM evaluation suffers from a complete lack of interactivity: there is no "user" in the evaluation loop. We explore how principles from autonomous vehicle simulation can transform the training of code-generation agents.
