Meta’s V-JEPA 2, introduced on June 11, 2025, is a video-based world model designed to enable machines to understand and predict the physical world in a manner akin to human reasoning. It builds on its predecessor, V-JEPA, and was trained on over a million hours of real-world video, allowing it to learn patterns of object interaction, movement, and spatial relationships without relying on labeled data.
V-JEPA 2 is a 1.2-billion-parameter model trained with a self-supervised objective: it learns to infer the missing parts of video sequences and, in doing so, builds an internal representation of the world. That internal model lets AI agents anticipate the outcomes of actions and physical events, for example predicting that a thrown ball will fall back to the ground under gravity. Meta’s Chief AI Scientist, Yann LeCun, frames this as a step towards Advanced Machine Intelligence (AMI), in which AI systems can plan and adapt in complex, dynamic environments.
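The core idea behind this family of models is joint-embedding prediction: rather than reconstructing missing pixels, the network predicts the latent representations of masked video patches from the visible context. The sketch below illustrates that training objective in PyTorch; the module sizes, names, and masking scheme are illustrative assumptions, not Meta’s actual architecture.

```python
# Illustrative sketch (assumed, simplified) of joint-embedding prediction:
# predict the *representations* of masked video patches from visible context,
# with the loss computed in latent space rather than pixel space.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

DIM, NUM_PATCHES, MASK_RATIO = 256, 128, 0.5

class TinyEncoder(nn.Module):
    """A small transformer over pre-embedded video patches (stand-in for a ViT)."""
    def __init__(self, dim=DIM, depth=2, heads=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, depth)

    def forward(self, x):
        return self.blocks(x)

context_encoder = TinyEncoder()                    # sees only the visible patches
target_encoder = copy.deepcopy(context_encoder)    # frozen copy (EMA-updated in practice)
for p in target_encoder.parameters():
    p.requires_grad_(False)
predictor = TinyEncoder(depth=1)                   # fills in masked-patch latents
mask_token = nn.Parameter(torch.zeros(1, 1, DIM))  # learnable query for masked slots

def jepa_step(patch_tokens):
    """patch_tokens: (batch, num_patches, dim) embeddings of video patches."""
    B, N, D = patch_tokens.shape
    perm = torch.randperm(N)
    n_masked = int(N * MASK_RATIO)
    masked_idx, visible_idx = perm[:n_masked], perm[n_masked:]

    # Encode the visible context, append mask queries, and predict masked latents.
    ctx = context_encoder(patch_tokens[:, visible_idx])
    queries = mask_token.expand(B, n_masked, D)    # positional information omitted here
    pred = predictor(torch.cat([ctx, queries], dim=1))[:, -n_masked:]

    # Targets come from the frozen target encoder, so the loss stays in latent space.
    with torch.no_grad():
        tgt = target_encoder(patch_tokens)[:, masked_idx]
    return F.smooth_l1_loss(pred, tgt)

loss = jepa_step(torch.randn(2, NUM_PATCHES, DIM))
loss.backward()  # gradients reach the context encoder, predictor, and mask token
```

Because the prediction target is a learned representation rather than raw pixels, the model can ignore unpredictable low-level detail and concentrate on the dynamics that matter for planning.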
In practical applications, V-JEPA 2 has been used for robot pick-and-place tasks in new, unseen environments, achieving success rates between 65% and 80%. This is notable because the model generalises to these scenarios without extensive task-specific retraining. Meta also reports that V-JEPA 2 runs roughly 30 times faster than Nvidia’s Cosmos model, a competing effort aimed at physical-world intelligence.
Meta has also introduced three new benchmarks for evaluating how well AI models understand and reason about physical events in video: IntPhys 2, Minimal Video Pairs (MVPBench), and CausalVQA. Together, they aim to assess physical comprehension and prediction more directly than traditional image-classification metrics.
V-JEPA 2 is available as an open-source model on Hugging Face, accompanied by code and documentation on GitHub, encouraging further research and development in the field of AI-driven physical reasoning.
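For readers who want to experiment, the snippet below sketches how the released checkpoints might be loaded through the Hugging Face transformers library. The checkpoint id, input handling, and output fields shown are assumptions based on the standard transformers workflow; the model card on Hugging Face and the GitHub repository document the actual interface.

```python
# Hypothetical loading sketch via Hugging Face transformers.
# The checkpoint id, input handling, and output fields are assumptions;
# consult the model card and Meta's GitHub repository for the released interface.
import torch
from transformers import AutoModel, AutoVideoProcessor

CHECKPOINT = "facebook/vjepa2-vitl-fpc64-256"  # assumed checkpoint name

processor = AutoVideoProcessor.from_pretrained(CHECKPOINT)
model = AutoModel.from_pretrained(CHECKPOINT)
model.eval()

# A dummy clip: 16 RGB frames at 256x256 resolution with raw uint8 pixel values.
video = torch.randint(0, 256, (16, 3, 256, 256), dtype=torch.uint8)

inputs = processor(videos=video, return_tensors="pt")  # resize and normalise frames
with torch.no_grad():
    outputs = model(**inputs)

# Patch-level video features; downstream heads (e.g. for robot planning or
# video question answering) would be trained on top of these representations.
print(outputs.last_hidden_state.shape)
```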
Meta’s V-JEPA 2 represents a significant leap towards creating AI systems that can perceive, understand, and interact with the physical world in a human-like manner, paving the way for more intelligent and adaptable machines in various real-world applications.