Meta’s V-JEPA 2, introduced on June 11, 2025, is a video-based world model designed to enable machines to understand and predict the physical world in a manner akin to human reasoning. It builds on its predecessor, V-JEPA, and was trained on over a million hours of real-world video, allowing it to learn patterns of object interaction, movement, and spatial relationships without relying on labeled data.
V-JEPA 2 is a 1.2-billion-parameter model trained with a self-supervised objective: it learns to infer the missing parts of video sequences and, in doing so, builds an internal representation of the world. That internal model lets AI agents anticipate the outcomes of actions and physical events, for example predicting that a thrown ball will fall back to the ground under gravity. Meta’s Chief AI Scientist, Yann LeCun, frames this as a step towards Advanced Machine Intelligence (AMI), in which AI systems can plan and adapt in complex, dynamic environments.
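The core idea behind this family of models is joint-embedding prediction: rather than reconstructing missing pixels, the network predicts the latent representations of masked video patches from the visible context. The sketch below illustrates that training objective in PyTorch; the module sizes, names, and masking scheme are illustrative assumptions, not Meta’s actual architecture.

```python
# Illustrative sketch (assumed, simplified) of joint-embedding prediction:
# predict the *representations* of masked video patches from visible context,
# with the loss computed in latent space rather than pixel space.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

DIM, NUM_PATCHES, MASK_RATIO = 256, 128, 0.5

class TinyEncoder(nn.Module):
    """A small transformer over pre-embedded video patches (stand-in for a ViT)."""
    def __init__(self, dim=DIM, depth=2, heads=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, depth)

    def forward(self, x):
        return self.blocks(x)

context_encoder = TinyEncoder()                    # sees only the visible patches
target_encoder = copy.deepcopy(context_encoder)    # frozen copy (EMA-updated in practice)
for p in target_encoder.parameters():
    p.requires_grad_(False)
predictor = TinyEncoder(depth=1)                   # fills in masked-patch latents
mask_token = nn.Parameter(torch.zeros(1, 1, DIM))  # learnable query for masked slots

def jepa_step(patch_tokens):
    """patch_tokens: (batch, num_patches, dim) embeddings of video patches."""
    B, N, D = patch_tokens.shape
    perm = torch.randperm(N)
    n_masked = int(N * MASK_RATIO)
    masked_idx, visible_idx = perm[:n_masked], perm[n_masked:]

    # Encode the visible context, append mask queries, and predict masked latents.
    ctx = context_encoder(patch_tokens[:, visible_idx])
    queries = mask_token.expand(B, n_masked, D)    # positional information omitted here
    pred = predictor(torch.cat([ctx, queries], dim=1))[:, -n_masked:]

    # Targets come from the frozen target encoder, so the loss stays in latent space.
    with torch.no_grad():
        tgt = target_encoder(patch_tokens)[:, masked_idx]
    return F.smooth_l1_loss(pred, tgt)

loss = jepa_step(torch.randn(2, NUM_PATCHES, DIM))
loss.backward()  # gradients reach the context encoder, predictor, and mask token
```

Because the prediction target is a learned representation rather than raw pixels, the model can ignore unpredictable low-level detail and concentrate on the dynamics that matter for planning.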
In practical applications, V-JEPA 2 has been used for robot pick-and-place tasks in new, unseen environments, achieving success rates between 65% and 80%. This is notable because the model generalises to these scenarios without extensive task-specific retraining. Meta also reports that V-JEPA 2 runs roughly 30 times faster than Nvidia’s Cosmos model, a competing effort aimed at physical-world intelligence.
Meta has also introduced three new benchmarks for evaluating how well AI models understand and reason about physical events in video: IntPhys 2, Minimal Video Pairs (MVPBench), and CausalVQA. Together, they aim to assess physical comprehension and prediction more directly than traditional image-classification metrics.
V-JEPA 2 is available as an open-source model on Hugging Face, accompanied by code and documentation on GitHub, encouraging further research and development in the field of AI-driven physical reasoning.
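For readers who want to experiment, the snippet below sketches how the released checkpoints might be loaded through the Hugging Face transformers library. The checkpoint id, input handling, and output fields shown are assumptions based on the standard transformers workflow; the model card on Hugging Face and the GitHub repository document the actual interface.

```python
# Hypothetical loading sketch via Hugging Face transformers.
# The checkpoint id, input handling, and output fields are assumptions;
# consult the model card and Meta's GitHub repository for the released interface.
import torch
from transformers import AutoModel, AutoVideoProcessor

CHECKPOINT = "facebook/vjepa2-vitl-fpc64-256"  # assumed checkpoint name

processor = AutoVideoProcessor.from_pretrained(CHECKPOINT)
model = AutoModel.from_pretrained(CHECKPOINT)
model.eval()

# A dummy clip: 16 RGB frames at 256x256 resolution with raw uint8 pixel values.
video = torch.randint(0, 256, (16, 3, 256, 256), dtype=torch.uint8)

inputs = processor(videos=video, return_tensors="pt")  # resize and normalise frames
with torch.no_grad():
    outputs = model(**inputs)

# Patch-level video features; downstream heads (e.g. for robot planning or
# video question answering) would be trained on top of these representations.
print(outputs.last_hidden_state.shape)
```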
Meta’s V-JEPA 2 represents a significant leap towards creating AI systems that can perceive, understand, and interact with the physical world in a human-like manner, paving the way for more intelligent and adaptable machines in various real-world applications.