Apple Vision Pro Immersive Photos: AI Spatial Scenes Explained

By Next Reality

Apple announced at WWDC 2025 that visionOS 26 will use generative AI to convert ordinary flat photos into spatial scenes, creating multi-perspective views that let Vision Pro users physically lean in and look around. The feature is designed to bring the headset's depth capabilities to photos that were never captured in stereo, though Apple has not specified how broadly it will work across a typical photo library.

Spatial scenes will surface natively in the Photos app, Spatial Gallery, and Safari, according to Road to VR. Developers can build the same capability into their own apps through a dedicated Spatial Scene API. visionOS 26 is expected to ship later this year, with developer testing already underway.

How visionOS 26 spatial scenes differ from Apple Vision Pro spatial photos

Vision Pro launched with support for spatial photos, but capturing them required specific hardware: the headset itself, or an iPhone 15 Pro, 15 Pro Max, or iPhone 16, Road to VR noted. Every other photo in a user's library showed up as a flat rectangle inside the headset, no different from looking at a monitor.

Spatial scenes work from a different starting point. Rather than drawing on two physical lenses that record the same moment from slightly offset positions, the system uses on-device generative AI and computational depth to construct a three-dimensional scene around an ordinary image, then renders it in real time from the viewer's current angle, Road to VR reported.

The technical component enabling this is RealityKit's new ImagePresentationComponent. It accepts both standard single-lens photos and existing spatial stereo input, producing the same multi-perspective scene output from either, Apple's WWDC session confirms.

That architectural detail matters. A captured spatial photo contains genuine left-eye and right-eye data recorded simultaneously by two physical lenses. A spatial scene built from a flat image has no such data: it infers depth from the pixels alone, essentially guessing at structure that was never actually measured, as Road to VR put it. That is a meaningful engineering achievement. It is also, by definition, an approximation.
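To make the approximation concrete, here is a toy sketch of how a per-pixel depth estimate lets a flat image be redrawn for a viewer who leans sideways. It is not Apple's pipeline, which has not been published in this detail. The function name `reproject_scanline`, its parameters, and the sample depth values are all hypothetical, chosen only to illustrate the general idea that nearer pixels shift more than distant ones.

```python
def reproject_scanline(colors, depths, head_shift, focal=1.0):
    """Redraw one image row for a viewer moved sideways by head_shift.

    colors: pixel values for one row of a flat photo
    depths: estimated distance of each pixel (same length, all > 0);
            in a real system these would come from a depth model
    Returns a new row. None marks holes that no source pixel covers.
    """
    width = len(colors)
    out = [None] * width
    out_depth = [float("inf")] * width
    for x, (color, depth) in enumerate(zip(colors, depths)):
        # Parallax: the apparent shift is inversely proportional to depth.
        disparity = round(focal * head_shift / depth)
        # Moving the viewer right makes scene content slide left.
        target = x - disparity
        # Keep the nearest pixel when two land on the same spot.
        if 0 <= target < width and depth < out_depth[target]:
            out[target] = color
            out_depth[target] = depth
    return out


# Hypothetical row: a near object (depth 1) in front of a far wall (depth 8).
row = ["bg0", "bg1", "FG", "FG", "bg4", "bg5"]
depth = [8.0, 8.0, 1.0, 1.0, 8.0, 8.0]

print(reproject_scanline(row, depth, head_shift=0.0))
print(reproject_scanline(row, depth, head_shift=2.0))
```

With no head movement the row comes back unchanged. With a shift of 2, the near object slides across the background and leaves two `None` holes where the original photo recorded nothing. Filling gaps like those is where a system working from a single flat image has to guess, whereas a true stereo capture has a second viewpoint to draw on.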

Apple describes the output as adding "lifelike depth" to photos, per the Apple AU Newsroom. How convincing it actually is depends on the kinds of images the feature encounters, and those specifics weren't covered at WWDC.

Where Apple Vision Pro immersive photos surface in visionOS 26

The integration across Photos, Spatial Gallery, and Safari means Vision Pro owners will encounter this in apps they already use, without hunting for a standalone feature. Apple is treating it as infrastructure, not an experiment.

Developers get access to the same underlying technology through the Spatial Scene API, which lets them build AI-generated depth into their own visionOS apps, Road to VR confirmed. The API accepts standard photography as input.

Zillow is the first confirmed commercial adopter. The company is building the API into its Zillow Immersive app for Vision Pro, giving users depth-enhanced views of home and apartment listings, per the Apple AU Newsroom. Real estate photography is a reasonable early test case: listing photos are almost always flat shots, and someone trying to read a room's proportions from a listing has a practical reason to want depth, not just a sentimental one.

Travel platforms, retail, and educational publishing all sit on large libraries of existing flat photography that no one is going to recapture in stereo. For those categories, AI-inferred depth from standard photos is lighter and cheaper than commissioning 3D captures, and the quality bar is arguably lower than it is for personal memories, where users know exactly what a scene looked like. Whether the feature clears that bar is precisely what the WWDC announcement didn't establish.

What Apple hasn't specified

The announcement described what the system is designed to produce. It did not say which photos qualify.

It remains unclear whether spatial scene generation applies to any image in a user's library or only to photos with enough visual structure for depth estimation to work from. Portrait mode images carry depth metadata that could give the AI a head start. A flat group shot in mixed indoor lighting, with no obvious foreground-background separation, is a harder problem. Based on the available developer documentation, Apple hasn't said how the system handles that difference, or whether it handles it at all.

Processing scope is also unaddressed. Apple's developer documentation confirms on-device AI, but does not clarify whether conversion happens automatically in the background or on demand, whether older Vision Pro hardware performs identically, or whether there are image categories where the feature simply doesn't fire.

These gaps matter for how significant the upgrade actually turns out to be. If spatial scene conversion works broadly across a typical photo library, the everyday proposition for Vision Pro owners shifts considerably: the headset's immersive content is no longer limited to a small set of deliberately captured spatial media. If the feature applies only to a subset of images, or produces visible artifacts in common shooting conditions, that's a different story. Apple has provided no performance data and no failure-case guidance. The WWDC session describes intent. The release will reveal execution.

What to watch when visionOS 26 ships

visionOS 26 has no confirmed ship date beyond "later this year," Road to VR reported at the time of the announcement last week.

The content gap between Vision Pro's spatial capabilities and the flat photos filling most users' libraries has been one of the device's persistent friction points since launch. Using AI to unlock existing photo libraries is a more practical response than asking users to reshoot their memories with different hardware. "Apple Vision Pro has defined what's possible in this new era of spatial computing, and with visionOS 26, we're excited to push the boundaries even further," said Mike Rockwell, Apple's vice president of the Vision Products Group, per the Apple AU Newsroom.

Three questions will be answerable once the software ships: which photos actually qualify for conversion, how well the depth inference holds up across the varied and imperfect images that fill a real photo library, and whether Zillow's real estate implementation performs as a genuine use case or mainly as a launch demo.
