1. The Dataset: 2,028 High-Density Captures
Our foundation is built on a vetted library of 2,028 professional 2D JPEG images. This collection isn’t just a gallery; it is a structured dataset designed for spatial intelligence. We focus on the African Big 5 and associated megafauna to test AI’s ability to understand:
Scale Persistence: 588 elephant frames
Volumetric Texture: 428 lion frames
High-Contrast Edge Detection: 279 zebra frames
Pattern Stability: 281 combined leopard and cheetah frames
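As a quick sanity check on the numbers above, the four named categories account for 1,576 of the 2,028 frames; the remainder covers the rest of the library. The manifest below is a hypothetical sketch (the real dataset layout and labels are not specified here):

```python
# Hypothetical manifest mirroring the category counts quoted above.
frame_counts = {
    "elephant": 588,          # scale persistence
    "lion": 428,              # volumetric texture
    "zebra": 279,             # high-contrast edge detection
    "leopard_cheetah": 281,   # pattern stability (combined)
}

TOTAL_LIBRARY = 2028

categorized = sum(frame_counts.values())
print(categorized)                 # 1576 frames in the four named categories
print(TOTAL_LIBRARY - categorized) # 452 frames elsewhere in the library
```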
2. Generative Engine: Gemini 3 Pro & Veo 3.1
We utilize the Google Gemini 3 Pro architecture and Veo 3.1 to generate cinematic wildlife sequences.
By leveraging latent action models, we create high-fidelity 24 fps video that serves as a bridge between static 2D imagery and fully realized 3D environments.
Our site currently showcases 28 of these specialized AI cinematic works.
3. Local Compute Infrastructure
To process and benchmark these models, we operate on high-tier local hardware capable of sustaining the heavy VRAM demands of these workloads:
Workstation: Lenovo ThinkStation P7
Compute: Dual Nvidia RTX A5500 GPUs
Memory: 48 GB of pooled VRAM (2 × 24 GB via NVLink)
Workflow: This setup allows for local pre-visualization of NeRF (Neural Radiance Field) and Gaussian Splatting reconstructions, so spatial accuracy can be verified before generative synthesis.
4. Future Frontier: Genie 3 Interactive Environments
The next phase of our project involves the integration of Google DeepMind’s Genie 3.
Our goal is to move beyond the frame—allowing visitors and researchers to step into our photography and navigate a 360-degree, physics-compliant African savanna.