Pixel-to-4D: Camera-Controlled Image-to-Video Generation with Dynamic 3D Gaussians

University of Glasgow
[Figure: pipeline overview. Panel labels: Input Image, Pixel-to-4D (Model Diagram), 4D Representation, camera trajectory, Output Video]
Illustration of the Pixel-to-4D pipeline. Given an input image, our novel architecture predicts dynamic Gaussian splats, which form a dynamic 4D representation of the scene. The output video is then rendered from this representation along the given camera trajectory, producing a visually coherent video.
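The rendering step in the caption above can be illustrated with a toy sketch. It is not the authors' implementation: it assumes a hypothetical linear motion model (each Gaussian has a constant velocity), represents each Gaussian only by its centre, and uses a plain pinhole projection in place of full Gaussian splatting. All names (`gaussian_centers_at`, `project`, `render_centers`) and all numeric values are made up for illustration.

```python
import numpy as np

def gaussian_centers_at(means, velocities, t):
    """Shift each Gaussian centre along a straight line (assumed motion model)."""
    return means + t * velocities

def project(points_world, R, T, K):
    """Pinhole projection of world points (N, 3) to pixel coordinates (N, 2).

    R (3, 3) and T (3,) map world to camera coordinates: x_cam = R @ x_world + T.
    K (3, 3) is the camera intrinsics matrix.
    """
    cam = points_world @ R.T + T
    uvw = cam @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

def render_centers(means, velocities, trajectory, K, times):
    """Project the time-shifted centres into every camera of the trajectory."""
    return [
        project(gaussian_centers_at(means, velocities, t), R, T, K)
        for (R, T), t in zip(trajectory, times)
    ]

# Toy scene: one static Gaussian and one moving to the right (illustrative values).
means = np.array([[0.0, 0.0, 4.0], [1.0, 0.0, 5.0]])
velocities = np.array([[0.0, 0.0, 0.0], [0.5, 0.0, 0.0]])

# Hypothetical intrinsics: focal length 100 px, principal point (64, 64).
K = np.array([[100.0, 0.0, 64.0], [0.0, 100.0, 64.0], [0.0, 0.0, 1.0]])

# Two-frame camera trajectory: the camera translates between the frames.
trajectory = [
    (np.eye(3), np.array([0.0, 0.0, 0.0])),
    (np.eye(3), np.array([-0.2, 0.0, 0.0])),
]
times = [0.0, 1.0]

frames = render_centers(means, velocities, trajectory, K, times)
for i, pixels in enumerate(frames):
    print(f"frame {i}:", pixels)
```

The point of the sketch is that scene motion and camera motion are separate inputs to one shared 3D representation, so every frame is a projection of the same underlying geometry. A real renderer would also carry per-Gaussian covariance, opacity, and colour, and would alpha-composite the splats in depth order.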
[Figure: qualitative comparison. Datasets: KITTI, Waymo, RE10K, DL3DV-10K. Methods: Ground-truth, MotionCtrl, CameraCtrl, CamI2V, RealCam-I2V, Ours]
Comparison of our method with baseline methods across four datasets. Our method preserves temporal and 3D consistency while faithfully following the input camera trajectory and retaining the details of the input image. Its underlying 4D representation ensures accurate camera motion.