Pixel-to-4D: Camera-Controlled Image-to-Video Generation with Dynamic 3D Gaussians
University of Glasgow
Figure (pipeline overview): Input Image → Pixel-to-4D → output video
Illustration of the Pixel-to-4D pipeline: our novel architecture predicts dynamic Gaussian splats from the input image, yielding a dynamic 4D representation of the scene. This representation is then rendered along the given camera trajectory to produce a visually coherent video.
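The second stage of the pipeline, rendering a dynamic representation along a camera trajectory, can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes each Gaussian centre moves with a constant per-Gaussian velocity, and it only projects centres through a pinhole camera instead of rasterizing full splats. All names (`gaussian_centers_at`, `project_points`, `render_centers_along_trajectory`) and the linear motion model are hypothetical and introduced here for illustration only.

```python
import numpy as np


def gaussian_centers_at(means, velocities, t):
    """Advance Gaussian centres to time t (assumed linear motion model)."""
    return means + t * velocities


def project_points(points_world, world_to_cam, K):
    """Pinhole projection of N x 3 world points.

    world_to_cam is a 4 x 4 rigid transform, K a 3 x 3 intrinsics matrix.
    Returns (N x 2 pixel coordinates, N camera-space depths).
    """
    ones = np.ones((points_world.shape[0], 1))
    homog = np.concatenate([points_world, ones], axis=1)   # N x 4
    cam = (world_to_cam @ homog.T).T[:, :3]                # N x 3, camera space
    pix = (K @ cam.T).T                                    # N x 3, homogeneous pixels
    return pix[:, :2] / pix[:, 2:3], cam[:, 2]


def render_centers_along_trajectory(means, velocities, trajectory, K, times):
    """Project the moving centres into every camera of the trajectory.

    A real renderer would alpha-composite full Gaussian splats; this sketch
    only returns the 2D centre positions that lie in front of each camera.
    """
    frames = []
    for world_to_cam, t in zip(trajectory, times):
        pts = gaussian_centers_at(means, velocities, t)
        uv, depth = project_points(pts, world_to_cam, K)
        frames.append(uv[depth > 0])
    return frames


# Toy scene: one Gaussian 2 units ahead of the camera, moving along +x.
means = np.array([[0.0, 0.0, 2.0]])
velocities = np.array([[1.0, 0.0, 0.0]])
K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 50.0],
              [0.0, 0.0, 1.0]])

static_cam = np.eye(4)
tracking_cam = np.eye(4)
tracking_cam[0, 3] = -1.0  # camera shifted by +1 along x in world space

frames = render_centers_along_trajectory(
    means, velocities, [static_cam, static_cam, tracking_cam], K, [0.0, 1.0, 1.0]
)
```

In this toy setup the centre drifts to the right in the static camera, while the camera that has moved with the object sees it back at the principal point. This shows how scene motion and camera motion are kept separate when the video is rendered from an explicit 4D representation.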
Figure (qualitative comparison). Datasets: KITTI, Waymo, RE10K, DL3DV-10K. Methods: Ground-truth, MotionCtrl, CameraCtrl, CamI2V, RealCam-I2V, Ours.
Comparison of our method with baseline methods across four datasets. Our method preserves temporal and 3D consistency while faithfully following the input camera trajectory and retaining the details of the input image. Its underlying 4D representation ensures accurate camera motion.