Sketch2Colab
Sketch-Conditioned Multi-Human Animation via Controllable Flow Distillation

Anonymous Authors
Anonymous Institution

Under Review

Abstract

We present Sketch2Colab, which turns storyboard-style 2D sketches into coherent, object-aware 3D multi-human motion with fine-grained control over agents, joints, timing, and contacts. Conventional diffusion-based motion generators have advanced realism; however, achieving precise adherence to rich interaction constraints typically demands extensive training and/or costly posterior guidance, and performance can degrade under strong multi-entity conditioning.

Sketch2Colab instead first learns a sketch-driven diffusion prior and then distills it into an efficient rectified-flow student operating in latent space for fast, stable sampling. Differentiable energies over keyframes, trajectories, and physics-based constraints directly shape the student’s transport field, steering samples toward motions that faithfully satisfy the storyboard while remaining physically plausible. To capture coordinated interaction, we augment the continuous flow with a continuous-time Markov chain (CTMC) planner that schedules discrete events such as touches, grasps, and handoffs, modulating the dynamics to produce crisp, well-phased human–object–human collaborations. Experiments on CORE4D and InterHuman show that Sketch2Colab achieves state-of-the-art constraint adherence and perceptual quality while offering significantly faster inference than diffusion-only baselines.
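As a rough illustration of how a differentiable energy can shape a flow's transport field, the toy sketch below integrates a 2D ODE with Euler steps and subtracts the gradient of a quadratic keyframe energy from the velocity. Everything here is an assumption made for illustration: `toy_velocity` is a constant stand-in for a learned student field, `keyframe_energy_grad` is a hypothetical energy, and the guidance scale and step count are arbitrary. It is not the Sketch2Colab implementation.

```python
import math

def toy_velocity(z, t):
    # Assumption: a constant drift standing in for a learned rectified-flow
    # student; it carries the sample from (0, 0) to (1, 0) over t in [0, 1].
    return [1.0, 0.0]

def keyframe_energy_grad(z, keyframe):
    # Gradient of the hypothetical energy E(z) = 0.5 * ||z - keyframe||^2.
    return [zi - ki for zi, ki in zip(z, keyframe)]

def sample(z0, keyframe, guidance_scale, num_steps=20):
    # Euler integration of dz/dt = v(z, t) - guidance_scale * grad E(z).
    z = list(z0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = toy_velocity(z, t)
        g = keyframe_energy_grad(z, keyframe)
        z = [zi + dt * (vi - guidance_scale * gi) for zi, vi, gi in zip(z, v, g)]
    return z

def distance(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

keyframe = [1.0, 1.0]
unguided = sample([0.0, 0.0], keyframe, guidance_scale=0.0)
guided = sample([0.0, 0.0], keyframe, guidance_scale=2.0)
print("unguided distance to keyframe:", distance(unguided, keyframe))
print("guided distance to keyframe:", distance(guided, keyframe))
```

With the guidance scale set to zero the sample simply follows the drift; with a positive scale the same integration ends closer to the keyframe, which is the qualitative effect the energy terms are meant to have.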

Sketch-conditioned human–object–human (HOH) demonstrations with Sketch2Colab. (a) Two people co‑manipulate a table following a sketched path. (b) Cooperative transport of a large box. (c) A specified hand grasp following a complex path. The system is driven by sparse storyboard keyframes.

Method Overview

Architecture of Sketch2Colab: The model distills a rectified-flow student from a sketch-driven diffusion teacher. A CTMC over phases modulates sub-fields and contact weights, while raw-space energies and latent anchors define dual-space guidance.
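To make the idea of a CTMC over phases more concrete, the sketch below samples a phase schedule by drawing exponential holding times and then looks up a per-phase contact weight. The phase names, transition rates, and weights are hypothetical values chosen for illustration; the page does not specify them, and this is not the planner used in Sketch2Colab.

```python
import random

# Hypothetical phases and transition rates (per second); RATES[a][b] is the
# rate of jumping from phase a to phase b. "release" is absorbing.
PHASES = ["approach", "grasp", "carry", "release"]
RATES = {
    "approach": {"grasp": 2.0},
    "grasp": {"carry": 4.0},
    "carry": {"release": 1.0},
    "release": {},
}
# Hypothetical contact weights that a phase could apply to contact energies.
CONTACT_WEIGHT = {"approach": 0.0, "grasp": 1.0, "carry": 1.0, "release": 0.2}

def sample_phase_schedule(start, horizon, rng):
    # Standard CTMC simulation: hold for an Exp(total rate) time, then jump
    # to a successor chosen with probability proportional to its rate.
    t, phase = 0.0, start
    schedule = [(t, phase)]
    while True:
        outgoing = RATES[phase]
        total = sum(outgoing.values())
        if total == 0.0:
            break  # absorbing phase
        t += rng.expovariate(total)
        if t >= horizon:
            break
        phase = rng.choices(list(outgoing), weights=list(outgoing.values()))[0]
        schedule.append((t, phase))
    return schedule

def contact_weight_at(schedule, t):
    # Weight of the phase that is active at time t.
    current = schedule[0][1]
    for start_time, phase in schedule:
        if start_time <= t:
            current = phase
    return CONTACT_WEIGHT[current]

schedule = sample_phase_schedule("approach", horizon=5.0, rng=random.Random(0))
for start_time, phase in schedule:
    print(f"{start_time:.2f}s -> {phase}")
```

A sampler could query `contact_weight_at(schedule, t)` at each integration step to switch contact terms on and off as discrete events occur.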

Sketch-Driven Human-Object-Human Generation: Comparisons on CORE4D

Multi-Human-Only Sketch Control: Comparisons on InterHuman

Acknowledgements

[Acknowledgements hidden for anonymous review]

The website template was adapted from GRAM.

Hugging Face Demo (Coming Soon)

Code (Coming Soon)
