SKETCH2COLAB
Sketch-Conditioned Multi-Human Animation via Controllable Flow Distillation

Divyanshu Daiya, Aniket Bera
IDEAS Lab, Department of Computer Science, Purdue University

Accepted to CVPR 2026.

Contact divyanshu@purdue.edu for download links to the code and model weights.


Sketch-conditioned human–object–human demonstrations with SKETCH2COLAB. (a) Two people co-manipulate a table following a sketched path. (b) Cooperative transport of a large box. (c) A specified hand grasp following a complex path. The system is driven by sparse storyboard keyframes.

Abstract

We present SKETCH2COLAB, a framework that converts storyboard-style 2D sketches into coherent, object-aware 3D multi-human motion with fine-grained control over agents, joints, timing, and contacts. Diffusion-based motion generators can produce realistic motion, but multi-entity storyboard control often requires expensive guidance and can degrade under strong conditioning.

SKETCH2COLAB instead learns a sketch-conditioned diffusion prior and distills it into a rectified-flow student in latent space for fast and stable sampling. To closely follow the storyboard, the student is guided with differentiable objectives for keyframes, paths, contacts, and physical consistency.
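The distillation step can be illustrated with the standard rectified-flow objective. The following is a minimal NumPy sketch, not the authors' implementation. It assumes the teacher has already produced a sample `x1` from each noise latent `x0`, that the student regresses the straight-line velocity `x1 - x0` at interpolated points, and that sampling uses plain fixed-step Euler integration. The names `rectified_flow_loss` and `euler_sample` are hypothetical, and the toy "teacher" is a constant shift chosen so the ideal student is known exactly.

```python
import numpy as np


def rectified_flow_loss(velocity_fn, x0, x1, t):
    """Rectified-flow regression loss on teacher-generated pairs.

    x0: noise latents, shape (B, D)
    x1: teacher samples paired with x0 (assumed precomputed), shape (B, D)
    t:  interpolation times in [0, 1], shape (B,)
    """
    tt = t[:, None]
    xt = (1.0 - tt) * x0 + tt * x1  # point on the straight line from x0 to x1
    target = x1 - x0                # constant velocity along that line
    pred = velocity_fn(xt, t)
    return float(np.mean((pred - target) ** 2))


def euler_sample(velocity_fn, x0, num_steps):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with fixed Euler steps."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = np.full(x.shape[0], i * dt)
        x = x + dt * velocity_fn(x, t)
    return x


# Toy check: if the hypothetical teacher shifts every latent by a constant c,
# the ideal student velocity is c everywhere, so few-step Euler is exact.
rng = np.random.default_rng(0)
c = np.array([1.0, -2.0, 0.5])
x0 = rng.standard_normal((8, 3))
x1 = x0 + c
ideal_student = lambda x, t: np.broadcast_to(c, x.shape)
loss = rectified_flow_loss(ideal_student, x0, x1, rng.uniform(size=8))
samples = euler_sample(ideal_student, x0, num_steps=4)
```

Straight teacher-to-noise paths are what make few-step sampling viable. A real student would be a neural network conditioned on the sketch controls, which this toy omits.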

Since collaborative motion also involves discrete interaction changes, such as converging, making contact, transporting cooperatively, and disengaging, we introduce a lightweight continuous-time Markov chain (CTMC) planner. This planner tracks the active interaction regime and modulates the flow to produce clearer, better-synchronized human-object-human coordination. Experiments on CORE4D and InterHuman show that SKETCH2COLAB improves constraint adherence and perceptual quality while sampling substantially faster than diffusion-only alternatives.
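A minimal sketch of how a CTMC phase planner could produce a per-frame interaction schedule, assuming a hypothetical four-phase, left-to-right chain (approach, contact, transport, release) with made-up exit rates. Holding times are drawn from exponential distributions (Gillespie sampling), and the resulting labels index a hypothetical per-phase weight table that could modulate contact guidance. The phase names, rates, and `CONTACT_WEIGHT` values are illustrative assumptions, not values from the paper.

```python
import numpy as np

PHASES = ["approach", "contact", "transport", "release"]  # hypothetical phase set
CONTACT_WEIGHT = np.array([0.0, 1.0, 1.0, 0.0])           # hypothetical per-phase guidance weight


def make_generator(exit_rates):
    """Left-to-right CTMC generator: phase k jumps to k + 1 at rate exit_rates[k].

    The last phase has no outgoing rate, so it is absorbing. Rows sum to zero.
    """
    n = len(exit_rates) + 1
    Q = np.zeros((n, n))
    for k, r in enumerate(exit_rates):
        Q[k, k + 1] = r
        Q[k, k] = -r
    return Q


def sample_phase_track(Q, num_frames, fps, rng):
    """Gillespie sampling of the CTMC, discretized to one phase label per frame."""
    n = Q.shape[0]
    state, t = 0, 0.0
    jumps = [(0.0, 0)]
    horizon = num_frames / fps
    while True:
        exit_rate = -Q[state, state]
        if exit_rate <= 0.0:  # absorbing phase reached
            break
        t += rng.exponential(1.0 / exit_rate)  # exponential holding time
        if t >= horizon:
            break
        probs = np.clip(Q[state], 0.0, None) / exit_rate  # jump distribution
        state = int(rng.choice(n, p=probs))
        jumps.append((t, state))
    frame_times = np.arange(num_frames) / fps
    labels = np.zeros(num_frames, dtype=int)
    for jump_time, s in jumps:
        labels[frame_times >= jump_time] = s
    return labels


rng = np.random.default_rng(0)
Q = make_generator([0.8, 0.5, 0.4])  # rates in 1/s, illustrative only
labels = sample_phase_track(Q, num_frames=120, fps=30, rng=rng)
weights = CONTACT_WEIGHT[labels]     # per-frame modulation signal
```

A left-to-right generator forces the phases to occur in order, which matches the converge, contact, transport, disengage narrative; a learned planner could instead allow richer transitions and condition its rates on the sketches.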

Method Overview

Basic overview of SKETCH2COLAB. Our model first plans the hidden choreography from storyboard sketches using a CTMC-based interactive phase scheduler, then generates continuous motion in latent space with a rectified-flow model conditioned on the choreography and the sketch constraints. A rough 3D preview is used to measure pose, path, and contact errors, which guide the latent motion before the final 3D decoding.
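The preview-based guidance can be sketched as gradient descent on a constraint loss folded into each flow step. This minimal NumPy sketch assumes a hypothetical linear preview decoder `W` (so the gradient has a closed form), a binary `mask` marking which preview coordinates the sketch constrains, and a hypothetical `guidance_scale`. The actual system decodes through a learned network and combines pose, path, and contact terms, which this toy collapses into a single masked squared error.

```python
import numpy as np


def preview_decode(W, z):
    """Hypothetical linear rough-preview decoder: latent (D,) -> flattened joints (J * 3,)."""
    return W @ z


def constraint_loss_and_grad(W, z, target, mask):
    """Masked squared error to sketch targets and its gradient w.r.t. the latent.

    mask is binary (1 = constrained coordinate), so mask * mask == mask and the
    gradient of ||mask * (W z - target)||^2 is 2 W^T (mask * (W z - target)).
    """
    residual = mask * (preview_decode(W, z) - target)
    loss = float(residual @ residual)
    grad = 2.0 * W.T @ residual
    return loss, grad


def guided_euler_step(z, velocity, W, target, mask, dt, guidance_scale):
    """One Euler step of the flow with the constraint gradient subtracted from the velocity."""
    _, grad = constraint_loss_and_grad(W, z, target, mask)
    return z + dt * (velocity - guidance_scale * grad)


rng = np.random.default_rng(0)
W = rng.standard_normal((9, 4))   # 3 joints x 3 coords from a 4-D latent
z = rng.standard_normal(4)
target = rng.standard_normal(9)   # toy sketch-derived keyframe positions
mask = np.array([1, 1, 1, 0, 0, 0, 1, 1, 1], dtype=float)  # joints 0 and 2 constrained
loss_before, _ = constraint_loss_and_grad(W, z, target, mask)
z_next = guided_euler_step(z, np.zeros(4), W, target, mask, dt=1.0, guidance_scale=0.005)
loss_after, _ = constraint_loss_and_grad(W, z_next, target, mask)
```

With the student velocity set to zero, the step reduces to pure gradient descent on the constraint loss, so a small enough `guidance_scale` lowers the error; in practice the guidance strength trades off sketch adherence against staying on the learned motion manifold.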

Detailed pipeline of SKETCH2COLAB. (a) Storyboard sketches and optional text are encoded as controls for latent motion generation. A diffusion teacher guides a rectified-flow student, and a frozen decoder maps the final latent to 3D motion. (b) At each update step, the student predicts a motion change, the phase planner selects the current interaction stage, and guidance keeps the motion close to the sketched poses, paths, and contacts. (c) The student is a temporal U-Net with entity attention for coordinated multi-entity motion.
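Entity attention can be read as attention across agents and objects within each frame, so each entity's features are updated from the other entities at the same time step. The sketch below is a single-head NumPy version under that assumption; the projection matrices, the residual connection, and the omission of normalization layers and multiple heads are illustrative choices, not the paper's architecture.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def entity_attention(h, Wq, Wk, Wv):
    """Single-head self-attention across entities, applied independently per frame.

    h: (E, T, C) features for E entities (e.g. two humans and one object) over T frames.
    Returns an array of the same shape with a residual connection.
    """
    q, k, v = (np.transpose(h @ W, (1, 0, 2)) for W in (Wq, Wk, Wv))  # each (T, E, C)
    scores = q @ np.transpose(k, (0, 2, 1)) / np.sqrt(q.shape[-1])    # (T, E, E)
    mixed = softmax(scores, axis=-1) @ v                              # (T, E, C)
    return h + np.transpose(mixed, (1, 0, 2))                         # back to (E, T, C)


rng = np.random.default_rng(0)
E, T, C = 3, 16, 8
h = rng.standard_normal((E, T, C))
Wq, Wk, Wv = (rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(3))
out = entity_attention(h, Wq, Wk, Wv)
perm = np.array([2, 0, 1])
out_perm = entity_attention(h[perm], Wq, Wk, Wv)
```

Because the layer has no notion of entity order, reordering the entities simply reorders the outputs (permutation equivariance), which suits scenes with interchangeable agents; a full temporal U-Net block would interleave this with temporal convolutions or temporal attention.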

Paper Presentation

Comparisons: Sketch-Driven Human-Object-Human Generation on CORE4D

Comparisons: Multi-Human-Only Sketch Control on InterHuman