Machine Learning + 3D


3D Scan Data & Papers

Environments

General

Objects

Methods/Papers

Datasets

Hands

  • FreiHAND
    • Data collection: semi-automated, human-in-the-loop annotations
  • GHUM (also covers full bodies; see below)

Humans/Avatars

Scanners (Hardware)

If a shared environment is used (e.g. for EgoExo), then a bigger hardware setup might be practical if the budget allows.

Datasets, Methods, Collection Methods

tl;dr

  • Current 3D datasets exist both with and without paired (image/video, 3D scan) data
  • 3D reconstruction methods use multiple datasets for training (e.g. H-NeRF, PIFuHD)
  • To collect data:
    • Proprietary hardware is used (e.g. BUFF, SCAPE, GHUM), or
    • "Calibration"-like data is captured (see the triangulation sketch after this list), either
      • from a camera array with multiple views/perspectives (e.g. 22 cameras), or
      • from a single view where the subject moves around (e.g. the PeopleSnapshot dataset)
    • Optionally (e.g. for RenderPeople), a 3D model is made by artists (i.e. through annotations)
  • Synthetic datasets are generated (AGORA, SURREAL)
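
To make the camera-array option concrete: once each camera's projection matrix is known, matched 2D detections can be lifted to 3D by triangulation. A toy two-view sketch with OpenCV follows; the intrinsics, baseline, and pixel coordinates are made-up stand-ins, not values from any dataset above.

```python
# Toy two-view triangulation sketch (all numbers are stand-ins).
# P1, P2 are 3x4 projection matrices K[R|t]; in practice they come
# from calibrating the camera array.
import numpy as np
import cv2

K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])                   # camera at origin
P2 = K @ np.hstack([np.eye(3), np.array([[-0.5], [0.0], [0.0]])])   # 0.5 m baseline

# One matched point observed in both views, as 2xN pixel coordinates
pts1 = np.array([[320.0], [240.0]])
pts2 = np.array([[120.0], [240.0]])

X_h = cv2.triangulatePoints(P1, P2, pts1, pts2)  # 4xN homogeneous points
X = (X_h[:3] / X_h[3]).T                         # Nx3 Euclidean points
print(X)  # -> approximately [[0., 0., 2.]]
```

With more than two cameras, the same idea becomes a least-squares problem over all views, which is what the larger arrays (e.g. 22 cameras) exploit.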

Proprietary hardware includes: CAESAR, 3dMD, and the Cyberware Whole Body scanner.

I believe we should try to collect "calibration" data, if possible, from a multi-camera setup (a minimal calibration sketch follows below). If anyone else has other opinions or suggestions, let me know.
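
For reference, a minimal sketch of what per-camera calibration could look like with OpenCV's checkerboard routines. The board size, square size, and image path are placeholder assumptions; each camera in the array would be calibrated like this, and a stereo/extrinsic calibration step would then relate the cameras to each other.

```python
# Minimal single-camera intrinsic calibration sketch using OpenCV.
# BOARD, SQUARE_SIZE, and the image path are placeholder assumptions.
import glob
import cv2
import numpy as np

BOARD = (9, 6)        # inner corners per checkerboard row/column
SQUARE_SIZE = 0.025   # checkerboard square edge length in meters

# 3D corner positions in the board's own coordinate frame (z = 0 plane)
objp = np.zeros((BOARD[0] * BOARD[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:BOARD[0], 0:BOARD[1]].T.reshape(-1, 2) * SQUARE_SIZE

obj_points, img_points = [], []
for path in glob.glob("calib_images/cam0/*.png"):  # hypothetical path
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, BOARD)
    if found:
        corners = cv2.cornerSubPix(
            gray, corners, (11, 11), (-1, -1),
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
        obj_points.append(objp)
        img_points.append(corners)

# Intrinsic matrix K and lens distortion coefficients for this camera
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print(f"RMS reprojection error: {rms:.3f} px")
```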

A lot of papers use SMPL-X (or SMPL), a parametric 3D human model, which seems to come from the same lab as AGORA (see below).
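
For reference, SMPL/SMPL-X are differentiable mesh models driven by shape and pose coefficients. Below is a hedged sketch using the `smplx` Python package (`pip install smplx`); the model files must be downloaded separately from the SMPL-X site after accepting the license, and "models" is a placeholder path.

```python
# Hedged sketch: sample a neutral SMPL-X body mesh with the smplx package.
# "models" is a placeholder directory containing the downloaded model files.
import torch
import smplx

model = smplx.create(model_path="models", model_type="smplx",
                     gender="neutral", use_pca=False)

betas = torch.zeros(1, 10)      # shape coefficients (first 10 PCA components)
body_pose = torch.zeros(1, 63)  # 21 body joints x 3 axis-angle parameters

output = model(betas=betas, body_pose=body_pose, return_verts=True)
print(output.vertices.shape)    # (1, 10475, 3) mesh vertices for SMPL-X
print(output.joints.shape)      # predicted 3D joint locations
```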

  • An alternative 3D model is imGHUM (H-NeRF uses it), which is a learnable 3D model
  • PIFuHD (1)
    • Reconstructs a 3D human model from a single image
    • Uses RenderPeople data and builds a synthetic dataset from it
  • HierarchicalProbabilistic3DHuman
    • AGORA, SMPL-X
  • End-to-End Human Pose and Mesh Reconstruction with Transformers
  • GHUM (Google Research)
    • CAESAR data used
  • imGHUM
    • Generative model of 3D human shape and pose, represented as a signed distance field (see the SDF sketch after this list)
    • Introduces GHS3D
  • H-NeRF
    • Uses a camera array to reconstruct humans
    • Based on imGHUM
    • Temporal reconstruction of humans in motion
    • Works on monocular video or a sparse set of cameras (see the volume-rendering sketch after this list)
  • PeopleSnapshot: the cheapest/easiest method for data collection
    • This is essentially “calibration” data, which may be valuable
    • Subjects rotate in place in front of a fixed camera
  • BUFF uses 3dMD and a custom setup; the setup is described on the MPI Dynamic FAUST page
  • Human3.6M
    • 3D laser scans of 11 actors
    • Accurate 3D joint positions and joint angles
    • Setup is a camera array in a lab setting (see: http://vision.imar.ro/human3.6m/description.php)
    • 4 RGB cameras, 10 motion-capture cameras, 1 time-of-flight sensor (a time-of-flight sensor measures per-pixel depth from the round-trip time of an emitted light pulse: depth = c·Δt/2)
  • SCAPE: uses the Cyberware Whole Body scanner
    • Also see: https://www.sciencedirect.com/topics/engineering/cyberware
  • RenderPeople data: used by some papers, such as PIFuHD (1)
    • RenderPeople collects its data with a camera array similar to the above (with more cameras) and annotates in a 3D program (Maya, ZBrush, etc.)
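
For the imGHUM item above: a signed distance field (SDF) represents a surface implicitly as the zero level set of a function that is negative inside and positive outside. Here is a toy sketch with an analytic sphere standing in for the learned human model; this illustrates the representation only, not imGHUM's actual interface.

```python
# Toy signed distance field (SDF): negative inside the surface, zero on
# it, positive outside. A unit sphere stands in for a learned model.
import numpy as np
from skimage import measure  # marching cubes for zero-level-set extraction

def sphere_sdf(points, radius=1.0):
    """Signed distance from 3D points to a sphere centered at the origin."""
    return np.linalg.norm(points, axis=-1) - radius

# Sample the SDF on a regular grid...
grid = np.linspace(-1.5, 1.5, 64)
xx, yy, zz = np.meshgrid(grid, grid, grid, indexing="ij")
volume = sphere_sdf(np.stack([xx, yy, zz], axis=-1))

# ...and extract the surface (the zero level set) as a triangle mesh.
verts, faces, _, _ = measure.marching_cubes(volume, level=0.0)
print(verts.shape, faces.shape)
```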
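And for the H-NeRF item: NeRF-style methods render a pixel by integrating density and color along a camera ray. Below is a minimal numpy sketch of that volume-rendering quadrature; in a real model the per-sample density and color would come from the network, here they are stand-in arrays.

```python
# Minimal NeRF-style volume rendering along a single ray (numpy sketch).
import numpy as np

def render_ray(density, color, t_vals):
    """Composite per-sample (density, color) into a single pixel color.

    density: (N,) non-negative sigma at each sample along the ray
    color:   (N, 3) RGB at each sample
    t_vals:  (N,) sample depths along the ray
    """
    deltas = np.diff(t_vals, append=t_vals[-1] + 1e10)  # spacing between samples
    alpha = 1.0 - np.exp(-density * deltas)             # per-segment opacity
    # Transmittance: probability the ray reaches sample i unoccluded
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = alpha * trans
    return (weights[:, None] * color).sum(axis=0)

# Stand-in values: 64 samples between depths 2 and 6
t = np.linspace(2.0, 6.0, 64)
rgb = render_ray(density=np.full(64, 0.5), color=np.ones((64, 3)), t_vals=t)
print(rgb)  # approaches white as the ray saturates
```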

Synthetic datasets

Other resources: