Datasets

  • SVG-Stack, 2023

    SVG-Stack contains SVG code examples for training and evaluating models that vectorize images into SVG.

  • TEyeD, 2020

    A large dataset of 20 million real-world eye images taken with head-mounted devices. It includes segmentations of the pupil, eyelid, and iris as well as …

  • The Pile, 2020

    The Pile is an 825 GiB English text corpus intended for training large language models (LLMs). It is constructed from 22 diverse, high-quality subsets …

  • WHAM!, 2019

    The WSJ0 Hipster Ambient Mixtures (WHAM!) dataset pairs each two-speaker mixture in the wsj0-2mix dataset with a unique noise background scene.

    The noise audio …

  • WHAMR!, 2020

    The WSJ0 Hipster Ambient Mixtures (WHAM!) dataset pairs each two-speaker mixture in the wsj0-2mix dataset with a unique noise background scene. We also created WHAMR!, …

  • WILDS, 2021

    WILDS is a curated collection of benchmark datasets that represent distribution shifts faced in the wild. In each dataset, each data point is drawn from …

  • WN18RR, 2018

    WN18RR is a link prediction dataset created from WN18, which is a subset of WordNet. WN18 consists of 18 relations and 40,943 entities. However, many …