This dataset contains SVG code examples for training and evaluating SVG models for image vectorization.
A large dataset of 20 million real-world eye images taken with head-mounted devices. It includes segmentations of the pupil, eyelid, and iris as well as …
The MNIST Database of Handwritten Digits, 1998
The MNIST database of handwritten digits has a training set of 60,000 examples, and a test set of 10,000 examples. It is a subset of …
The Pile is an 825GB English text corpus targeted at training large language models (LLMs). It is constructed from 22 diverse and high-quality subsets …
40,000 lines drawn from a variety of Shakespeare's plays.
The WSJ0 Hipster Ambient Mixtures (WHAM!) dataset pairs each two-speaker mixture in the wsj0-2mix dataset with a unique noise background scene. The noise audio …
The WSJ0 Hipster Ambient Mixtures (WHAM!) dataset pairs each two-speaker mixture in the wsj0-2mix dataset with a unique noise background scene. We also created WHAMR!, …
WILDS is a curated collection of benchmark datasets that represent distribution shifts faced in the wild. In each dataset, each data point is drawn from …
WN18RR is a link prediction dataset created from WN18, which is a subset of WordNet. WN18 consists of 18 relations and 40,943 entities. However, many …