MNIST no more – show us your favorite datasets!

What are you guys’ favorite datasets? :slight_smile:

I really like this TACO (Trash Annotations in Context) dataset that I found recently. It contains pictures of trash in the real world and makes for a good object detection problem!

6 Likes

The first conversation @lavanyashukla and I ever had, when we’d just started at W&B, was about where to find good datasets!

If you’re having a hard time saying goodbye to MNIST (like me!) consider MNIST-1D. It’s a semi-artificial version of MNIST that is designed to better support “science of deep learning” style projects.

2 Likes

You can also build augmented MNIST versions, like “Moving MNIST”, where MNIST digits move around a canvas. It is good for benchmarking next-frame prediction.
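Here is a minimal sketch of how digits moving around a canvas could be generated, assuming NumPy and a stack of digit images already loaded as an array. The function name `make_moving_sequence`, its parameters, the speed range, and the bounce-off-the-edges behaviour are all illustrative choices of mine, not the recipe used by the official Moving MNIST release.

```python
import numpy as np


def make_moving_sequence(digits, num_frames=20, canvas_size=64, seed=0):
    """Render digit images bouncing around a square canvas (hypothetical helper).

    digits: array of shape (n, h, w) with values in [0, 1].
    Returns an array of shape (num_frames, canvas_size, canvas_size).
    Assumes the canvas is at least a few pixels larger than a digit.
    """
    rng = np.random.default_rng(seed)
    n, h, w = digits.shape
    max_pos = np.array([canvas_size - h, canvas_size - w])

    # Random top-left corners (row, col) and velocities in pixels per frame.
    pos = np.stack(
        [rng.integers(0, max_pos[0] + 1, size=n),
         rng.integers(0, max_pos[1] + 1, size=n)],
        axis=1,
    )
    vel = rng.choice([-3, -2, -1, 1, 2, 3], size=(n, 2))

    frames = np.zeros((num_frames, canvas_size, canvas_size), dtype=np.float32)
    for t in range(num_frames):
        for i in range(n):
            y, x = pos[i]
            # Overlapping digits are combined with a pixel-wise maximum.
            patch = frames[t, y:y + h, x:x + w]
            np.maximum(patch, digits[i], out=patch)
        # Advance positions and reflect off the canvas edges.
        pos = pos + vel
        for axis in range(2):
            low = pos[:, axis] < 0
            high = pos[:, axis] > max_pos[axis]
            pos[low, axis] = -pos[low, axis]
            pos[high, axis] = 2 * max_pos[axis] - pos[high, axis]
            vel[low | high, axis] *= -1
    return frames


# Placeholder "digits": random 28x28 arrays standing in for real MNIST images.
placeholder_digits = np.random.default_rng(1).random((2, 28, 28))
video = make_moving_sequence(placeholder_digits)
print(video.shape)  # (20, 64, 64)
```

Swap the placeholder array for real MNIST digits scaled to [0, 1] and you get short clips where the first few frames can serve as input and the rest as the prediction target.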

I also like the fastai ones: “Imagenette” and “ImageWang”.

3 Likes

I love calmcode.io’s Datasets which are described as:

When you’re learning a new data tool it often helps to have a dataset nearby that serves as an example. Many of these datasets are also fun to explore on their own which is why we list them here so you can download them.

1 Like

This sounds lame, but I really want to play with CUAD, a legal contracts dataset annotated by lawyers. It's rare to see a dataset that would have been this expensive to annotate out there in the wild, I think.

https://www.atticusprojectai.org/cuad

(this interest may or may not stem from my long-term interest in automating away lawyers after dealing with a slow, forgetful, inept solicitor for the guts of 6 months :grimacing: )

7 Likes

CORe50

1 Like

I like making my own :grin: