Before using wandb, my usual approach to organizing datasets was to have lots of subfolders:
mnist
    complete
        augmented-mild
        augmented-heavy
    sampled-examples
        mnist-1000
            augmented-mild
            augmented-heavy
        mnist-10k
            augmented-mild
            augmented-heavy
    sampled-class-examples
        mnist-1000-5cls
        mnist-10k-5cls
After going through the wandb Artifacts docs, it seems best to use a flattened structure for dataset versioning. How much flattening is ideal? A complete flattening would mean each of the folders above gets a different artifact name and the same type (say, “balanced-dataset”). But completely flattening the dataset hierarchy seems to take away the “versioning” ability of wandb, since all of them are now different artifacts.
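
To make the question concrete, here is a rough sketch of what I mean by complete flattening. The folder nesting in `LEAF_FOLDERS` is my reading of the layout above, and the helper names (`flat_artifact_name`, `log_flattened`), the `job_type` value, and the `root`/`project` arguments are hypothetical, made up for illustration only.

```python
from pathlib import PurePosixPath

# Leaf folders of the layout above, relative to the dataset root.
# Assumption: this nesting is one reading of the folder structure.
LEAF_FOLDERS = [
    "mnist/complete/augmented-mild",
    "mnist/complete/augmented-heavy",
    "mnist/sampled-examples/mnist-1000/augmented-mild",
    "mnist/sampled-examples/mnist-1000/augmented-heavy",
    "mnist/sampled-examples/mnist-10k/augmented-mild",
    "mnist/sampled-examples/mnist-10k/augmented-heavy",
    "mnist/sampled-class-examples/mnist-1000-5cls",
    "mnist/sampled-class-examples/mnist-10k-5cls",
]


def flat_artifact_name(folder: str) -> str:
    """Join the path components with '-' to get one flat artifact name."""
    return "-".join(PurePosixPath(folder).parts)


def log_flattened(root: str, project: str,
                  artifact_type: str = "balanced-dataset") -> None:
    """Log every leaf folder as its own artifact, all sharing one type."""
    # Imported lazily so the naming helper works without wandb installed.
    import wandb

    with wandb.init(project=project, job_type="dataset-upload") as run:
        for folder in LEAF_FOLDERS:
            artifact = wandb.Artifact(
                name=flat_artifact_name(folder),
                type=artifact_type,
            )
            artifact.add_dir(str(PurePosixPath(root) / folder))
            run.log_artifact(artifact)


if __name__ == "__main__":
    # Only prints the flat names; nothing is uploaded here.
    for folder in LEAF_FOLDERS:
        print(flat_artifact_name(folder))
```

With this scheme every folder becomes a separate artifact with its own version history (v0, v1, ...), which is exactly the part I am unsure about.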