Question on train/val/test splitting #9

@VSainteuf

Hello,
Thanks again for making this dataset publicly available.
I have a question regarding the train/val/test splitting.

I downloaded the COCO files for each subset and each scenario and, as a sanity check, wanted to verify that none of the patches of the train set are present in the val or test sets. Using the "patch_full_name" field as the unique identifier for each patch, I found the following intersections:

  • Scenario 1: 514 patches are both in val and test
  • Scenario 2: 527 patches are both in train and val
  • Scenario 3: 489 patches are both in train and val
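For reference, the check can be sketched roughly as follows. This is a minimal illustration rather than the exact script used: it assumes each COCO file has a top-level "images" list whose entries carry the "patch_full_name" field, and the helper names (`patch_ids`, `load_split`, `split_overlaps`) are hypothetical.

```python
import json
from itertools import combinations
from pathlib import Path


def patch_ids(coco, key="patch_full_name"):
    """Collect the identifier of every image entry in a loaded COCO dict.

    Assumption: the identifier is stored on each entry of the "images" list.
    """
    return {img[key] for img in coco.get("images", []) if key in img}


def load_split(path):
    """Read one COCO annotation file and return its set of patch identifiers."""
    with Path(path).open() as f:
        return patch_ids(json.load(f))


def split_overlaps(splits):
    """Map every pair of split names to the identifiers they share.

    `splits` is a dict such as {"train": set_of_ids, "val": ..., "test": ...}.
    Pairs are formed over the alphabetically sorted split names.
    """
    return {
        (a, b): splits[a] & splits[b]
        for a, b in combinations(sorted(splits), 2)
    }
```

With one set per subset loaded through `load_split` (the file paths depend on how the annotations were downloaded), `split_overlaps` returns the shared identifiers for each pair of subsets, and the length of each set gives the counts listed above.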

So I'm unsure if I'm missing something on how the dataset is split, or if there might have been an issue in the splitting strategy.
My understanding from the paper is that the splitting is done per patch (i.e., one patch is exclusively in train, val, or test).
Is it possible that the splitting was actually done at the sub-patch level (of shape 188x188)?
Thanks in advance for your help clarifying this!
