We're actively engaged in adding new data to our dataset, and we find that FloydHub is an excellent way to centrally manage that dataset across our team in a way that provides concrete answers to the question of which data got which results.
In our current data-labeling process, we have new data to add to our dataset a few times a week, but these new additions are a small fraction of the size of the entire dataset. We're also looking to involve crowdsourcing platforms to generate additional label information, which would increase the frequency with which the dataset changes.
As our dataset grows, it becomes increasingly time-expensive to upload the entire dataset, most of which is unchanged from the previous version.
Alternatively, we could wait to upload a new version of the dataset until it contains several new additions, but then our developers would not benefit from those new dataset additions until the upload occurs.
Certainly dataset version control here remains valuable for reproducing results and guiding development, but it would save us time to be able to create a new version of the dataset by only uploading what's new, and incorporating the contents of the previous dataset as well.
Thanks for your continued work on this platform! It's already been hugely helpful for us.