AugDiskCachedDataset to map the copy index to augmentation parameter #274

MinaKh · 2023-11-24T15:02:14Z

This branch adds AugDiskCachedDataset, a child class of DiskCachedDataset.
It is intended for so-called deterministic augmentations, whose parameter space is small and discrete. An example is a noise augmentation on audio samples in which the SNR can take only 5 values.

In DiskCachedDataset, num_copies can be used to generate N copies of a data sample. This works well when the transforms/augmentations have an infinite or probabilistic parameter space, because the chance of generating the same augmented version twice is very low.
For deterministic augmentations with N parameter values, on the other hand, it is advantageous to map the copy index to the parameter. This avoids regenerating samples that already exist and ensures that the generated copies cover the whole desired parameter space.
The main feature of this class is therefore that the index of the file copy is mapped to the augmentation parameter.
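
The idea of mapping the copy index to the augmentation parameter can be sketched without tonic at all. The snippet below is only an illustration under stated assumptions: SNR_VALUES_DB, add_noise_at_snr, snr_for_copy and make_copies are hypothetical names, not part of the AugDiskCachedDataset API, and the fixed noise seed is an assumption made so that regenerating a copy gives the same result.

```python
import math
import random

# Hypothetical discrete parameter space: five SNR values in dB.
SNR_VALUES_DB = [0, 5, 10, 15, 20]


def add_noise_at_snr(signal, snr_db, seed=0):
    """Add Gaussian noise scaled so that the result has the requested SNR."""
    rng = random.Random(seed)  # fixed seed: the same input gives the same output
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    signal_power = sum(s * s for s in signal) / len(signal)
    noise_power = sum(n * n for n in noise) / len(noise)
    # Scale the noise so that signal_power / scaled_noise_power == 10^(snr_db / 10).
    scale = math.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(signal, noise)]


def snr_for_copy(copy_index):
    """Map a copy index directly to one value of the parameter space."""
    return SNR_VALUES_DB[copy_index]


def make_copies(signal, num_copies):
    """Generate one augmented copy per index, keyed by the copy index."""
    return {i: add_noise_at_snr(signal, snr_for_copy(i)) for i in range(num_copies)}


signal = [math.sin(0.1 * t) for t in range(200)]
copies = make_copies(signal, num_copies=len(SNR_VALUES_DB))
```

With num_copies equal to the number of parameter values, every SNR value is used exactly once, and asking for copy i again reproduces the same sample instead of drawing a new random one.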

biphasic · 2023-11-25T12:59:20Z

Hello @MinaKh! Currently I don't understand how what you're trying to achieve with this class cannot already be done with existing classes.
It seems to me that you want to control the augmentations exactly, but then I don't understand why they're called augmentations. Can you please

provide an example of how you use your proposed class
explain with a concrete example why the current code cannot do what you need to do

Before I can merge this, the class also needs a test, so it would be helpful to add one.

codecov-commenter · 2023-12-20T18:20:57Z

Codecov Report

Attention: 22 lines in your changes are missing coverage. Please review.

Comparison is base (db13037) 76.80% compared to head (c9d26b0) 77.34%.
Report is 12 commits behind head on develop.

Files	Patch %	Lines
tonic/cached_dataset.py	58.69%	19 Missing ⚠️
tonic/audio_transforms.py	92.00%	2 Missing ⚠️
tonic/audio_augmentations.py	98.78%	1 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           develop     #274      +/-   ##
===========================================
+ Coverage    76.80%   77.34%   +0.53%     
===========================================
  Files           53       54       +1     
  Lines         3001     3165     +164     
===========================================
+ Hits          2305     2448     +143     
- Misses         696      717      +21

MinaKh · 2023-12-20T18:31:47Z

Hi @biphasic! Thanks for your feedback.

Deterministic augmentations are not uncommon, especially in audio processing. They are still called augmentations, but they are applied in a more controlled way.
I have added a notebook at docs/tutorails/Aug_DiskCachDataset.ipynb that addresses the points you raised, using a synthetic dataset. Please let me know if anything is unclear.
I have also added a test, test/test_aug_caching.py, which is very similar to what I present in the notebook. Please let me know if you have other ideas for tests.
This branch has been merged with the branch of my other PR, which adds audio transforms.

biphasic

Hello Mina, I have two small change requests; after that I can merge this.

biphasic · 2024-05-15T14:35:38Z

tonic/cached_dataset.py

 from warnings import warn

 import h5py
 import numpy as np
+from torchvision.transforms import Compose


Tonic should work without torch installed. Can you move this line to wherever it is used? That is, import torchvision.transforms.Compose one line above each place where it is used.
For testing and documentation builds we can of course require torch to be installed. That's why the torch requirements are only used in the testing and documentation GitHub Actions steps.
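
The request above amounts to a deferred, function-local import. A minimal sketch follows, assuming a hypothetical helper called compose_transforms; the actual place in tonic/cached_dataset.py where Compose is used is not shown in this thread.

```python
def compose_transforms(transforms):
    """Chain a list of callables with torchvision's Compose.

    `compose_transforms` is a hypothetical helper used only for illustration.
    """
    # Deferred import: torchvision is needed only when this function is called,
    # so importing the enclosing module still works without torch installed.
    from torchvision.transforms import Compose

    return Compose(transforms)
```

Defining the function does not touch torchvision; an ImportError can only occur when the function is actually called on a system without it.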

test/requirements.txt

test/test_aug_caching.py

AugDiskCachedDataset added

61c7f85

MinaKh marked this pull request as draft November 24, 2023 15:02

MinaKh added 9 commits December 4, 2023 11:50

dict keys updated to more generic

9d00ca8

small fixes and cleanup

fb64e65

adding typing-extensions to requirments

ccafa5a

importing TypedDict from typing_extensions for python 3.7

d2c995e

Merge branch 'add_audio_transforms' into add_Aug_DiskCachedDataset

2af99d6

bug fixed in Aug_DiskCach

170f31b

including Aug_DiskCachedDataset in init imports

5c0f9a7

test added for aug_diskcached

b3c5d1b

notebook added to elaborate teh function of AugDiskCachedDataset

c9d26b0

biphasic marked this pull request as ready for review May 15, 2024 14:13

biphasic added 2 commits May 15, 2024 16:14

Merge branch 'develop' into add_Aug_DiskCachedDataset

97622dd

Merge branch 'develop' into add_Aug_DiskCachedDataset

fdb6dc9

biphasic requested changes May 15, 2024

MinaKh added 3 commits May 23, 2024 12:25

moving torch import to inside function, where it is used

2036b8f

moving typing-extensions to the root requirement

0ef6582

Merge branch 'add_Aug_DiskCachedDataset' of https://github.com/MinaKh…

43fcbe1

…/tonic into add_Aug_DiskCachedDataset merging last minor modifications of the branch with latest tonim master

biphasic reviewed May 28, 2024

test/test_aug_caching.py (two review comments, outdated and resolved)

MinaKh added 2 commits May 30, 2024 11:42

moving torchvision import to inside function

dd2baec

DataSet class import was removed, the mini dataset does not inherent …

c799c41

…from that

biphasic merged commit d67aa25 into neuromorphs:develop May 30, 2024
9 checks passed
