ConcatDataset Behavior and Equal Dataset Length Requirement: #95

Open
lianabagh opened this issue Aug 15, 2023 · 0 comments

The current ConcatDataset implementation only works correctly when every dataset it wraps has the same length. `__getitem__` selects a dataset with idx % len(self.datasets) and an item within it with idx // len(self.datasets), so indices are interleaved across the datasets. When the lengths differ, the item index can run past the end of a shorter dataset and raise an error, while the trailing items of a longer dataset are never reached. This limits ConcatDataset to datasets of equal length.

class ConcatDataset(AudioDataset):
    def __init__(self, datasets: list):
        self.datasets = datasets

    def __len__(self):
        return sum([len(d) for d in self.datasets])

    def __getitem__(self, idx):
        dataset = self.datasets[idx % len(self.datasets)]
        return dataset[idx // len(self.datasets)]
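One possible way to lift the equal-length requirement is to index by cumulative sizes instead of by modulo. The following is only a sketch under stated assumptions, not a proposed patch: it drops the AudioDataset base class so that it runs standalone, the attribute name cumulative_sizes is introduced here for illustration, and it returns items sequentially (all of the first dataset, then the second, and so on) rather than interleaved, which changes the ordering of the current implementation.

```python
from bisect import bisect_right
from itertools import accumulate


class ConcatDataset:
    """Sketch: concatenates datasets of arbitrary lengths end to end."""

    def __init__(self, datasets: list):
        self.datasets = datasets
        # cumulative_sizes[i] is the total number of items in datasets[0..i].
        # (Name chosen for this sketch; not part of the original class.)
        self.cumulative_sizes = list(accumulate(len(d) for d in datasets))

    def __len__(self):
        return self.cumulative_sizes[-1] if self.cumulative_sizes else 0

    def __getitem__(self, idx):
        if idx < 0:
            idx += len(self)
        if not 0 <= idx < len(self):
            raise IndexError("index out of range")
        # Find the first dataset whose cumulative size exceeds idx.
        dataset_idx = bisect_right(self.cumulative_sizes, idx)
        # Subtract the number of items held by all earlier datasets.
        offset = self.cumulative_sizes[dataset_idx - 1] if dataset_idx > 0 else 0
        return self.datasets[dataset_idx][idx - offset]
```

With this indexing, every index in range(len(dataset)) maps to exactly one item, regardless of how long each wrapped dataset is. If the interleaved ordering is intentional, a different scheme would be needed to preserve it.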

Default Length in AudioDataset:
Additionally, the AudioDataset class has a variable named n_examples that sets the default length of the dataset to 1000. This value does not necessarily match the actual number of items in a given dataset instance.
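To illustrate the concern, here is a minimal sketch using a hypothetical stand-in class. FixedLengthDataset and its items argument are invented for this example and are not the real AudioDataset; the only detail taken from the codebase is an n_examples value that defaults to 1000 and determines the reported length.

```python
class FixedLengthDataset:
    """Hypothetical stand-in: length comes from a default, not from the data."""

    def __init__(self, items: list, n_examples: int = 1000):
        self.items = items
        self.n_examples = n_examples

    def __len__(self):
        # Reports the configured length, not the number of underlying items.
        return self.n_examples


ds = FixedLengthDataset(["a.wav", "b.wav", "c.wav"])
reported, actual = len(ds), len(ds.items)
print(reported, actual)
```

In a setup like this, any code that sums len(d) over several datasets, as ConcatDataset.__len__ does, would add up the configured defaults rather than the real item counts.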
