Bug: partial distribution produced by `get_noisy_distribution_of_attributes` #26

zjroth · 2020-09-09T20:35:31Z

DataSynthesizer version: 0.1.2

Description

The function `get_noisy_distribution_of_attributes` builds only a partial distribution. This bug was introduced in commit 1abe702. Here is the relevant code as it appears in master (currently commit be8b65a):

full_space = None
for item in grouper_it(products, 1000000):
    if full_space is None:
        full_space = DataFrame(columns=attributes, data=list(item))
    else:
        data_frame_append = DataFrame(columns=attributes, data=list(item))
        full_space.append(data_frame_append)

In particular, `full_space.append` does not modify `full_space`; instead, it returns a new object, which the loop above discards. (This seems to be true for all versions of pandas.) As a result, `full_space` never holds all of the intended rows; it holds at most the first 1,000,000.
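For illustration, here is a minimal sketch of one way to keep every chunk: collect the per-chunk frames in a list and concatenate them once with `pandas.concat`. The definition of `grouper_it` is not shown in this issue, so the version below is a hypothetical stand-in that yields lists of at most `n` items, and `build_full_space` is a name made up for this example. This is not necessarily how the project fixes the bug.

```python
from itertools import islice, product

import pandas as pd


def grouper_it(iterable, n):
    """Hypothetical stand-in: yield successive lists of at most n items."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, n))
        if not chunk:
            return
        yield chunk


def build_full_space(products, attributes, chunk_size=1000000):
    """Build one DataFrame per chunk, then concatenate them all once."""
    frames = [
        pd.DataFrame(columns=attributes, data=chunk)
        for chunk in grouper_it(products, chunk_size)
    ]
    if not frames:
        # No rows at all: return an empty frame with the expected columns.
        return pd.DataFrame(columns=attributes)
    return pd.concat(frames, ignore_index=True)


# Small example: 3 * 4 = 12 combinations, split into chunks of 5 rows.
attributes = ["a", "b"]
products = product(range(3), range(4))
full_space = build_full_space(products, attributes, chunk_size=5)
print(len(full_space))
```

Note that this keeps all rows in memory at once, so it trades the memory bound of the buggy loop for correctness.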


haoyueping · 2020-09-14T00:55:23Z

Hi zjroth, thanks for your feedback. This bug is now fixed by commit 1ced27c.

zjroth · 2020-09-22T14:06:59Z

Thanks for the quick response and fix. It's worth noting that the bug was introduced in an attempt to fix memory issues (commit 1abe702). It's clear why this initial "fix" would have reduced memory consumption: No more than two million rows would ever be loaded at the same time. However, if I'm reading the code correctly, that is no longer the case with this new commit (1ced27c). As such, the attempt to reduce memory consumption may need to be revisited. (To be clear, this is not currently an issue for me.)

I just wanted to mention this in case a new issue needs to be created to resolve potential memory issues.
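If memory does become a concern, one possible direction is to process each chunk as it is produced and keep only a small summary of it, so that no more than one chunk is held at a time. The sketch below is only an illustration of that idea: `iter_chunk_frames` is a hypothetical helper, the row count stands in for whatever per-chunk work is actually needed, and whether this pattern fits `get_noisy_distribution_of_attributes` depends on how the full space is consumed, which this issue does not show.

```python
from itertools import islice, product

import pandas as pd


def iter_chunk_frames(products, attributes, chunk_size):
    """Hypothetical helper: yield one DataFrame per chunk of at most chunk_size rows."""
    iterator = iter(products)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield pd.DataFrame(columns=attributes, data=chunk)


attributes = ["a", "b"]
total_rows = 0
largest_chunk = 0
for frame in iter_chunk_frames(product(range(3), range(4)), attributes, chunk_size=5):
    # Reduce each chunk to a small summary before moving on to the next one,
    # so earlier chunks can be garbage-collected.
    total_rows += len(frame)
    largest_chunk = max(largest_chunk, len(frame))

print(total_rows, largest_chunk)
```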

haoyueping added a commit that referenced this issue Sep 14, 2020

Fix bug #26

1ced27c
