Question on PrivBayes.py #16
Hi, these
Sorry, I was not clear in my question. For each task, the worker process handles a different combination of (child, [parents]) pairs. There will be an overlap of (child, [parents]) pairs between the last pair in task 1 and the only pair in task 2. Wouldn't it be more efficient to create the whole list of (child, [parents]) combinations first, and then split it across the tasks for parallel processing?
In task 1, [race, [nationality, income]] won't be generated, since one parent must be 'age' due to

In terms of generating tasks efficiently, the number of (child, parents) pairs is exponential in k (the number of parents), so pre-computing all pairs may cost too much time or memory. Let m = #rest_attributes, n = |V|, k = #parents. There are O(m n^k) child-parents pairs in total. The current implementation generates m(n-k+1) tasks.

The drawback of the current implementation is that the tasks have significantly different workloads, as shown in your example. It would be better to have more balanced tasks. Please feel free to make suggestions on this.
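To make the counting above concrete, here is a minimal sketch on toy data. It assumes, based on the remark that one parent must be 'age' in task 1, that each task fixes `V[split]` as a parent and draws the remaining `k - 1` parents from `V[split + 1:]`. The helper `pairs_for_task` and the attribute names are hypothetical stand-ins, not the actual worker in PrivBayes.py.

```python
from itertools import combinations, product
from math import comb


def pairs_for_task(child, V, num_parents, split):
    """Hypothetical stand-in for the worker's enumeration (an assumption):
    V[split] is always a parent; the others come from V[split + 1:]."""
    return [(child, (V[split],) + others)
            for others in combinations(V[split + 1:], num_parents - 1)]


# Toy inputs, chosen only for illustration.
V = ["age", "nationality", "income", "sex"]
rest_attributes = ["race", "education"]
k = 2

num_parents = min(len(V), k)
# One task per (child, split), i.e. m * (n - k + 1) tasks.
tasks = list(product(rest_attributes, range(len(V) - num_parents + 1)))
per_task = {(child, split): pairs_for_task(child, V, num_parents, split)
            for child, split in tasks}

all_pairs = [pair for pairs in per_task.values() for pair in pairs]

# Under this assumption the tasks partition the pairs: no pair appears twice,
# and together they cover every size-k parent set for every child.
assert len(all_pairs) == len(set(all_pairs))
assert len(all_pairs) == len(rest_attributes) * comb(len(V), num_parents)

# Workload per split for one child: earlier splits do more work.
workloads = [len(per_task[("race", split)])
             for split in range(len(V) - num_parents + 1)]
print(workloads)
```

Under these assumptions the tasks do not overlap, but the task at split `s` handles `comb(n - s - 1, k - 1)` parent sets, so the workloads shrink as the split index grows. That is the imbalance described above.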
@haoyueping Is it possible for the Data Synthesizer library to choose the value of k, the epsilon, and all the other necessary hyperparameters automatically, so that we do not have to tune them ourselves?
Thanks for the good work!!
I am looking at the Bayesian network creation code and have questions regarding lines 153-155 and line 111 in the PrivBayes.py code:
num_parents = min(len(V), k)
tasks = [(child, V, num_parents, split, dataset) for child, split in
product(rest_attributes, range(len(V) - num_parents + 1))]
What is the rationale behind generating a list of combinations with different split points for each attribute in the rest_attributes list?
It seems that the worker function could account for all the combinations of attribute and parents pairs in line 111 just by looking at the entire V for each attribute, instead of iterating over every possible V[split:] for each attribute.
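For reference, a small sketch of what the quoted snippet produces on toy inputs. The attribute names are made up for illustration, and `dataset` is replaced by a `None` placeholder, so this only shows the shape of the task list, not the real pipeline.

```python
from itertools import product

# Toy inputs (assumptions for illustration only).
V = ["age", "nationality", "income"]
rest_attributes = ["race", "education"]
k = 2
dataset = None  # placeholder for the real DataFrame

num_parents = min(len(V), k)
tasks = [(child, V, num_parents, split, dataset) for child, split in
         product(rest_attributes, range(len(V) - num_parents + 1))]

# Keep only the parts that differ between tasks.
task_keys = [(child, split) for child, _, _, split, _ in tasks]
print(task_keys)
```

Each remaining attribute is paired with every split index from 0 to `len(V) - num_parents`, so the work for one child is spread over several tasks rather than handled in a single one.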