Assignment Explanation

Datasets

One simulated dataset and one real-world dataset will be used for this assignment.

Task 1

Build and test the program with the small simulated CSV file provided.
Create baskets in one of two ways: for each user, a basket of the business ids that user reviewed; or for each business, a basket of the user ids that commented on it.
Find the frequent combinations of businesses (from user baskets) or of users (from business baskets), where a combination is frequent if its count meets the support threshold.
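
The two basket layouts described above can be sketched in plain Python. This is a minimal illustration, not the code in task1.py: it assumes the CSV has a header row followed by (user id, business id) pairs, and the `build_baskets` helper and the sample rows are made up for the example.

```python
import csv
import io
from collections import defaultdict


def build_baskets(rows, by_user=True):
    """Group (user_id, business_id) pairs into baskets of distinct ids.

    by_user=True  -> one basket per user, holding business ids
    by_user=False -> one basket per business, holding user ids
    """
    baskets = defaultdict(set)
    for user_id, business_id in rows:
        if by_user:
            baskets[user_id].add(business_id)
        else:
            baskets[business_id].add(user_id)
    return dict(baskets)


# Made-up sample standing in for the simulated CSV (assumed column layout).
sample_csv = "user_id,business_id\nu1,b1\nu1,b2\nu2,b1\nu2,b2\nu3,b2\n"
reader = csv.reader(io.StringIO(sample_csv))
next(reader)  # skip the header row
rows = [tuple(r) for r in reader]

user_baskets = build_baskets(rows, by_user=True)
business_baskets = build_baskets(rows, by_user=False)
```

Using sets for the baskets means a repeated (user, business) pair counts only once, which is the usual convention when counting support.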

Task 2

Generate a subset using the Ta Feng dataset with a structure similar to the simulated data.
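
As a rough sketch of what "a structure similar to the simulated data" could mean, the snippet below reduces a wider table to two columns (basket key, item). The column names `customer_id`, `product_id`, and `amount` are hypothetical placeholders, not the actual Ta Feng headers, and `to_two_columns` is an illustrative helper rather than the code in task2.py.

```python
import csv
import io


def to_two_columns(reader, basket_col, item_col, out):
    """Write only the basket-key and item columns from a DictReader to `out`."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([basket_col, item_col])
    for row in reader:
        writer.writerow([row[basket_col], row[item_col]])


# Made-up rows with hypothetical column names.
raw = "customer_id,product_id,amount\nc1,p1,2\nc1,p2,1\nc2,p1,5\n"
out = io.StringIO()
to_two_columns(csv.DictReader(io.StringIO(raw)), "customer_id", "product_id", out)
subset_csv = out.getvalue()
```

The resulting two-column text can then be fed to the same basket-building step used for the simulated file.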

Algorithm

Implement the SON Algorithm on top of the Spark Framework.
Find all frequent itemsets, of every size, in any given input file within the required time.
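
The two passes of SON can be sketched in plain Python, with list slices standing in for Spark partitions. This is an illustrative sketch under stated assumptions, not the repository's implementation: the in-chunk algorithm here is a simple A-Priori, and the names `apriori`, `son`, and `num_chunks` are invented for the example. In Spark, each chunk would correspond to an RDD partition, for instance processed with `mapPartitions`.

```python
from itertools import combinations


def apriori(baskets, threshold):
    """Return all itemsets (sorted tuples) found in >= threshold baskets."""
    baskets = [frozenset(b) for b in baskets]
    counts = {}
    for b in baskets:
        for item in b:
            counts[(item,)] = counts.get((item,), 0) + 1
    frequent = {c for c, n in counts.items() if n >= threshold}
    result = set(frequent)
    k = 2
    while frequent:
        items = sorted({i for c in frequent for i in c})
        # Keep only k-sets whose every (k-1)-subset is already frequent.
        candidates = [
            c for c in combinations(items, k)
            if all(s in frequent for s in combinations(c, k - 1))
        ]
        frequent = {
            c for c in candidates
            if sum(1 for b in baskets if b.issuperset(c)) >= threshold
        }
        result |= frequent
        k += 1
    return result


def son(baskets, support, num_chunks=2):
    """Two-pass SON: local candidates per chunk, then a global count."""
    baskets = list(baskets)
    chunks = [baskets[i::num_chunks] for i in range(num_chunks)]
    # Pass 1: find local candidates with a threshold scaled to chunk size.
    candidates = set()
    for chunk in chunks:
        if chunk:
            # Integer ceiling of support * len(chunk) / len(baskets).
            local = -(-support * len(chunk) // len(baskets))
            candidates |= apriori(chunk, local)
    # Pass 2: count every candidate over all baskets; drop false positives.
    sets = [frozenset(b) for b in baskets]
    return sorted(
        (c for c in candidates
         if sum(1 for b in sets if b.issuperset(c)) >= support),
        key=lambda c: (len(c), c),
    )


# Made-up baskets for illustration.
example = [{"b1", "b2"}, {"b1", "b2"}, {"b2"}, {"b1", "b3"}]
frequent_itemsets = son(example, support=2)
```

Scaling the threshold by each chunk's share of the baskets guarantees no false negatives: an itemset that meets the global support must meet the scaled threshold in at least one chunk, so the second pass only has to remove false positives.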

Input Format

Case number: Integer specifying the case (1 for Case 1, 2 for Case 2).
Support: Integer defining the minimum count to qualify as a frequent itemset.
Input file path: Path to the input file including path, file name, and extension.
Output file path: Path to the output file including path, file name, and extension.
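
One way to read these four parameters is with `argparse`, shown below as a hedged sketch. It assumes the arguments are positional and arrive in the order listed above; the `parse_args` helper and the `output.csv` file name are illustrative, not taken from the assignment.

```python
import argparse


def parse_args(argv=None):
    """Parse the four parameters, assumed positional and in the listed order."""
    parser = argparse.ArgumentParser(description="Frequent itemset mining")
    parser.add_argument("case_number", type=int, choices=[1, 2])
    parser.add_argument("support", type=int)
    parser.add_argument("input_file_path")
    parser.add_argument("output_file_path")
    return parser.parse_args(argv)


# Example invocation with a hypothetical output file name.
args = parse_args(["1", "4", "small1.csv", "output.csv"])
```

Passing `argv=None` makes `argparse` fall back to the real command line, so the same function works both in a script and in a quick test like the one above.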

Final Results:

Grade: 100%

Repository files:

Assignment2 - Fall 2023.pdf
README.md
Ta_Feng_dataset.csv
example_output_task_1.csv
example_output_task_2.csv
small1.csv
small2.csv
task1.py
task2.py

drewm8080/data_mining_frequent_itemsets

Folders and files

Latest commit

History

Repository files navigation

Assignment Explanation

Datasets

Task 1

Task 2

Algorithm

Input Format

Final Results:

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages