Data Mining to find the Frequent Itemsets using SON algorithm

drewm8080/data_mining_frequent_itemsets

Assignment Explanation

Datasets

  • One simulated dataset and one real-world dataset will be used for this assignment.

Task 1

  • Build and test the program with a small simulated CSV file provided.
  • Find all combinations of businesses, and of users, that are frequent, i.e. that occur together in at least as many baskets as the support threshold.
  • Build two kinds of baskets: one per user, containing the business IDs that user reviewed, and one per business, containing the IDs of the users who reviewed that business.
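
The basket construction can be sketched in plain Python, without Spark, as below. This is only an illustration under assumptions: the column names `user_id` and `business_id`, the sample rows, the helper name `build_baskets`, and the mapping of Case 1 to user baskets and Case 2 to business baskets are not stated in this README.

```python
import csv
import io
from collections import defaultdict


def build_baskets(rows, case):
    """Group (user_id, business_id) pairs into baskets.

    Assumed mapping (not confirmed by the README):
      case 1 -> one basket per user, holding business IDs
      case 2 -> one basket per business, holding user IDs
    """
    baskets = defaultdict(set)
    for user_id, business_id in rows:
        if case == 1:
            baskets[user_id].add(business_id)
        else:
            baskets[business_id].add(user_id)
    return dict(baskets)


# Hypothetical sample data standing in for the simulated CSV file.
SAMPLE = "user_id,business_id\nu1,b1\nu1,b2\nu2,b1\nu2,b2\nu3,b1\n"

reader = csv.reader(io.StringIO(SAMPLE))
next(reader)  # skip the header row
rows = [tuple(r) for r in reader]

user_baskets = build_baskets(rows, 1)
business_baskets = build_baskets(rows, 2)
```

Using sets for the baskets drops duplicate reviews of the same business by the same user, which keeps each item counted at most once per basket.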

Task 2

  • Generate a subset of the Ta Feng dataset whose structure matches that of the simulated data.

Algorithm

  • Implement the SON Algorithm on top of the Spark Framework.
  • Find all frequent itemsets, of every size, in any given input file within the required time limit.
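
For orientation, the two passes of SON can be sketched in plain Python as below. This is not the repository's code and does not use Spark: the chunks here stand in for what would be data partitions in a Spark job, the in-chunk miner is a simple Apriori (SON itself does not prescribe one), and the names `apriori`, `son`, and `num_chunks` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def apriori(baskets, threshold):
    """Return itemsets (sorted tuples) found in at least `threshold` baskets."""
    baskets = [frozenset(b) for b in baskets]
    counts = Counter(item for b in baskets for item in b)
    current = {(item,) for item, c in counts.items() if c >= threshold}
    frequent = set(current)
    k = 2
    while current:
        items = sorted({i for itemset in current for i in itemset})
        # Keep a size-k candidate only if all its (k-1)-subsets are frequent.
        candidates = [
            c for c in combinations(items, k)
            if all(sub in current for sub in combinations(c, k - 1))
        ]
        current = {
            c for c in candidates
            if sum(1 for b in baskets if b.issuperset(c)) >= threshold
        }
        frequent |= current
        k += 1
    return frequent


def son(baskets, support, num_chunks=2):
    """Two-pass SON: local candidates per chunk, then a global count."""
    baskets = [set(b) for b in baskets]
    if not baskets:
        return []
    total = len(baskets)
    chunk_size = -(-total // num_chunks)  # integer ceiling division
    chunks = [baskets[i:i + chunk_size] for i in range(0, total, chunk_size)]

    # Pass 1: mine each chunk with the support scaled to the chunk's share.
    candidates = set()
    for chunk in chunks:
        local_threshold = -(-support * len(chunk) // total)
        candidates |= apriori(chunk, local_threshold)

    # Pass 2: count every candidate over the full data, drop false positives.
    counts = Counter()
    for b in baskets:
        for c in candidates:
            if b.issuperset(c):
                counts[c] += 1
    return sorted(
        (c for c in candidates if counts[c] >= support),
        key=lambda c: (len(c), c),
    )


# Hypothetical toy baskets for illustration.
toy = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
result = son(toy, support=3)
```

Scaling the threshold by each chunk's share of the baskets means an itemset that is frequent overall must be frequent in at least one chunk, so pass 1 produces no false negatives and pass 2 only has to remove false positives.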

Input Format

  1. Case number: Integer specifying the case (1 for Case 1, 2 for Case 2).
  2. Support: Integer defining the minimum count to qualify as a frequent itemset.
  3. Input file path: Full path to the input file, including directory, file name, and extension.
  4. Output file path: Full path to the output file, including directory, file name, and extension.
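
A minimal sketch of reading these four arguments in the order listed is shown below. The function name `parse_args`, the validation rules, and the example file names are assumptions for illustration; the README does not say how the program validates its input.

```python
def parse_args(argv):
    """Parse [case, support, input_path, output_path] from a list of strings."""
    if len(argv) != 4:
        raise ValueError("expected: <case> <support> <input_path> <output_path>")
    case = int(argv[0])
    support = int(argv[1])
    if case not in (1, 2):
        raise ValueError("case number must be 1 or 2")
    if support < 1:
        # Assumed rule: a minimum count below 1 would make every itemset frequent.
        raise ValueError("support must be a positive integer")
    return case, support, argv[2], argv[3]


# Hypothetical invocation; a real script would pass sys.argv[1:] instead.
args = parse_args(["1", "4", "input.csv", "output.txt"])
```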

Final Results:

  • Grade: 100%
