LibCPV::Categorizer - Class for Hierarchical CPV-Number Categorizing via AI::Categorizer.
my $doc_set = LibCPV::Categorizer::DocumentSet->new({
    dirname => '/path/to/docset/dir',
});
$doc_set->add_docs_from_dir;

my $categorizer = LibCPV::Categorizer->new({
    document_set    => $doc_set,
    learner_rootdir => '/path/to/learner/output/dir',
});
$categorizer->train;
We use AI::Categorizer. Because AI::Categorizer does not do hierarchical categorization, we added our own hierarchy schema based on the semantics of CPV numbers.
For an introduction to CPV numbers, see http://simap.eu.int/EN/pub/src/welcome.htm.
In LibCPV::Categorizer we try to use consistent wording. Here are the most important terms:
learner - an AI::Categorizer instance used to learn (or train).
category - a CPV number, simply an 8-digit number. CPV numbers are built hierarchically: the first 2 digits form one common level of accuracy, and each following digit forms another accuracy level. We derive the word "group" from that definition of accuracy levels.
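The level structure described above can be illustrated with a minimal sketch. The helper cpv_groups is hypothetical and not part of LibCPV::Categorizer; it only assumes that a category is a plain 8-digit string and that a "group" is the prefix of a CPV number at a given accuracy level.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of LibCPV::Categorizer: return the prefix
# ("group") of an 8-digit CPV number at each accuracy level.
sub cpv_groups {
    my ($cpv) = @_;
    die "expected an 8-digit CPV number, got '$cpv'\n"
        unless $cpv =~ /\A\d{8}\z/;

    # The first level is the first two digits; every further digit
    # adds one more level, so the prefix lengths run from 2 to 8.
    return map { substr($cpv, 0, $_) } 2 .. 8;
}

print join(', ', cpv_groups('45212200')), "\n";
# prints: 45, 452, 4521, 45212, 452122, 4521220, 45212200
```

Under these assumptions an 8-digit number has seven accuracy levels, and two categories belong to the same group at a level when their prefixes at that level are equal.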
If every possible 8-digit CPV number were used, the tree
would have one root-level learner categorizing into 99 categories
and 99 learners at the next level, each categorizing into 9 categories.
That gives 9 learners beneath each of the 99 categories, and at each
following level 9 more learners for each category.
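To get a feeling for the size of such a fully populated tree, the arithmetic above can be sketched as follows. The function name learners_per_level is made up for this illustration, and the figures rest only on the assumptions stated in the paragraph: 99 two-digit groups and 9 possible values for each following digit.

```perl
use strict;
use warnings;

# Hypothetical calculation, not part of LibCPV::Categorizer: count the
# learners at each level of a fully populated CPV tree.
sub learners_per_level {
    my @counts   = (1);   # the root learner chooses among the 99 two-digit groups
    my $learners = 99;    # one learner per two-digit group decides digit 3
    for my $digit (3 .. 8) {
        push @counts, $learners;   # learners that decide digit $digit
        $learners *= 9;            # each of them branches into 9 categories
    }
    return @counts;
}

my @counts = learners_per_level();
my $total  = 0;
$total += $_ for @counts;

printf "level %d: %d learners\n", $_ + 1, $counts[$_] for 0 .. $#counts;
print "total: $total learners\n";
# last line printed: total: 6576571 learners
```

In practice only a fraction of the possible CPV numbers occurs in a document set, so the tree that actually has to be trained would be far smaller than this upper bound.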