Make the Categories configurable #59

Open
pkiraly opened this issue Nov 18, 2020 · 7 comments

Comments

@pkiraly
Owner

pkiraly commented Nov 18, 2020

@mielvds suggested the following: Would it be an idea to enable custom extensions of this list with arbitrary groups?

Right now, Category is an enumeration and cannot be configured.

@mielvds
Contributor

mielvds commented Nov 18, 2020

yep, that'll work!

@pkiraly
Owner Author

pkiraly commented Nov 18, 2020

@mielvds

I have two ideas and would like to ask your opinion.

Case 1: the categories on the field level are arbitrary, so you can add anything you want; there is no check at all.

Case 2: you also add a "categories" list on the schema level, which contains all possible values. It behaves as a controlled vocabulary, and each field-level category MUST BE in this list.

Here is an example for Case 2:

format: json
fields:
  - name: edm:ProvidedCHO/@about
    path:  $.['providedCHOs'][0]['about']
    categories:
      - MANDATORY
  - name: Proxy/dc:title
    path: $.['proxies'][?(@['europeanaProxy'] == false)]['dcTitle']
    categories:
      - DESCRIPTIVENESS
      - SEARCHABILITY
      - IDENTIFICATION
      - MULTILINGUALITY
      - CUSTOM
  - name: Proxy/dcterms:alternative
    path: $.['proxies'][?(@['europeanaProxy'] == false)]['dctermsAlternative']
    categories:
      - DESCRIPTIVENESS
      - SEARCHABILITY
      - IDENTIFICATION
      - MULTILINGUALITY
groups:
  - fields:
      - Proxy/dc:title
      - Proxy/dc:description
    categories:
      - MANDATORY
categories:
  - MANDATORY
  - DESCRIPTIVENESS
  - SEARCHABILITY
  - IDENTIFICATION
  - MULTILINGUALITY
  - CUSTOM

Note the categories in the last section. It gives the schema creator a bit more work, but keeps the categories consistent. If it is missing, the default list that the tool compares the field categories against will be the current enumeration.
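
To make the Case 2 behaviour concrete, here is a minimal sketch of the check, not the tool's actual code. The class and method names are hypothetical, and the Category enum is a stand-in that contains only the values appearing in this thread (the real enumeration may contain others). It assumes a missing schema-level list is represented as null or an empty list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; not part of the tool's API.
class CategoryVocabularyCheck {

  // Stand-in for the current built-in enumeration (assumed subset).
  enum Category { MANDATORY, DESCRIPTIVENESS, SEARCHABILITY, IDENTIFICATION, MULTILINGUALITY }

  // The vocabulary to check against: the schema-level list if it is present,
  // otherwise the names of the built-in enumeration.
  static Set<String> effectiveVocabulary(List<String> schemaCategories) {
    Set<String> vocabulary = new LinkedHashSet<>();
    if (schemaCategories == null || schemaCategories.isEmpty()) {
      for (Category c : Category.values()) {
        vocabulary.add(c.name());
      }
    } else {
      vocabulary.addAll(schemaCategories);
    }
    return vocabulary;
  }

  // Returns the field-level categories that are not in the vocabulary.
  static List<String> unknownCategories(List<String> fieldCategories,
                                        List<String> schemaCategories) {
    Set<String> vocabulary = effectiveVocabulary(schemaCategories);
    List<String> unknown = new ArrayList<>();
    for (String c : fieldCategories) {
      if (!vocabulary.contains(c)) {
        unknown.add(c);
      }
    }
    return unknown;
  }

  public static void main(String[] args) {
    List<String> field = Arrays.asList("DESCRIPTIVENESS", "CUSTOM");
    // No schema-level list: CUSTOM is not in the stand-in enumeration.
    System.out.println(unknownCategories(field, null));
    // Schema-level list that declares CUSTOM: nothing is unknown.
    System.out.println(unknownCategories(field, Arrays.asList("DESCRIPTIVENESS", "CUSTOM")));
  }
}
```

Whether an unknown category should raise an error or only a warning is a design choice this sketch leaves open; it only reports the offending values.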

Would you vote for case 1 or case 2?

@mielvds
Contributor

mielvds commented Nov 18, 2020

I would say 1 because 2 won't add much in practice except for redundancy and less transparency. You can still implement the current situation if the categories on the field level are arbitrary. In fact, the schema doesn't even have to change.

Agreed, there is no check, but I think it's up to the writer of the schema to do it properly :) The worst that can happen is that the results are wrongly classified. Repeating the list on the schema level won't entirely prevent this mistake either.

But I don't have the full picture, of course. Is there something I'm missing about the current list of categories?

@pkiraly
Owner Author

pkiraly commented Nov 18, 2020

Thanks!

There is only one more thing I forgot to mention. The final output (the order of columns in the output CSV or in the Java collection) could be sorted against this canonical list. Otherwise the order will be set on a first-come, first-served basis.
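
A rough sketch of that ordering, under assumptions: the class and method names are hypothetical, categories are plain strings, and categories missing from the canonical list are placed at the end in the order they were first seen. The tool may handle this differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper for illustration; not part of the tool's API.
class CategoryColumnOrder {

  // Orders the categories collected from the fields (in first-seen order)
  // by their position in the canonical schema-level list.
  static List<String> order(List<String> firstSeen, List<String> canonical) {
    List<String> sorted = new ArrayList<>(firstSeen);
    if (canonical == null || canonical.isEmpty()) {
      // No canonical list: keep the first-come, first-served order.
      return sorted;
    }
    // List.sort is stable, so categories that are not in the canonical list
    // (all ranked Integer.MAX_VALUE) keep their first-seen order at the end.
    sorted.sort(Comparator.comparingInt((String c) -> {
      int position = canonical.indexOf(c);
      return position >= 0 ? position : Integer.MAX_VALUE;
    }));
    return sorted;
  }

  public static void main(String[] args) {
    List<String> firstSeen = Arrays.asList("SEARCHABILITY", "LOCAL", "MANDATORY");
    List<String> canonical = Arrays.asList("MANDATORY", "DESCRIPTIVENESS", "SEARCHABILITY");
    // prints [MANDATORY, SEARCHABILITY, LOCAL]
    System.out.println(order(firstSeen, canonical));
  }
}
```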

@pkiraly
Owner Author

pkiraly commented Nov 18, 2020

I implemented it, but it requires some changes in the API. It is no longer possible to pass categories to the JsonBranch constructor; one has to use setCategories(List<String>) or setCategories(String...), or you can use the good old Category enum: setCategories(Category...).

Here is an example.

Old style:

new JsonBranch("Proxy/dc:title", "$.['dcTitle']", 
    Category.DESCRIPTIVENESS,
    Category.SEARCHABILITY,
    Category.IDENTIFICATION,
    Category.MULTILINGUALITY);

New style:

new JsonBranch("Proxy/dc:title", "$.['dcTitle']")
    .setCategories(
      Category.DESCRIPTIVENESS,
      Category.SEARCHABILITY,
      Category.IDENTIFICATION,
      Category.MULTILINGUALITY
    );

One more thing: if the schema configuration has the categories property, the individual fields' categories are checked against that list, and the API filters out categories that are not listed.
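
For illustration only, the three setCategories overloads and the filtering step could look roughly like the sketch below. ConfigurableField and filterAgainst are hypothetical names, not the real JsonBranch implementation; the Category enum is a stand-in with only the values mentioned in this thread; and the sketch assumes that a schema without a categories property (null here) means no filtering.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a field definition; not the real JsonBranch class.
class ConfigurableField {

  // Stand-in for the existing enumeration (assumed subset).
  enum Category { MANDATORY, DESCRIPTIVENESS, SEARCHABILITY, IDENTIFICATION, MULTILINGUALITY }

  private final String label;
  private final String path;
  private List<String> categories = new ArrayList<>();

  ConfigurableField(String label, String path) {
    this.label = label;
    this.path = path;
  }

  // Categories are stored as plain strings, so arbitrary values are allowed.
  ConfigurableField setCategories(List<String> categories) {
    this.categories = new ArrayList<>(categories);
    return this;
  }

  ConfigurableField setCategories(String... categories) {
    return setCategories(Arrays.asList(categories));
  }

  // The enum is still usable; its constants are stored by name.
  ConfigurableField setCategories(Category... categories) {
    List<String> names = new ArrayList<>();
    for (Category c : categories) {
      names.add(c.name());
    }
    return setCategories(names);
  }

  // Drops every category that is not in the schema-level list.
  // A null list (no categories property) leaves the field untouched.
  ConfigurableField filterAgainst(List<String> schemaCategories) {
    if (schemaCategories != null) {
      categories.retainAll(schemaCategories);
    }
    return this;
  }

  List<String> getCategories() {
    return categories;
  }

  public static void main(String[] args) {
    ConfigurableField title = new ConfigurableField("Proxy/dc:title", "$.['dcTitle']")
        .setCategories("DESCRIPTIVENESS", "TYPO")
        .filterAgainst(Arrays.asList("MANDATORY", "DESCRIPTIVENESS"));
    // prints [DESCRIPTIVENESS]
    System.out.println(title.getCategories());
  }
}
```

Returning this from the setters keeps the chained style of the new-style example above.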

@mielvds
Contributor

mielvds commented Nov 19, 2020

Sounds good to me!

@mielvds
Contributor

mielvds commented Jul 6, 2021

@pkiraly I think you can close this one
