Support for multi-lingual candidate names #138
Comments
So on the protocol level, would this mean allowing arrays of objects for candidate names?
"candidates": [
{
"id": "K11-15",
"name": [
{
"str": "National Arts Centre - Azrieli Studio",
"lang": "en"
},
{
"str": "Centre National des Arts - Studio Azrieli",
"lang": "fr"
}
],
...
}
] |
If we go down that route, I wonder if we should also support multiple names for properties (when returned in a property suggest response, or in a data extension response) and for types (when returned in a type suggest response, or in a reconciliation response as part of the reconciliation candidates). I guess it would make things look more uniform, but I am not really sure about the use case. What do you think, @saumier? |
Yes. Since the group is not recommending JSON-LD, I think this is the next best approach. I am building a bilingual website (en, fr) that includes a client for the reconciliation API: kg.artsdata.ca. The UI of this site can switch between English and French. When querying using the reconciliation API, a query string can be in any language. For example, I could query a Place using "Studio Azrieli" or "Azrieli Studio". The response would return candidates including K11-15. With this new approach, the website could display the name and description in the UI language. It would also be good to add support for property and type suggestions. |
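A minimal sketch of how a client might pick the name to display for the current UI language, assuming the array-of-objects shape proposed above. The `display_name` helper and its fallback to the first entry are illustrative assumptions, not part of the spec:

```python
from typing import Optional

# Candidate in the array-of-objects shape discussed in this thread.
candidate = {
    "id": "K11-15",
    "name": [
        {"str": "National Arts Centre - Azrieli Studio", "lang": "en"},
        {"str": "Centre National des Arts - Studio Azrieli", "lang": "fr"},
    ],
}


def display_name(candidate: dict, ui_lang: str) -> Optional[str]:
    """Return the name tagged with ui_lang, else the first name, else None.

    Hypothetical helper: the fallback policy is a client-side choice.
    """
    names = candidate.get("name", [])
    for entry in names:
        if entry.get("lang") == ui_lang:
            return entry["str"]
    return names[0]["str"] if names else None
```

A French UI would call `display_name(candidate, "fr")`; a UI language the service has no name for falls back to the first entry.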
Summary of our discussion on the monthly call of last month: we could either
Maybe there are other options? We thought that it is worth bringing more attention to this issue from the broader community, to gather more feedback. |
Unless the variable structure is backward compatible when the simple variant is used, I think it's better to be consistent and always use the array form, even for a single entry. I suspect that things have diverged enough that there's not a compatibility benefit. |
I second @tfmorris's opinion. I like the consistency: whenever our API standards have a context that could be "one or many", we resort to the array form. (Mostly because the idea of a simpler JSON structure presupposes that JSON arrays of objects are complicated or noisy, when they really are not for developers and our 2024+ tooling.) |
Generally, this seems to be related to #52, as a solution to this issue will also resolve #52, won't it?
I am late to the party (sorry) but am adding this for reference. Generally, I like the "language map" approach from JSON-LD (examples) for providing labels in multiple languages, as it is simple, terse and easy to read. The example from #138 (comment) would look like this with language maps:
{
"candidates":[
{
"id":"K11-15",
"name":{
"en":"National Arts Centre - Azrieli Studio",
"fr":"Centre National des Arts - Studio Azrieli"
}
}
]
} |
@acka47 If we went that route, we'd have to adopt a convention and document it. For instance, should the key be an ISO 639-3 three-letter code? Hmm, what else? |
@acka47 I like the conciseness, but how would a service represent a name or description for which it does not know the language? (Use case: a tool like CSV-reconcile, which spins up a reconciliation service on arbitrary datasets, generally will not have access to this sort of information and shouldn't make up a language for the sake of fitting in.) |
Yes, we could define it similar to JSON-LD like this: "keys must be strings representing [BCP47] language codes and the values must be a string."
Good question. I guess for the other approach from #138 (comment) you would just omit the optional |
Would the array approach allow for multiple alias names in the same language whereas the map approach would not? That could be an argument for choosing the array approach. On the other hand, I am not sure we actually want to allow this? |
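To make the difference concrete, here is a small sketch of what happens when an array with two names in the same language is collapsed into a language map. The second English alias and the `to_language_map` helper are made up for illustration:

```python
def to_language_map(names: list) -> dict:
    """Collapse an array of string objects into a language map.

    Later entries overwrite earlier ones for the same language, and entries
    without a "lang" are dropped, since a plain map has no key for them.
    """
    return {entry["lang"]: entry["str"] for entry in names if "lang" in entry}


names = [
    {"str": "National Arts Centre - Azrieli Studio", "lang": "en"},
    {"str": "Azrieli Studio", "lang": "en"},  # hypothetical second English alias
    {"str": "Centre National des Arts - Studio Azrieli", "lang": "fr"},
]

language_map = to_language_map(names)
```

Here `language_map` ends up with only two keys, so one of the English names is lost. That is the trade-off raised above: the map is terse, but it holds at most one string per language.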
Another aspect to consider for the
|
To be clear, this is not only about array vs. non-array, but also object vs. string. The common, simple case currently:
"name": "National Arts Centre - Azrieli Studio"
The common case in the unified syntax:
"name": [
{
"str": "National Arts Centre - Azrieli Studio"
}
]
If this were the first and only place where we introduce optional structure (string or array of objects), I'd agree we might want to avoid that. But since we do the same thing in other places (e.g. property values), I feel like the much simpler common case is worth having the option. |
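If both forms were allowed, a client would likely normalize them on receipt. A minimal sketch, assuming `name` is either a plain string or an array of objects with a `str` field (`normalize_name` is a hypothetical helper):

```python
from typing import Union


def normalize_name(name: Union[str, list]) -> list:
    """Return the name as a list of string objects, whichever form was sent."""
    if isinstance(name, str):
        # Simple form: wrap the bare string; no language is known.
        return [{"str": name}]
    # Unified form: already a list of objects; copy it to avoid aliasing.
    return list(name)
```

The cost of the optional structure on the client side is roughly this one branch, which is the point being weighed against always requiring the array form.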
From JSON-LD https://www.w3.org/TR/json-ld/#example-102-indexing-languaged-tagged-strings-using-none-for-no-language
Example if there was no language for a name.
|
I'm not really enthusiastic about any of the solutions, but the one that I find the least bad is @fsteeg's suggestion to use the existing language (+ text direction) mechanisms we have, and simply switch to this default syntax:
"name": [
{
"str": "National Arts Centre - Azrieli Studio"
}
]
with the option to add a |
And I agree with @tfmorris on the preference to stick to the array form. |
I have no preference here but just felt that the language map approach should at least be discussed in this context. Thus, I am fine with an array of objects containing at least the |
@wetneb My team has implemented an endpoint for the current draft spec and updated our branch of the test bench to support both v0.2 and v0.3 (draft). Here are 2 screen grabs from our branch of the test bench: one showing our production reconciliation endpoint (v0.2), and a second showing our test reconciliation endpoint (v0.3) with multi-lingual support meeting the needs of this use case. This is a work in progress.
v0.2 - current spec - showing Azrieli Studio returned 2 times with the same ID K11-15
v0.3 - draft spec - showing the Azrieli Studio entity combined in a single response with en and fr. |
With required `str`, optional `lang` and `dir` fields
- Extract existing string object definition to its own schema file
- Reference string-object.json in the suggest response schemas
- Update spec & examples to use string objects in suggest responses
- Redefine types used in suggest response as described in the spec (instead of referencing the actual type.json schema)
- Clarify in the spec that we don't return actual full entity, property, or type objects in the suggest response's `result` field
Also add `description` to spec and example (was in schema already)
Quoting myself in the related PR #176 (comment):
I feel like we, in particular myself in #138 (comment), might have jumped to the solution of changing the data structure too quickly. One approach we discussed in today's meeting is using multiple requests, one for each language, each returning the current, simple structure. So instead of a single response in the new format:
"candidates": [
{
"id": "K11-15",
"name": [
{
"str": "National Arts Centre - Azrieli Studio",
"lang": "en"
},
{
"str": "Centre National des Arts - Studio Azrieli",
"lang": "fr"
}
],
...
}
]
We'd have two responses (for two requests with different
"candidates": [
{
"id": "K11-15",
"name": "National Arts Centre - Azrieli Studio",
"lang": "en"
...
}
]
"candidates": [
{
"id": "K11-15",
"name": "Centre National des Arts - Studio Azrieli",
"lang": "fr"
...
}
]
This seems way more lightweight and in line with the other internationalization support, which is completely optional (request and response headers, optional It's actually kind of close to the original workaround of returning multiple candidates with the same ID but different labels by @saumier in #138 (comment), but I guess in all cases the client will have to handle something (grouping candidates with the same ID or displaying the new structure). So I'm not sure how that would be implemented exactly, but I wanted to ask for feedback on the basic idea. |
Using multiple queries seems inefficient to me. I think doing it the way the Google KG Search does with an ordered list of requested languages would be simpler:
which then returns the results in the same order as specified by the request:
"@id": "kg:/m/02vk6kk",
"name": [
{
"@language": "fr",
"@value": "Étage"
},
{
"@value": "Storey",
"@language": "en"
}
], |
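A rough sketch of the "ordered list of requested languages" idea on the service side, using the `str`/`lang` objects from this thread rather than the Google KG field names. The `order_names` helper and the choice to drop names in languages that were not requested are assumptions made for illustration:

```python
def order_names(names: list, requested: list) -> list:
    """Keep only names in a requested language, in the order requested."""
    rank = {lang: position for position, lang in enumerate(requested)}
    wanted = [n for n in names if n.get("lang") in rank]
    # sorted() is stable, so several names in one language keep their order.
    return sorted(wanted, key=lambda n: rank[n["lang"]])


names = [
    {"str": "National Arts Centre - Azrieli Studio", "lang": "en"},
    {"str": "Centre National des Arts - Studio Azrieli", "lang": "fr"},
]

french_first = order_names(names, ["fr", "en"])
french_only = order_names(names, ["fr"])
```

With a single request carrying `["fr", "en"]`, the client gets both names back with the French one first, which avoids issuing one query per language.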
@fsteeg I am also coming around to the idea that we maybe changed the data structure too quickly. In my specific use case, the implementation of the reconciliation service is such that it always processes "matchType": "name" requests in both languages (ignoring the The root of my problem is how to return the response that the user is expecting to see in the UI. Especially when there is an exact
@fsteeg I understand your idea, but instead of doing two requests with different |
Isn't that mainly a client / UI issue? You could return both candidates, no matter which language(s) were requested:
"candidates": [
{
"id": "K11-15",
"name": "National Arts Centre - Azrieli Studio",
"lang": "en"
...
},
{
"id": "K11-15",
"name": "Centre National des Arts - Studio Azrieli",
"lang": "fr"
...
}
]
As mentioned above, it's up to the client to display multi-language candidates properly (no matter which approach we take), e.g. by grouping these candidates by ID, or by language, or by simply adding the language as a field in the UI. |
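The grouping-by-ID option could look roughly like this on the client. It is a sketch that assumes flat candidates with `id`, `name` and an optional `lang`, as in the example above; `group_by_id` is a hypothetical helper, and other candidate fields (score, type, etc.) are ignored here:

```python
def group_by_id(candidates: list) -> list:
    """Merge flat candidates sharing an ID into one entry with several names."""
    grouped = {}
    for candidate in candidates:
        entry = grouped.setdefault(
            candidate["id"], {"id": candidate["id"], "name": []}
        )
        name = {"str": candidate["name"]}
        if "lang" in candidate:
            name["lang"] = candidate["lang"]
        entry["name"].append(name)
    # dicts preserve insertion order, so the first-seen ID stays first.
    return list(grouped.values())


flat = [
    {"id": "K11-15", "name": "National Arts Centre - Azrieli Studio", "lang": "en"},
    {"id": "K11-15", "name": "Centre National des Arts - Studio Azrieli", "lang": "fr"},
]

merged = group_by_id(flat)
```

The result is a single K11-15 entry carrying both names, which is close to the array-of-objects structure discussed earlier, only built by the client instead of the service.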
As a service provider, I would like clients to be able to query in any language and to return candidate names in one or more languages specified by the client request.
Use Case
A client is reconciling a place in Canada using the Artsdata.ca Reconciliation service with the name "Studio Azrieli".
Current solution (not ideal)
The service returns multiple entities, including K11-15 "National Arts Centre - Azrieli Studio" and K11-15 "Centre National des Arts - Studio Azrieli", which appear as separate entities but have the same URI. This may appear incorrect to the user because there are 2 candidates. If the user doesn't notice that they have the same URI, they may be mistaken for duplicates.
Ideal solution
The service returns multiple entities but only a single K11-15 displaying both names "National Arts Centre - Azrieli Studio" and "Centre National des Arts - Studio Azrieli" together. Parameters can specify the languages the client would like to display.