e-Translation: HTML not translated #169
Roundtripping in e-Translation is currently not supported. http://api.freme-project.eu/doc/0.5/knowledge-base/eInternationalization.html (section Translation of HTML with round-tripping) I reproduced the issue using this request:
Felix, Katia and I discussed this during the Lisbon face-to-face. We do not think that e-translation needs to support HTML tags natively. We believe a solution can be provided by e-internationalization, but support IS needed from e-translation: it has to write extra sub-segments into the NIF output. These sub-segments for source and target are related to each other in the NIF using the itsrdf:target property. The process would be:
Example: translate English "this is a small house" to German "das ist ein kleines haus". e-internationalization would send plain text to e-translation. e-translation would return the following NIF:
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
<http://freme-project.eu/#char=0,21>
// English
<http://prefix.given.by/theClient#char=9,10> rdf:type nif:Phrase ,
<http://prefix.given.by/theClient#char=10,15> rdf:type nif:Phrase ,
<http://prefix.given.by/theClient#char=16,21> rdf:type nif:Phrase ,
// German translation
<http://prefix.given.by/theClient/de#char=9,10> rdf:type nif:Phrase ,
<http://prefix.given.by/theClient/de#char=12,19> rdf:type nif:Phrase ,
<http://prefix.given.by/theClient/de#char=20,24> rdf:type nif:Phrase ,
I don't like this approach to describing segmentation information; it is not robust:
I have an idea of using the NIF AnnotationUnit, as we are implementing NIF 2.1.
Could you give an example of the NIF 2.1 based idea?
Input NIF example could look like this:
@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix isolang: <http://www.lexvo.org/id/iso639-3/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
<http://freme-project.eu/#offset_0_105>
a nif:Context , nif:OffsetBasedString;
nif:beginIndex "0"^^xsd:nonNegativeInteger ;
nif:endIndex "105"^^xsd:nonNegativeInteger ;
nif:predLang isolang:eng ;
nif:isString "\nJust for test\n \nThis is just a test\n \nThere are different sentences to be translated\n \nToday is Monday.\n"^^xsd:string .
Output NIF example could look like this:
@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix isolang: <http://www.lexvo.org/id/iso639-3/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
<http://freme-project.eu/#offset_0_105>
a nif:Context , nif:OffsetBasedString;
nif:beginIndex "0"^^xsd:nonNegativeInteger ;
nif:endIndex "105"^^xsd:nonNegativeInteger ;
nif:predLang isolang:eng ;
nif:isString "\nJust for test\n \nThis is just a test\n \nThere are different sentences to be translated\n \nToday is Monday.\n"^^xsd:string;
itsrdf:target "Nur für die Prüfung\nDies ist nur eine Prüfung\nEs gibt andere Strafen umgewandelt werden,\nHeute ist."@de.
<http://freme-project.eu/#offset_0_15>
a nif:Phrase, nif:OffsetBasedString;
nif:beginIndex "0"^^xsd:nonNegativeInteger ;
nif:endIndex "15"^^xsd:nonNegativeInteger ;
nif:referenceContext <http://freme-project.eu/#offset_0_105> ;
itsrdf:translate "no";
nif:anchorOf "\nJust for test\n"^^xsd:string .
<http://freme-project.eu/#offset_1_14>
a nif:Phrase , nif:OffsetBasedString ;
nif:beginIndex "1"^^xsd:nonNegativeInteger ;
nif:endIndex "14"^^xsd:nonNegativeInteger ;
nif:anchorOf "Just for test"^^xsd:string;
nif:referenceContext <http://freme-project.eu/#offset_0_105> ;
itsrdf:target "Nur für die Prüfung"@de;
dc:identifier "1" .
<http://freme-project.eu/#offset_1_5>
a nif:Phrase , nif:OffsetBasedString ;
nif:beginIndex "1"^^xsd:nonNegativeInteger ;
nif:endIndex "5"^^xsd:nonNegativeInteger ;
nif:anchorOf "Just"^^xsd:string ;
nif:referenceContext <http://freme-project.eu/#offset_0_105> ;
nif:annotationUnit _:annotationUnit1.
<http://freme-project.eu/#offset_6_14>
a nif:Phrase , nif:OffsetBasedString ;
nif:beginIndex "6"^^xsd:nonNegativeInteger ;
nif:endIndex "14"^^xsd:nonNegativeInteger ;
nif:anchorOf "for test"^^xsd:string;
nif:referenceContext <http://freme-project.eu/#offset_0_105> ;
nif:annotationUnit _:annotationUnit2.
_:annotationUnit1
itsrdf:mtConfidence "0.9869992701528016"^^xsd:double;
itsrdf:taAnnotatorsRef <https://services.tilde.com/systems/translation/smt-0eba8517-0cf8-43b9-8dc2-ed99b33e56b9>;
itsrdf:target "Nur"@de.
_:annotationUnit2
itsrdf:mtConfidence "0.9869992701528016"^^xsd:double;
itsrdf:taAnnotatorsRef <https://services.tilde.com/systems/translation/smt-0eba8517-0cf8-43b9-8dc2-ed99b33e56b9>;
itsrdf:target "für die Prüfung"@de.
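The offsets in the output example above can be sanity-checked against the context string. The following is a minimal sketch, not part of any FREME service: it assumes that nif:beginIndex and nif:endIndex count characters the same way Python string indices do, and the helper name `check_offsets` is hypothetical. The context string and the phrase offsets are copied from the example.

```python
# Context string copied from nif:isString in the example above.
context = (
    "\nJust for test\n \nThis is just a test\n \n"
    "There are different sentences to be translated\n \nToday is Monday.\n"
)

# (beginIndex, endIndex, anchorOf) triples copied from the nif:Phrase resources.
phrases = [
    (0, 15, "\nJust for test\n"),
    (1, 14, "Just for test"),
    (1, 5, "Just"),
    (6, 14, "for test"),
]


def check_offsets(context, phrases):
    """Return the phrases whose anchorOf differs from context[begin:end]."""
    return [(b, e, a) for b, e, a in phrases if context[b:e] != a]


# An empty list means every anchorOf matches its slice of the context.
mismatches = check_offsets(context, phrases)
print(len(context), mismatches)
```

A check like this only validates the source-side offsets. It says nothing about positions in the translated text, which is the point debated in the following comments.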
A question raised by @ankitks: with the translation segment output
it seems that there is no relation to the complete translation output. That is, one should add to the segmentation output the offsets related to the complete translation output, e.g.
It seems that without the begin and end index information in the translation, one cannot re-insert the markup?
The beginIndex and endIndex can be retrieved from the corresponding nif:String, that is:
Note that the annotation unit is linked to the string.
I don't think so. I think that the position of the source text in the skeleton string should be known.
I don't think so: in my example at #169 (comment), beginIndex and endIndex did not refer to the original content (= nif:String), but to the translated content.
If you want to describe the translation with beginIndex and endIndex, then the translation should be represented as its own document and linked to the source document.
The main goal is to describe the relation between source language and target language segments, see @andish's example at #169 (comment). If this needs a separate document for the translation, that would be fine. But I am not sure what this would look like, e.g. how to extend or change @andish's example.
Are you sure that you need to know the beginIndex and endIndex of the translated text? The relation between the source text fragment and the original text is already in my example. (Sent by email in reply to Felix Sasaki's comment of 2016-08-11 13:02 GMT+03:00.)
I don't think it is necessary to have the beginIndex and endIndex of the translated text. Once we have the skeleton representing the whole HTML document, it is sufficient to locate the source texts in it and then replace them with their translations, which are known from the enriched NIF.
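The re-insertion described here can be sketched as follows. This is only an illustration under stated assumptions: the skeleton is taken to be the original HTML as a plain string, the (source, target) pairs are assumed to have been read from the enriched NIF in document order, and the function name `reinsert` and the sample skeleton are hypothetical. The two translations are the ones from the NIF example earlier in this thread.

```python
def reinsert(skeleton, pairs):
    """Replace each source text in the skeleton with its translation.

    `pairs` is a list of (source, target) strings in document order.
    The search cursor only moves forward, so a repeated source text is
    matched at its next occurrence rather than its first.
    """
    out = []
    cursor = 0
    for source, target in pairs:
        pos = skeleton.find(source, cursor)
        if pos == -1:
            raise ValueError(f"source text not found in skeleton: {source!r}")
        out.append(skeleton[cursor:pos])  # markup before the segment, unchanged
        out.append(target)                # translation instead of the source
        cursor = pos + len(source)
    out.append(skeleton[cursor:])         # trailing markup
    return "".join(out)


# Hypothetical skeleton; the texts and translations come from the NIF example.
skeleton = "<p>Just for test</p><p>This is just a test</p>"
pairs = [
    ("Just for test", "Nur für die Prüfung"),
    ("This is just a test", "Dies ist nur eine Prüfung"),
]
translated = reinsert(skeleton, pairs)
print(translated)
```

Plain string search can also match text inside tag names or attribute values, which is one reason the earlier comment asks for the position of the source text in the skeleton to be known explicitly.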
Sounds good 👍
I tried to translate inline tag:
Sends this text to mt: The service returns
When I set outformat=turtle I can see that the translation is in German.
Sorry, but I don't understand. I don't think roundtripping after translation was ever implemented. For me the translation has always worked only with the output format turtle. For this reason I am waiting for the changes to NIF 2.1 to be implemented.
When I translate an HTML document from EN to DE, I get the original HTML back, without a translation.
I used the HTML file in SB50.zip.