e-Translation: HTML not translated #169

pheyvaer · 2016-02-25T09:34:12Z

When I try to translate an HTML document from EN to DE, I get the original HTML back, without a translation.

I used the HTML file in SB50.zip.

jnehring · 2016-02-25T12:41:14Z

Round-tripping in e-Translation is currently not supported; see http://api.freme-project.eu/doc/0.5/knowledge-base/eInternationalization.html (section "Translation of HTML with round-tripping").

I reproduced the issue using this request:

http://api.freme-project.eu/0.5/e-translation/tilde?source-lang=en&target-lang=de&informat=text/html&input=<p>Hello World</p>&outformat=text/html
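The request above carries raw `<`, `>` and a space in its query string. A client would normally percent-encode these before sending. The following is a minimal sketch of building the same URL with the Python standard library; the endpoint and parameter names are copied from the request above, while the function name `build_translation_url` is hypothetical and no network call is made.

```python
from urllib.parse import urlencode

# Endpoint copied from the request above; the helper itself is illustrative.
ENDPOINT = "http://api.freme-project.eu/0.5/e-translation/tilde"


def build_translation_url(html: str, source_lang: str, target_lang: str) -> str:
    """Return the request URL with all query parameters percent-encoded."""
    params = {
        "source-lang": source_lang,
        "target-lang": target_lang,
        "informat": "text/html",
        "input": html,
        "outformat": "text/html",
    }
    # urlencode escapes reserved characters such as '<', '>' and '/'.
    return ENDPOINT + "?" + urlencode(params)


url = build_translation_url("<p>Hello World</p>", "en", "de")
print(url)
```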

philr-vistatec · 2016-07-08T20:28:48Z

Felix, Katia and I discussed this during the Lisbon face-to-face.

We do not think that e-translation needs to support HTML tags natively. We believe a solution can be provided by e-internationalization, but support IS needed from e-translation: it has to write extra sub-segments into its NIF output. The source and target sub-segments are related to each other in the NIF using the itsrdf:target property.

The process would be:

1. e-internationalization creates NIF for translation (plain text, without markup) AND a NIF skeleton.
2. e-translation generates a translation for the plain text from step 1 and includes the related source/target sub-segments.
3. e-internationalization uses the generated NIF from step 2 and the NIF skeleton from step 1 to re-apply the markup, using an algorithm similar to this: https://www.mediawiki.org/wiki/Content_translation/Developers/Markup
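The three steps can be illustrated with a rough sketch that is not the FREME implementation. It assumes the skeleton is simply the markup with numbered slots where text used to be, and it replaces the e-translation call with a hard-coded lookup (`fake_translate`, with an invented translation). All class and function names are hypothetical. It only handles text that sits between tags; aligning inline markup inside a sentence, as the linked MediaWiki algorithm does, is out of scope here.

```python
import html
from html.parser import HTMLParser


class SkeletonBuilder(HTMLParser):
    """Step 1: split HTML into a skeleton (markup + slots) and plain-text segments."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.skeleton = []  # markup strings, or int indexes into self.segments
        self.segments = []  # translatable plain-text segments

    def handle_starttag(self, tag, attrs):
        self.skeleton.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.skeleton.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.skeleton.append(f"</{tag}>")

    def handle_data(self, data):
        if data.strip():
            self.skeleton.append(len(self.segments))  # slot for a translation
            self.segments.append(data)
        else:
            self.skeleton.append(data)  # keep whitespace-only text as is


def fake_translate(segments):
    """Step 2 stand-in: a real client would call e-translation here."""
    lookup = {"Hello World": "Hallo Welt"}
    return [lookup.get(segment, segment) for segment in segments]


def reapply_markup(skeleton, translations):
    """Step 3: fill the slots of the skeleton with the translated segments."""
    return "".join(
        html.escape(translations[part], quote=False) if isinstance(part, int) else part
        for part in skeleton
    )


builder = SkeletonBuilder()
builder.feed("<p>Hello World</p>")
builder.close()
result = reapply_markup(builder.skeleton, fake_translate(builder.segments))
print(result)
```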

Example: translate English "this is a small house" TO German "das ist ein kleines Haus"

e-internationalization would send plain text to e-translation. e-translation would return the following NIF:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .

<http://freme-project.eu/#char=0,21>
a nif:String , nif:RFC5147String , nif:Context ;
nif:beginIndex "0"^^xsd:nonNegativeInteger ;
nif:endIndex "21"^^xsd:nonNegativeInteger ;
nif:isString "this is a small house"@en ;
itsrdf:target "das ist ein kleines Haus"@de .

# English
<http://prefix.given.by/theClient#char=0,7> rdf:type nif:Phrase ,
nif:RFC5147String ;
nif:anchorOf "this is" ;
nif:beginIndex "0" ;
nif:endIndex "7" ;
nif:referenceContext <http://prefix.given.by/theClient#char=0,21> ;
itsrdf:target <http://prefix.given.by/theClient/de#char=0,7> .

<http://prefix.given.by/theClient#char=8,9> rdf:type nif:Phrase ,
nif:RFC5147String ;
nif:anchorOf "a" ;
nif:beginIndex "8" ;
nif:endIndex "9" ;
nif:referenceContext <http://prefix.given.by/theClient#char=0,21> ;
itsrdf:target <http://prefix.given.by/theClient/de#char=8,11> .

<http://prefix.given.by/theClient#char=10,15> rdf:type nif:Phrase ,
nif:RFC5147String ;
nif:anchorOf "small" ;
nif:beginIndex "10" ;
nif:endIndex "15" ;
nif:referenceContext <http://prefix.given.by/theClient#char=0,21> ;
itsrdf:target <http://prefix.given.by/theClient/de#char=12,19> .

<http://prefix.given.by/theClient#char=16,21> rdf:type nif:Phrase ,
nif:RFC5147String ;
nif:anchorOf "house" ;
nif:beginIndex "16" ;
nif:endIndex "21" ;
nif:referenceContext <http://prefix.given.by/theClient#char=0,21> ;
itsrdf:target <http://prefix.given.by/theClient/de#char=20,24> .

# German translation
<http://prefix.given.by/theClient/de#char=0,7> rdf:type nif:Phrase ,
nif:RFC5147String ;
nif:anchorOf "das ist" ;
nif:beginIndex "0" ;
nif:endIndex "7" ;
nif:referenceContext <http://prefix.given.by/theClient/de#char=0,24> .

<http://prefix.given.by/theClient/de#char=8,11> rdf:type nif:Phrase ,
nif:RFC5147String ;
nif:anchorOf "ein" ;
nif:beginIndex "8" ;
nif:endIndex "11" ;
nif:referenceContext <http://prefix.given.by/theClient/de#char=0,24> .

<http://prefix.given.by/theClient/de#char=12,19> rdf:type nif:Phrase ,
nif:RFC5147String ;
nif:anchorOf "kleines" ;
nif:beginIndex "12" ;
nif:endIndex "19" ;
nif:referenceContext <http://prefix.given.by/theClient/de#char=0,24> .

<http://prefix.given.by/theClient/de#char=20,24> rdf:type nif:Phrase ,
nif:RFC5147String ;
nif:anchorOf "Haus" ;
nif:beginIndex "20" ;
nif:endIndex "24" ;
nif:referenceContext <http://prefix.given.by/theClient/de#char=0,24> .
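Hand-written character offsets like the ones in this kind of example are easy to get wrong. A small sanity check, sketched here with a hypothetical `Phrase` record and `find_offset_errors` helper rather than a real RDF parser, is to slice the context string with each beginIndex/endIndex pair and compare the result with nif:anchorOf. The last entry in the sample data is deliberately off by one.

```python
from dataclasses import dataclass


@dataclass
class Phrase:
    anchor_of: str    # value of nif:anchorOf
    begin_index: int  # value of nif:beginIndex
    end_index: int    # value of nif:endIndex


def find_offset_errors(context: str, phrases):
    """Return the phrases whose [begin, end) slice of the context differs from anchorOf."""
    return [p for p in phrases if context[p.begin_index:p.end_index] != p.anchor_of]


context = "this is a small house"
phrases = [
    Phrase("this is", 0, 7),
    Phrase("a", 8, 9),
    Phrase("small", 10, 15),
    Phrase("house", 16, 21),
    Phrase("a", 9, 10),  # deliberately wrong: this slice is a space
]
errors = find_offset_errors(context, phrases)
print(errors)
```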

andish · 2016-08-08T14:10:28Z

I don't like this approach to describing segmentation information; it is not robust:

This NIF output example describes 3 documents: http://freme-project.eu/, http://prefix.given.by/theClient/de and http://prefix.given.by/theClient/en. The last two documents do not exist and are 'virtual'; the first is a real document, submitted by the end user.
This will cause problems when the result is submitted to any other e-service (pipelining): there are 3 different documents in two languages.
NIF can contain multiple strings for translation, so each string will generate 2 new 'virtual' documents.
What happens if we submit the result to e-translation again, but for a different language? Currently it adds one more translation for each string, but in this case the document count will multiply.
It will be difficult to write SPARQL queries because the selected data will depend on the URI string. There is no semantic relation between the document submitted by the client (http://freme-project.eu/) and http://prefix.given.by/theClient/en.

My idea is to use NIF AnnotationUnit, since we are implementing NIF 2.1 anyway.

fsasaki · 2016-08-08T14:38:08Z

Could you give an example of the NIF 2.1 based idea?

andish · 2016-08-09T06:22:35Z

This is a graphical representation of the idea. Segmentation of the source text is done the same way any other e-service does it right now, and segment translations are represented as annotation units. For the demo I also added two annotation units to show how this merges with the output of other e-services:

I will also try to create a NIF/RDF example.

andish · 2016-08-09T06:52:09Z

Input NIF example could look like this:

@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix isolang: <http://www.lexvo.org/id/iso639-3/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .

<http://freme-project.eu/#offset_0_105>
        a               nif:Context , nif:OffsetBasedString;
        nif:beginIndex  "0"^^xsd:nonNegativeInteger ;
        nif:endIndex    "105"^^xsd:nonNegativeInteger ;     
        nif:predLang isolang:eng ;
        nif:isString    "\nJust for test\n \nThis is just a test\n \nThere are different sentences to be translated\n \nToday is Monday.\n"^^xsd:string.
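The offset-based URIs in this kind of input follow directly from the string itself. As a sketch, assuming the `#offset_<begin>_<end>` pattern seen above and counting offsets as Python string indexes, the context and phrase URIs can be derived as follows. The helper name `offset_uri` is hypothetical.

```python
def offset_uri(base: str, begin: int, end: int) -> str:
    """Build an offset-based fragment URI in the style used in the example above."""
    return f"{base}#offset_{begin}_{end}"


BASE = "http://freme-project.eu/"
text = (
    "\nJust for test\n \nThis is just a test\n \n"
    "There are different sentences to be translated\n \nToday is Monday.\n"
)

# The context spans the whole string, so its end index is the string length.
context = offset_uri(BASE, 0, len(text))

# A phrase URI is derived from the position of the phrase inside the context.
begin = text.index("Just")
phrase = offset_uri(BASE, begin, begin + len("Just"))

print(context)
print(phrase)
```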

Output NIF example could look like this:

@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix isolang: <http://www.lexvo.org/id/iso639-3/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .

<http://freme-project.eu/#offset_0_105>
        a               nif:Context , nif:OffsetBasedString;
        nif:beginIndex  "0"^^xsd:nonNegativeInteger ;
        nif:endIndex    "105"^^xsd:nonNegativeInteger ;     
        nif:predLang isolang:eng ;
        nif:isString    "\nJust for test\n \nThis is just a test\n \nThere are different sentences to be translated\n \nToday is Monday.\n"^^xsd:string;
        itsrdf:target "Nur für die Prüfung\nDies ist nur eine Prüfung\nEs gibt andere Strafen umgewandelt werden,\nHeute ist."@de.

<http://freme-project.eu/#offset_0_15>
        a               nif:Phrase, nif:OffsetBasedString;
        nif:beginIndex  "0"^^xsd:nonNegativeInteger ;
        nif:endIndex    "15"^^xsd:nonNegativeInteger ;
        nif:referenceContext  <http://freme-project.eu/#offset_0_105> ;
        itsrdf:translate "no";
        nif:anchorOf    "\nJust for test\n"^^xsd:string .                                            

<http://freme-project.eu/#offset_1_14>
        a                     nif:Phrase ,   nif:OffsetBasedString ;
        nif:beginIndex     "1"^^xsd:nonNegativeInteger ;
        nif:endIndex        "14"^^xsd:nonNegativeInteger ;
        nif:anchorOf        "Just for test"^^xsd:string;
        nif:referenceContext  <http://freme-project.eu/#offset_0_105> ;
        itsrdf:target         "Nur für die Prüfung"@de;
        dc:identifier         "1" .                               

<http://freme-project.eu/#offset_1_5>
        a                     nif:Phrase ,   nif:OffsetBasedString ;
        nif:beginIndex        "1"^^xsd:nonNegativeInteger ;
        nif:endIndex          "5"^^xsd:nonNegativeInteger ;
        nif:anchorOf         "Just"^^xsd:string ;
        nif:referenceContext  <http://freme-project.eu/#offset_0_105> ;
        nif:annotationUnit _:annotationUnit1.                               

<http://freme-project.eu/#offset_6_14>
        a                     nif:Phrase ,   nif:OffsetBasedString ;
        nif:beginIndex        "6"^^xsd:nonNegativeInteger ;
        nif:endIndex          "14"^^xsd:nonNegativeInteger ;
        nif:anchorOf           "for test"^^xsd:string;
        nif:referenceContext  <http://freme-project.eu/#offset_0_105> ; 
        nif:annotationUnit _:annotationUnit2.                              

_:annotationUnit1
                itsrdf:mtConfidence "0.9869992701528016"^^xsd:double;
                itsrdf:taAnnotatorsRef <https://services.tilde.com/systems/translation/smt-0eba8517-0cf8-43b9-8dc2-ed99b33e56b9>;
                itsrdf:target "Nur"@de.                              

_:annotationUnit2
                itsrdf:mtConfidence "0.9869992701528016"^^xsd:double;
                itsrdf:taAnnotatorsRef <https://services.tilde.com/systems/translation/smt-0eba8517-0cf8-43b9-8dc2-ed99b33e56b9>;
                itsrdf:target "für die Prüfung"@de.

fsasaki · 2016-08-11T08:31:28Z

A question raised by @ankitks : with the translation segment output

_:annotationUnit1
nif-ann:taIdentProv freme-api:tilde.com/mt/systems/smt-0eba8517-0cf8-43b9-8dc2-ed99b33e56b9;
itsrdf:target "das ist"@de ;
itsrdf:taConfidence 0.9898.

It seems that there is no relation to the complete translation output. That is, one should add to the segmentation output the offsets relative to the complete translation output, e.g.

_:annotationUnit1
nif-ann:taIdentProv freme-api:tilde.com/mt/systems/smt-0eba8517-0cf8-43b9-8dc2-ed99b33e56b9;
nif:beginIndex "0";
nif:endIndex "7";
itsrdf:target "das ist"@de ;
itsrdf:taConfidence 0.9898.

It seems that without the begin and end index information in the translation, one cannot re-insert the markup?

m1ci · 2016-08-11T08:45:27Z

The beginIndex and endIndex can be retrieved from the corresponding nif:String, that is:

<http://freme-project.eu#offset_0_7> rdf:type nif:Phrase ,
nif:RFC5147String ;
nif:anchorOf "this is" ;
nif:beginIndex "0" ;
nif:endIndex "7" ;
nif:referenceContext <http://freme-project.eu#offset_0_21> ;
nif-ann:annotationUnit _:annotationUnit1.

Note that the annotation unit is linked to the string.
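To make the lookup concrete, here is a small sketch using plain dictionaries instead of a parsed RDF graph; the data layout and the function name `source_span_for_unit` are assumptions for illustration only. It resolves an annotation unit to the beginIndex and endIndex of the phrase it is attached to. Note that the offsets obtained this way describe the source text, not the translation.

```python
# Phrases keyed by URI; each one links to an annotation unit (blank node label).
phrases = {
    "http://freme-project.eu#offset_0_7": {
        "anchorOf": "this is",
        "beginIndex": 0,
        "endIndex": 7,
        "annotationUnit": "_:annotationUnit1",
    },
}

# Annotation units carry the translation of the phrase they are attached to.
annotation_units = {
    "_:annotationUnit1": {"target": "das ist", "lang": "de"},
}


def source_span_for_unit(unit_id: str):
    """Return (beginIndex, endIndex) of the phrase that links to the given unit."""
    for phrase in phrases.values():
        if phrase["annotationUnit"] == unit_id:
            return phrase["beginIndex"], phrase["endIndex"]
    raise KeyError(unit_id)


span = source_span_for_unit("_:annotationUnit1")
print(span, annotation_units["_:annotationUnit1"]["target"])
```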

katia-vistatec · 2016-08-11T08:46:05Z

I don't think so. I think that the position of the source text in the skeleton string should be known.

fsasaki · 2016-08-11T09:28:50Z

> the beginIndex and endIndex can be retrieved from the corresponding nif:String ...

I don't think so: in my example at #169 (comment) , beginIndex and endIndex did not refer to the original content (= nif:String), but the translated content.

m1ci · 2016-08-11T09:34:56Z

If you want to describe the translation with beginIndex and endIndex, then the translation should be represented as its own document and linked to the source document.

fsasaki · 2016-08-11T10:02:34Z

The main goal is to describe the relation between source language and target language segments, see @andish's example at #169 (comment) . If this needs a separate document for the translation, that would be fine. But I am not sure what this would look like, e.g. how to extend / change @andish's example.

andish · 2016-08-11T10:08:04Z

Are you sure that you need to know the beginIndex and endIndex of the translated
text? I suppose you only need to know the beginIndex and endIndex of the source
text fragment and its corresponding translation.

The relation between the source text fragment and the original text is already in my
example, as are the relations between source segments and their corresponding
translations. So you can retrieve beginIndex, endIndex + source segment +
translation without problems.

katia-vistatec · 2016-08-11T10:18:25Z

I don't think it is necessary to have begin Index and end Index of translated text. Once we have the skeleton representing the whole HTML document, it is sufficient to locate the source texts and then replace them with their translations, which are known from the enriched NIF.
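The replacement step described here needs only source offsets. A minimal sketch, assuming non-overlapping segments given as (beginIndex, endIndex, translation) tuples and a plain string standing in for the skeleton text (the function name `replace_segments` is hypothetical): process the segments from right to left, so that earlier offsets stay valid while the string changes length.

```python
def replace_segments(text: str, segments) -> str:
    """Replace source spans with their translations.

    segments: iterable of (begin, end, translation) tuples in source offsets.
    Segments are assumed not to overlap.
    """
    out = text
    # Work from the end of the string so earlier offsets are not shifted.
    for begin, end, translation in sorted(segments, reverse=True):
        out = out[:begin] + translation + out[end:]
    return out


source = "\nJust for test\n"
segments = [(1, 5, "Nur"), (6, 14, "für die Prüfung")]
translated = replace_segments(source, segments)
print(repr(translated))
```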

fsasaki · 2016-08-11T11:14:09Z

> I don't think it is necessary to have begin Index and end Index of translated text.

sounds good 👍

jnehring · 2016-08-31T07:51:47Z

I tried to translate an inline tag:

curl -X POST -H "Cache-Control: no-cache" -H "Postman-Token: d6ec9ab8-f988-f723-b224-b915a426ef74" -d '<p>Today <strong>I</strong> feel good</p>' "http://api-dev.freme-project.eu/current/e-translation/tilde?informat=text/html&outformat=text/html&source-lang=en&target-lang=de&nif-version=2.0"

This sends the following text to MT: <p>Today <strong>I</strong> feel good</p>

The service returns:

<p>Today 
    <strong>I</strong> feel good
</p>

When I set outformat=turtle I can see that the translation is in German.

katia-vistatec · 2016-08-31T09:10:29Z

Sorry, but I don't understand. I don't think round-tripping after translation was ever implemented. For me, translation has always worked only with output format turtle. For this reason I am waiting for the changes to NIF 2.1 to be implemented.

pheyvaer mentioned this issue Feb 25, 2016

Write tutorial: Sample applications ePub freme-project/freme-project.github.io#111

Open

andish mentioned this issue Aug 8, 2016

Adding NIF 2.1 support freme-project/technical-discussion#121

Closed

jnehring assigned katia-vistatec Aug 31, 2016
