e-Translation: HTML not translated #169

Open
pheyvaer opened this issue Feb 25, 2016 · 17 comments

@pheyvaer
Contributor

When I try to translate an HTML document from EN to DE, I get the original HTML back without a translation.

I used the HTML file in SB50.zip.

@jnehring
Member

Round-tripping in e-Translation is currently not supported; see http://api.freme-project.eu/doc/0.5/knowledge-base/eInternationalization.html (section "Translation of HTML with round-tripping").

I reproduced the issue using this request:

http://api.freme-project.eu/0.5/e-translation/tilde?source-lang=en&target-lang=de&informat=text/html&input=<p>Hello World</p>&outformat=text/html
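
The request above puts raw HTML into the query string. A minimal sketch of building the same URL with percent-encoding, assuming the parameter names shown in the request (the helper name `build_translation_url` is hypothetical and not part of FREME):

```python
from urllib.parse import urlencode

# Endpoint taken from the request quoted above.
BASE_URL = "http://api.freme-project.eu/0.5/e-translation/tilde"


def build_translation_url(html, source_lang="en", target_lang="de"):
    """Return the GET URL for an HTML-in, HTML-out translation request."""
    params = {
        "source-lang": source_lang,
        "target-lang": target_lang,
        "informat": "text/html",
        "input": html,
        "outformat": "text/html",
    }
    # urlencode percent-encodes '<', '>' and '/' and turns spaces into '+'.
    return BASE_URL + "?" + urlencode(params)


url = build_translation_url("<p>Hello World</p>")
print(url)
```

Encoding the `input` value this way avoids ambiguity about how `<`, `>` and spaces are transmitted; it does not change the behaviour reported in this issue.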

@philr-vistatec

Felix, Katia and I discussed this during the Lisbon face-to-face.

We do not think that e-translation needs to support HTML tags natively. We believe a solution can be provided by e-internationalization, but support IS needed from e-translation: it must write extra sub-segments into the NIF output. These sub-segments for source and target are related to each other in the NIF using the itsrdf:target property.

The process would be:

  1. e-internationalization creates NIF for translation (without markup/plain text) AND NIF skeleton
  2. e-translation generates translation for plain text sent from step 1. and includes related source/target sub-segments
  3. e-internationalization uses generated NIF from step 2 and NIF skeleton from step 1 to re-apply markup using an algorithm similar to this: https://www.mediawiki.org/wiki/Content_translation/Developers/Markup
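
As a rough illustration of steps 1 and 3, here is a simplified sketch that splits HTML into translatable text units plus a skeleton and later rebuilds the document. It assumes each text node is translated on its own, which is simpler than the sub-segment alignment proposed here; the names `SkeletonBuilder`, `split_html` and `rebuild_html` are hypothetical, and comments and doctypes are ignored.

```python
from html.parser import HTMLParser


class SkeletonBuilder(HTMLParser):
    """Collects markup into a skeleton and text nodes into translatable units."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.skeleton = []  # str = markup kept as-is, int = index into units
        self.units = []     # plain-text units to send for translation

    def handle_starttag(self, tag, attrs):
        self.skeleton.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are kept verbatim.
        self.skeleton.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.skeleton.append(f"</{tag}>")

    def handle_data(self, data):
        if data.strip():
            self.skeleton.append(len(self.units))
            self.units.append(data)
        else:
            # Whitespace-only text stays in the skeleton untouched.
            self.skeleton.append(data)


def split_html(html):
    """Step 1: return (units, skeleton) for the given HTML string."""
    builder = SkeletonBuilder()
    builder.feed(html)
    builder.close()
    return builder.units, builder.skeleton


def rebuild_html(skeleton, translated_units):
    """Step 3: put translated units back into the skeleton."""
    return "".join(
        translated_units[part] if isinstance(part, int) else part
        for part in skeleton
    )


units, skeleton = split_html("<p>Hello World</p>")
# "Hallo Welt" is a hand-written stand-in for the MT output of step 2.
print(rebuild_html(skeleton, ["Hallo Welt"]))
```

With inline markup such as `<strong>`, this naive split would cut a sentence into several units, which is exactly why the proposal asks e-translation for source/target sub-segment relations instead.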

Example: translate English "this is a small house" to German "das ist ein kleines Haus"

e-internationalization would send plain text to e-translation. e-translation would return the following NIF:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .

<http://freme-project.eu/#char=0,21>
a nif:String , nif:RFC5147String , nif:Context ;
nif:beginIndex "0"^^xsd:nonNegativeInteger ;
nif:endIndex "21"^^xsd:nonNegativeInteger ;
nif:isString "this is a small house"@en ;
itsrdf:target "das ist ein kleines Haus"@de .

# English
<http://prefix.given.by/theClient#char=0,7> rdf:type nif:Phrase ,
nif:RFC5147String ;
nif:anchorOf "this is" ;
nif:beginIndex "0" ;
nif:endIndex "7" ;
nif:referenceContext <http://prefix.given.by/theClient#char=0,21> ;
itsrdf:target <http://prefix.given.by/theClient/de#char=0,7> .

<http://prefix.given.by/theClient#char=8,9> rdf:type nif:Phrase ,
nif:RFC5147String ;
nif:anchorOf "a" ;
nif:beginIndex "8" ;
nif:endIndex "9" ;
nif:referenceContext <http://prefix.given.by/theClient#char=0,21> ;
itsrdf:target <http://prefix.given.by/theClient/de#char=8,11> .

<http://prefix.given.by/theClient#char=10,15> rdf:type nif:Phrase ,
nif:RFC5147String ;
nif:anchorOf "small" ;
nif:beginIndex "10" ;
nif:endIndex "15" ;
nif:referenceContext <http://prefix.given.by/theClient#char=0,21> ;
itsrdf:target <http://prefix.given.by/theClient/de#char=12,19> .

<http://prefix.given.by/theClient#char=16,21> rdf:type nif:Phrase ,
nif:RFC5147String ;
nif:anchorOf "house" ;
nif:beginIndex "16" ;
nif:endIndex "21" ;
nif:referenceContext <http://prefix.given.by/theClient#char=0,21> ;
itsrdf:target <http://prefix.given.by/theClient/de#char=20,24> .

# German translation
<http://prefix.given.by/theClient/de#char=0,7> rdf:type nif:Phrase ,
nif:RFC5147String ;
nif:anchorOf "das ist" ;
nif:beginIndex "0" ;
nif:endIndex "7" ;
nif:referenceContext <http://prefix.given.by/theClient/de#char=0,24> .

<http://prefix.given.by/theClient/de#char=8,11> rdf:type nif:Phrase ,
nif:RFC5147String ;
nif:anchorOf "ein" ;
nif:beginIndex "8" ;
nif:endIndex "11" ;
nif:referenceContext <http://prefix.given.by/theClient/de#char=0,24> .

<http://prefix.given.by/theClient/de#char=12,19> rdf:type nif:Phrase ,
nif:RFC5147String ;
nif:anchorOf "kleines" ;
nif:beginIndex "12" ;
nif:endIndex "19" ;
nif:referenceContext <http://prefix.given.by/theClient/de#char=0,24> .

<http://prefix.given.by/theClient/de#char=20,24> rdf:type nif:Phrase ,
nif:RFC5147String ;
nif:anchorOf "Haus" ;
nif:beginIndex "20" ;
nif:endIndex "24" ;
nif:referenceContext <http://prefix.given.by/theClient/de#char=0,24> .
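
Offsets like these are easy to get wrong by hand. A small sketch of deriving begin/end indices for consecutive sub-segments of a string, using half-open character offsets as in nif:beginIndex / nif:endIndex (the helper name `segment_offsets` is hypothetical):

```python
def segment_offsets(text, segments):
    """Return (begin, end) character offsets for segments that occur in order."""
    offsets = []
    cursor = 0
    for segment in segments:
        # Search from the end of the previous segment so repeated words
        # resolve to the correct occurrence; raises ValueError if missing.
        begin = text.index(segment, cursor)
        end = begin + len(segment)
        offsets.append((begin, end))
        cursor = end
    return offsets


source = "this is a small house"
target = "das ist ein kleines Haus"
print(segment_offsets(source, ["this is", "a", "small", "house"]))
print(segment_offsets(target, ["das ist", "ein", "kleines", "Haus"]))
```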

@andish
Contributor

andish commented Aug 8, 2016

I don't like this approach to describing segmentation information; it is not robust:

  1. This NIF output example describes 3 documents: http://freme-project.eu/ , http://prefix.given.by/theClient/de and http://prefix.given.by/theClient/en . The last two documents do not exist and are 'virtual'; the first is the real document, submitted by the end user.
  2. This will cause problems when the result is submitted to any other e-service [pipelining]: 3 different documents in two languages.
  3. NIF can contain multiple strings for translation, so each string will generate 2 new 'virtual' documents.
  4. What happens if we submit the result again to e-translation, but for a different language? Currently it adds one more translation for each string, but in this case the document count will multiply.
  5. It will be difficult to write SPARQL queries because the selected data will depend on the URI string. There is no semantic relation between the document submitted by the client (http://freme-project.eu/) and http://prefix.given.by/theClient/en.

I have an idea: use the NIF AnnotationUnit, since we are implementing NIF 2.1.

@fsasaki

fsasaki commented Aug 8, 2016 via email

@andish
Contributor

andish commented Aug 9, 2016

This is a graphical representation of the idea. Segmentation of the source text is done the same way any other e-service does it right now, and segment translations are represented as annotationUnits. For the demo I also added two annotation units to show how this merges with the output of other e-services:
image

I will also try to create a NIF/RDF example.

@andish
Contributor

andish commented Aug 9, 2016

Input NIF example could look like this:

@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix isolang: <http://www.lexvo.org/id/iso639-3/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .

<http://freme-project.eu/#offset_0_105>
        a               nif:Context , nif:OffsetBasedString;
        nif:beginIndex  "0"^^xsd:nonNegativeInteger ;
        nif:endIndex    "105"^^xsd:nonNegativeInteger ;     
        nif:predLang isolang:eng ;
        nif:isString    "\nJust for test\n \nThis is just a test\n \nThere are different sentences to be translated\n \nToday is Monday.\n"^^xsd:string.

Output NIF example could look like this:

@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix isolang: <http://www.lexvo.org/id/iso639-3/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .

<http://freme-project.eu/#offset_0_105>
        a               nif:Context , nif:OffsetBasedString;
        nif:beginIndex  "0"^^xsd:nonNegativeInteger ;
        nif:endIndex    "105"^^xsd:nonNegativeInteger ;     
        nif:predLang isolang:eng ;
        nif:isString    "\nJust for test\n \nThis is just a test\n \nThere are different sentences to be translated\n \nToday is Monday.\n"^^xsd:string;
        itsrdf:target "Nur für die Prüfung\nDies ist nur eine Prüfung\nEs gibt andere Strafen umgewandelt werden,\nHeute ist."@de.

<http://freme-project.eu/#offset_0_15>
        a               nif:Phrase, nif:OffsetBasedString;
        nif:beginIndex  "0"^^xsd:nonNegativeInteger ;
        nif:endIndex    "15"^^xsd:nonNegativeInteger ;
        nif:referenceContext  <http://freme-project.eu/#offset_0_105> ;
        itsrdf:translate "no";
        nif:anchorOf    "\nJust for test\n"^^xsd:string .                                            

<http://freme-project.eu/#offset_1_14>
        a                     nif:Phrase ,   nif:OffsetBasedString ;
        nif:beginIndex     "1"^^xsd:nonNegativeInteger ;
        nif:endIndex        "14"^^xsd:nonNegativeInteger ;
        nif:anchorOf        "Just for test"^^xsd:string;
        nif:referenceContext  <http://freme-project.eu/#offset_0_105> ;
        itsrdf:target         "Nur für die Prüfung"@de;
        dc:identifier         "1" .                               

<http://freme-project.eu/#offset_1_5>
        a                     nif:Phrase ,   nif:OffsetBasedString ;
        nif:beginIndex        "1"^^xsd:nonNegativeInteger ;
        nif:endIndex          "5"^^xsd:nonNegativeInteger ;
        nif:anchorOf         "Just"^^xsd:string ;
        nif:referenceContext  <http://freme-project.eu/#offset_0_105> ;
        nif:annotationUnit _:annotationUnit1.                               

<http://freme-project.eu/#offset_6_14>
        a                     nif:Phrase ,   nif:OffsetBasedString ;
        nif:beginIndex        "6"^^xsd:nonNegativeInteger ;
        nif:endIndex          "14"^^xsd:nonNegativeInteger ;
        nif:anchorOf           "for test"^^xsd:string;
        nif:referenceContext  <http://freme-project.eu/#offset_0_105> ; 
        nif:annotationUnit _:annotationUnit2.                              

_:annotationUnit1
                itsrdf:mtConfidence "0.9869992701528016"^^xsd:double;
                itsrdf:taAnnotatorsRef <https://services.tilde.com/systems/translation/smt-0eba8517-0cf8-43b9-8dc2-ed99b33e56b9>;
                itsrdf:target "Nur"@de.                              

_:annotationUnit2
                itsrdf:mtConfidence "0.9869992701528016"^^xsd:double;
                itsrdf:taAnnotatorsRef <https://services.tilde.com/systems/translation/smt-0eba8517-0cf8-43b9-8dc2-ed99b33e56b9>;
                itsrdf:target "für die Prüfung"@de. 

@fsasaki

fsasaki commented Aug 11, 2016

A question raised by @ankitks : with the translation segment output

_:annotationUnit1
nif-ann:taIdentProv freme-api:tilde.com/mt/systems/smt-0eba8517-0cf8-43b9-8dc2-ed99b33e56b9;
itsrdf:target "das ist"@de ;
itsrdf:taConfidence 0.9898.

it seems that there is no relation to the complete translation output. That is, one should add to the segmentation output the offsets related to the complete translation output, e.g.

_:annotationUnit1
nif-ann:taIdentProv freme-api:tilde.com/mt/systems/smt-0eba8517-0cf8-43b9-8dc2-ed99b33e56b9;
nif:beginIndex "0";
nif:endIndex "7";
itsrdf:target "das ist"@de ;
itsrdf:taConfidence 0.9898.

It seems that without the begin and end index information for the translation, one cannot re-insert the markup?

@m1ci
Contributor

m1ci commented Aug 11, 2016

the beginIndex and endIndex can be retrieved from the corresponding nif:String, that is:

<http://freme-project.eu#offset_0_7> rdf:type nif:Phrase ,
nif:RFC5147String ;
nif:anchorOf "this is" ;
nif:beginIndex "0" ;
nif:endIndex "7" ;
nif:referenceContext <http://freme-project.eu#offset_0_21> ;
nif-ann:annotationUnit _:annotationUnit1 .

Note that the annotation unit is linked to the string.

@katia-vistatec

I don't think so. I think that the position of the source text in the skeleton string should be known.

@fsasaki

fsasaki commented Aug 11, 2016

the beginIndex and endIndex can be retrieved from the corresponding nif:String ...

I don't think so: in my example at #169 (comment) , beginIndex and endIndex did not refer to the original content (= nif:String), but the translated content.

@m1ci
Contributor

m1ci commented Aug 11, 2016

If you want to describe the translation with beginIndex and endIndex, then the translation should be represented as its own document and linked to the source document.

@fsasaki

fsasaki commented Aug 11, 2016

The main goal is to describe the relation between source language and target language segments, see @andish's example at #169 (comment) . If this needs a separate document for the translation, that would be fine. But I am not sure what this would look like, e.g. how to extend / change @andish's example.

@andish
Contributor

andish commented Aug 11, 2016

Are you sure that you need to know the beginIndex and endIndex of the translated text? I suppose you only need to know the beginIndex and endIndex of the source text fragment and its corresponding translation.

The relation between the source text fragment and the original text is already in my example, as are the relations between source segments and their corresponding translations. So you can retrieve beginIndex, endIndex + source segment + translation without problems.


@katia-vistatec

katia-vistatec commented Aug 11, 2016

I don't think it is necessary to have begin Index and end Index of translated text. Once we have the skeleton representing the whole HTML document, it is sufficient to locate the source texts and then replace them with their translations, which are known from the enriched NIF.
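
A minimal sketch of that replacement step, assuming the skeleton is available as a string and the enriched NIF has been reduced to (beginIndex, endIndex, translation) triples for non-overlapping source segments (the function name `apply_translations` is hypothetical):

```python
def apply_translations(source, segments):
    """Replace source spans with their translations.

    segments: iterable of (begin, end, translation), where begin/end are
    offsets into the ORIGINAL source string and spans do not overlap.
    """
    result = source
    # Work from the last span to the first so that earlier offsets stay
    # valid even when a translation is longer or shorter than its source.
    for begin, end, translation in sorted(segments, reverse=True):
        result = result[:begin] + translation + result[end:]
    return result


# Offsets and translations taken from the annotation-unit example above.
source = "\nJust for test\n"
segments = [(1, 5, "Nur"), (6, 14, "für die Prüfung")]
print(repr(apply_translations(source, segments)))
```

Only source offsets are needed here, which matches the point that target-side indices are not required for re-inserting translations.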

@fsasaki

fsasaki commented Aug 11, 2016

I don't think it is necessary to have begin Index and end Index of translated text.

sounds good 👍

@jnehring
Member

I tried to translate an inline tag:

curl -X POST -H "Cache-Control: no-cache" -H "Postman-Token: d6ec9ab8-f988-f723-b224-b915a426ef74" -d '<p>Today <strong>I</strong> feel good</p>' "http://api-dev.freme-project.eu/current/e-translation/tilde?informat=text/html&outformat=text/html&source-lang=en&target-lang=de&nif-version=2.0"

This sends the following text to MT: <p>Today <strong>I</strong> feel good</p>.

The service returns

<p>Today 
    <strong>I</strong> feel good
</p>

When I set outformat=turtle I can see that the translation is in German.

@katia-vistatec

katia-vistatec commented Aug 31, 2016

Sorry, but I don't understand. I don't think round-tripping after translation was implemented. For me, translation has only ever worked with the output format turtle. For this reason I am waiting for the changes to NIF 2.1 to be implemented.
