Convert Html to Docx : Empty Paragraphs #79

Open
NcIgor opened this issue Jan 10, 2022 · 1 comment

NcIgor commented Jan 10, 2022

When I run code to convert HTML to DOCX (as in org.docx4j.samples.ConvertInXHTMLFile), I get a document with extra spaces and empty paragraphs.
For example, my HTML:

<!DOCTYPE html>
<html>
<head>
    <style>
        i {
            color: red;
            background-color: gray;
        }
    </style>
</head>
<body>
<div>
    some text
    <span>new text</span>
</div>
</body>
</html>

Document: (screenshot of the generated document, not preserved)

Source code:

    public static void main(String[] args) throws Exception {
        // See org.docx4j.samples.ConvertInXHTMLFile
        String baseURL = null;
        String stringFromFile = getContent();
        /*RFonts rfonts = Context.getWmlObjectFactory().createRFonts();
        rfonts.setAscii("Century Gothic");
        XHTMLImporterImpl.addFontMapping("Century Gothic", rfonts);*/
        WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.createPackage();
        NumberingDefinitionsPart ndp = new NumberingDefinitionsPart();
        wordMLPackage.getMainDocumentPart().addTargetPart(ndp);
        ndp.unmarshalDefaultNumbering();
        XHTMLImporterImpl XHTMLImporter = new XHTMLImporterImpl(wordMLPackage);
        XHTMLImporter.setHyperlinkStyle("Hyperlink");
        List<Object> convert = XHTMLImporter.convert(stringFromFile, baseURL);
        wordMLPackage.getMainDocumentPart().getContent().addAll(convert);
        System.out.println(XmlUtils.marshaltoString(wordMLPackage.getMainDocumentPart().getJaxbElement(), true, true));
        wordMLPackage.save(new File("docs/a.docx"));
    }

Maven dependency:

        <dependency>
            <groupId>org.docx4j</groupId>
            <artifactId>docx4j-ImportXHTML</artifactId>
            <version>8.3.2</version>
        </dependency>

plutext (Owner) commented Feb 3, 2022

What does your getContent() do?

I can't reproduce this using the ConvertInXHTMLFile sample code, which reads the input with:

        String stringFromFile = FileUtils.readFileToString(new File(inputfilepath), "UTF-8");
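For comparison, here is a minimal sketch of a getContent() that behaves like the sample's read, using only the JDK (Java 11+). The reporter's actual getContent() is not shown in this thread, so the class name, the Path parameter, and the temp-file input are all assumptions made for illustration. It only shows a read that returns the file's text unchanged, so any whitespace in the HTML reaches the importer exactly as written.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class GetContentSketch {

    // Hypothetical stand-in for the reporter's getContent(): reads the whole
    // file as UTF-8 and returns it without trimming or altering whitespace.
    static String getContent(Path input) throws IOException {
        return Files.readString(input, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Write a small HTML fragment to a temp file (illustrative input only).
        Path input = Files.createTempFile("sample", ".html");
        Files.writeString(input,
                "<div>\n    some text\n    <span>new text</span>\n</div>\n",
                StandardCharsets.UTF_8);

        String html = getContent(input);

        // Show line breaks explicitly so the whitespace that would be passed
        // on to the converter is visible.
        System.out.println(html.replace("\n", "\\n"));

        Files.delete(input);
    }
}
```

If getContent() builds the string some other way (for example by concatenating lines or reading with a different charset), comparing its output with this one may help narrow down where the extra paragraphs come from.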
