Convert Html to Docx : Empty Paragraphs #79

NcIgor · 2022-01-10T16:36:46Z

When I run code to convert HTML to DOCX (as in org.docx4j.samples.ConvertInXHTMLFile), I get a document with extra spaces and paragraphs.
For example, my HTML:

<!DOCTYPE html>
<html>
<head>
    <style>
        i {
            color: red;
            background-color: gray;
        }
    </style>
</head>
<body>
<div>
    some text
    <span>new text</span>
</div>
</body>
</html>

Resulting document (screenshot not preserved):

Source code:

    import java.io.File;
    import java.util.List;

    import org.docx4j.XmlUtils;
    import org.docx4j.convert.in.xhtml.XHTMLImporterImpl;
    import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
    import org.docx4j.openpackaging.parts.WordprocessingML.NumberingDefinitionsPart;

    // Based on org.docx4j.samples.ConvertInXHTMLFile
    public static void main(String[] args) throws Exception {
        String baseURL = null;
        String stringFromFile = getContent();
        /*RFonts rfonts = Context.getWmlObjectFactory().createRFonts();
        rfonts.setAscii("Century Gothic");
        XHTMLImporterImpl.addFontMapping("Century Gothic", rfonts);*/

        // Create an empty package and attach the default numbering part
        WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.createPackage();
        NumberingDefinitionsPart ndp = new NumberingDefinitionsPart();
        wordMLPackage.getMainDocumentPart().addTargetPart(ndp);
        ndp.unmarshalDefaultNumbering();

        // Convert the XHTML string and append the result to the document body
        XHTMLImporterImpl xhtmlImporter = new XHTMLImporterImpl(wordMLPackage);
        xhtmlImporter.setHyperlinkStyle("Hyperlink");
        List<Object> convert = xhtmlImporter.convert(stringFromFile, baseURL);
        wordMLPackage.getMainDocumentPart().getContent().addAll(convert);

        // Print the generated document XML, then save the package
        System.out.println(XmlUtils.marshaltoString(wordMLPackage.getMainDocumentPart().getJaxbElement(), true, true));
        wordMLPackage.save(new File("docs/a.docx"));
    }

        <dependency>
            <groupId>org.docx4j</groupId>
            <artifactId>docx4j-ImportXHTML</artifactId>
            <version>8.3.2</version>
        </dependency>


plutext · 2022-02-03T23:45:44Z

What does your getContent() do?

I can't reproduce this using the ConvertInXHTMLFile sample code, which reads the input with:

        String stringFromFile = FileUtils.readFileToString(new File(inputfilepath), "UTF-8");
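For comparison, a getContent() that mirrors that file read using only the JDK (no Commons IO) might look like the sketch below. The class name ContentLoader and the path parameter are hypothetical; the getContent() from the report was not posted, so this is not a reconstruction of it.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of docx4j or of the original report.
final class ContentLoader {

    private ContentLoader() {
        // static helper only
    }

    // Reads the whole file and decodes it as UTF-8, like
    // FileUtils.readFileToString(file, "UTF-8") in the sample.
    // The content is returned unchanged: no trimming, no newline rewriting.
    static String getContent(Path htmlFile) throws IOException {
        return new String(Files.readAllBytes(htmlFile), StandardCharsets.UTF_8);
    }
}
```

In the main method above it could be called as `String stringFromFile = ContentLoader.getContent(Paths.get("input.html"));`, where input.html is a placeholder path. If the output differs between this and the original getContent(), the difference is probably in how the HTML string is built rather than in the importer.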
