
2. Export AI Output to Word! The Full Markdown-to-Word Pipeline with docx4j + poi-tl

1. Introduction

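In the previous chapter we explained that if you want to convert Markdown content to Word and have the resulting document look well laid out, the conversion has to be split into two steps: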

  1. Markdown → HTML
  2. HTML → OOXML (Office Open XML, the format Word content is stored in; a Word file and its metadata are essentially XML)

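In the previous chapter we used flexmark to convert Markdown content into HTML, which completed the first step. In this chapter we cover how to convert HTML into OOXML.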

2. Environment

To stay compatible with as many environments as possible, no recent SDK versions are used:

Java: 8
Docx4j: 8.3.10

3. Maven

<properties>
  <docx4j.version>8.3.10</docx4j.version>
  <jaxb2.version>1.11.1</jaxb2.version>
</properties>

<dependencies>
  <dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j-JAXB-Internal</artifactId>
    <version>${docx4j.version}</version>
  </dependency>
  <dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j-ImportXHTML</artifactId>
    <version>${docx4j.version}</version>
  </dependency>
  <dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j-JAXB-MOXy</artifactId>
    <version>${docx4j.version}</version>
  </dependency>
  <dependency>
    <groupId>org.jvnet.jaxb2_commons</groupId>
    <artifactId>jaxb2-basics</artifactId>
    <version>${jaxb2.version}</version>
  </dependency>
</dependencies>

4. HTML to DOCX

import lombok.SneakyThrows;
import org.docx4j.Docx4J;
import org.docx4j.convert.in.xhtml.XHTMLImporterImpl;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart;
import org.docx4j.wml.Body;

import java.io.File;

/**
 * html 2 docx
 *
 * @author ludangxin
 * @since 2025/10/14
 */
public class HtmlToDocx {
    @SneakyThrows
    public static void convertHtmlToDocx(String htmlContent, String outputFilePath) {
        // Create an empty Word package
        WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.createPackage();
        MainDocumentPart mainDocumentPart = wordMLPackage.getMainDocumentPart();
        // Set up the XHTML importer
        XHTMLImporterImpl xhtmlImporter = new XHTMLImporterImpl(wordMLPackage);
        // Convert the HTML and append the result to the document body
        // (the second argument is the base URL for relative resources such as images; null means none)
        Body body = mainDocumentPart.getJaxbElement().getBody();
        body.getContent().addAll(xhtmlImporter.convert(htmlContent, null));
        // Save the Word document
        Docx4J.save(wordMLPackage, new File(outputFilePath), Docx4J.FLAG_NONE);
    }

    public static void main(String[] args) {
        String html = "<html><head></head><body><h2>嘉文四世</h2>\n" + "<blockquote>\n" + "<p>德瑪西亞</p>\n" + "</blockquote>\n" + "<p><strong>給我找些更強的敵人!</strong></p>\n" + "<table>\n" + "<thead>\n" + "<tr><th>列1</th><th>列2</th></tr>\n" + "</thead>\n" + "<tbody>\n" + "<tr><td>數據1</td><td>數據2</td></tr>\n" + "</tbody>\n" + "</table>\n" + "</body></html>";
        convertHtmlToDocx(html, "demo.docx");
    }
}
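docx4j-ImportXHTML expects well-formed XHTML, so HTML that a browser would happily accept (an unclosed `<br>`, or an entity such as `&nbsp;` that XML does not predefine) can make `convert()` fail. As a hedged pre-flight check, here is a JDK-only sketch that tests whether a string parses as XML before it is handed to docx4j. `XhtmlPreflight` and `checkWellFormed` are hypothetical names, not part of docx4j.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: checks that an HTML string is well-formed XML (a prerequisite for XHTML). */
final class XhtmlPreflight {
    private XhtmlPreflight() {
    }

    /** Returns null when the input parses as XML, otherwise a short description of the first error. */
    static String checkWellFormed(String html) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors, which also stops the JDK parser
            // from printing its own "[Fatal Error]" line to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(html)));
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ", column " + e.getColumnNumber() + ": " + e.getMessage();
        } catch (Exception e) {
            return String.valueOf(e.getMessage());
        }
    }

    public static void main(String[] args) {
        System.out.println(checkWellFormed("<html><body><p>ok</p></body></html>"));
        System.out.println(checkWellFormed("<html><body><p>line<br></p></body></html>"));
    }
}
```

The first call prints `null`; the second reports the unterminated `br` element. If flexmark output trips this check, fixing it before conversion gives a clearer error than a failure deep inside the importer.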

Running it produces the following:

[Screenshot: result of running main]

A docx file is generated in the project root, with the following content:

[Screenshot: content of the generated demo.docx]

The generated document does carry styling, but the fonts, and the table too, all use default styles.

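However, if the project requires a standardized-format document or a template file with specific style requirements for text, headings, or even tables, you need some more advanced techniques.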

5. Custom Styles

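There are essentially two approaches to customizing the style of the output:

  1. Start from the content: since we are converting HTML to OOXML, we can add CSS to the HTML and let docx4j render it, provided the CSS stays simple.
  2. Start from Word: Word files come with built-in styles and also support custom styles, so we can define the styles in a template file first and then map the input content onto them.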

5.1 Adding CSS to the HTML

For example, to give the table borders and some styling, use the following CSS:

table{border-collapse:collapse;border-spacing:0;width:100%;margin:1em 0;background-color:transparent;}table th{background-color:#f7f7f7;border:1px solid #ddd;padding:8px 12px;text-align:left}table td{border:1px solid #ddd;padding:8px 12px}
public static void main(String[] args) {
    String html = "<html><head><style>table{border-collapse:collapse;border-spacing:0;width:100%;margin:1em 0;background-color:transparent;}table th{background-color:#f7f7f7;border:1px solid #ddd;padding:8px 12px;text-align:left}table td{border:1px solid #ddd;padding:8px 12px}</style></head><body><h2>嘉文四世</h2>\n" + "<blockquote>\n" + "<p>德瑪西亞</p>\n" + "</blockquote>\n" + "<p><strong>給我找些更強的敵人!</strong></p>\n" + "<table>\n" + "<thead>\n" + "<tr><th>列1</th><th>列2</th></tr>\n" + "</thead>\n" + "<tbody>\n" + "<tr><td>數據1</td><td>數據2</td></tr>\n" + "</tbody>\n" + "</table>\n" + "</body></html>";
    convertHtmlToDocx(html, "demo.docx");
}
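Instead of concatenating the `<style>` element into the HTML by hand, the CSS can be injected programmatically, which also covers bare fragments such as flexmark output that have no `<html>`/`<head>` at all. The sketch below is a hypothetical `HtmlStyles.withStyle` helper; the utility class later in this article calls a `DocUtils.addHtmlStyles` helper whose source is not shown, and this is not that code. It assumes lowercase tags and CSS without `<` or `&`, either of which would break XML parsing.

```java
/** Hypothetical helper: puts a <style> element into an XHTML document or wraps a bare fragment. */
final class HtmlStyles {
    private HtmlStyles() {
    }

    static String withStyle(String html, String css) {
        String styleTag = "<style>" + css + "</style>";
        int headClose = html.indexOf("</head>");
        if (headClose >= 0) {
            // Full document with a head: append the style as the last child of <head>
            return html.substring(0, headClose) + styleTag + html.substring(headClose);
        }
        int bodyOpen = html.indexOf("<body");
        if (bodyOpen >= 0) {
            // Document without a head: create one right before <body>
            return html.substring(0, bodyOpen) + "<head>" + styleTag + "</head>" + html.substring(bodyOpen);
        }
        // Bare fragment: wrap it in a minimal document
        return "<html><head>" + styleTag + "</head><body>" + html + "</body></html>";
    }

    public static void main(String[] args) {
        String css = "table{border-collapse:collapse}table td{border:1px solid #ddd;padding:8px 12px}";
        System.out.println(withStyle("<table><tr><td>data</td></tr></table>", css));
    }
}
```

Appending to the end of `<head>` keeps any styles already in the document and lets the injected rules come last.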

Result:

[Screenshot: demo.docx with the styled table]

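At this point, defining CSS is basically enough if you only want the output to look good. But standardized-format documents often have strict requirements on line spacing, character spacing, fonts and font sizes, headings and so on. Expressing all of that in CSS is tedious, especially since the standardized document already defines those styles itself. In that case the following approach works better.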

5.2 Mapping HTML to Word styleIds

Let's first look at Word's built-in styles. I'm using Office for Mac here; Office for Windows and WPS differ slightly.

[Screenshot: built-in styles in Word for Mac]

As the screenshot shows, Word has many built-in styles; you can also create new ones, and the list can be filtered at the bottom.

The styles we usually pick from the quick style list come from here:

[Screenshot: the quick style list]

First, manually add a custom style:

[Screenshot: creating the custom style]

Then use docx4j to list all the Word styles:

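import lombok.SneakyThrows;
import lombok.extern.slf4j.Slf4j;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.openpackaging.parts.WordprocessingML.StyleDefinitionsPart;
import org.docx4j.wml.Style;
import org.junit.jupiter.api.BeforeAll;
import org.junit.jupiter.api.Test;

import java.io.File;
import java.util.List;

@Slf4j
public class DocxStyleTest {
    private static WordprocessingMLPackage wordMLPackage;

    @BeforeAll
    @SneakyThrows
    public static void init_mainDocumentPart() {
        File templateFile = new File("demo.docx");
        wordMLPackage = WordprocessingMLPackage.load(templateFile);
    }

    @Test
    @SneakyThrows
    public void given_doc_template_when_extract_style_then_return_style_list() {
        final StyleDefinitionsPart sdp = wordMLPackage.getMainDocumentPart().getStyleDefinitionsPart();
        List<Style> styles = sdp.getContents().getStyle();
        log.info("docx styles length: {}", styles.size());
        for (Style style : styles) {
            String styleId = style.getStyleId();
            // w:name is optional in styles.xml, so guard against a missing name
            String name = style.getName() == null ? null : style.getName().getVal();
            final String type = style.getType();
            log.info("styleId: {}, name: {}, type: {}", styleId, name, type);
        }
    }
}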
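If you only want to check style ids without loading the whole package: a .docx is a ZIP archive, and the style definitions live in `word/styles.xml` as `w:style` elements carrying `w:styleId` and `w:type` attributes and a `w:name` child. A JDK-only sketch (`DocxStyles` is a hypothetical name) that reads the same three fields the docx4j test logs:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical JDK-only helper: lists {styleId, name, type} for every w:style in a .docx. */
final class DocxStyles {
    private static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    private DocxStyles() {
    }

    static List<String[]> list(InputStream docx) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("word/styles.xml".equals(entry.getName())) {
                    return parse(readEntry(zip));
                }
            }
        }
        throw new IOException("word/styles.xml not found");
    }

    // Copy the current ZIP entry into memory so the XML parser cannot close the ZIP stream
    private static byte[] readEntry(ZipInputStream zip) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int read;
        while ((read = zip.read(chunk)) != -1) {
            out.write(chunk, 0, read);
        }
        return out.toByteArray();
    }

    private static List<String[]> parse(byte[] stylesXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(stylesXml));
        List<String[]> rows = new ArrayList<>();
        NodeList styles = doc.getElementsByTagNameNS(W_NS, "style");
        for (int i = 0; i < styles.getLength(); i++) {
            Element style = (Element) styles.item(i);
            NodeList names = style.getElementsByTagNameNS(W_NS, "name");
            // w:name is optional, so the name column may be null
            String name = names.getLength() == 0 ? null : ((Element) names.item(0)).getAttributeNS(W_NS, "val");
            rows.add(new String[]{style.getAttributeNS(W_NS, "styleId"), name, style.getAttributeNS(W_NS, "type")});
        }
        return rows;
    }

    public static void main(String[] args) throws Exception {
        try (InputStream in = new FileInputStream("demo.docx")) {
            for (String[] row : list(in)) {
                System.out.println("styleId: " + row[0] + ", name: " + row[1] + ", type: " + row[2]);
            }
        }
    }
}
```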

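Result: besides built-in styles such as heading 1 (id "1"), the last two entries are the custom style we added.

Why does one manually added style show up as two entries? Most likely because the style type was set to "Linked (paragraph and character)" when creating it, which produces a paragraph style plus a linked character style.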

[main] INFO html2docx.DocxStyleTest -- docx styles length: 25
[main] INFO html2docx.DocxStyleTest -- styleId: a, name: Normal, type: paragraph
[main] INFO html2docx.DocxStyleTest -- styleId: 1, name: heading 1, type: paragraph
[main] INFO html2docx.DocxStyleTest -- styleId: 2, name: heading 2, type: paragraph
[main] INFO html2docx.DocxStyleTest -- styleId: 3, name: heading 3, type: paragraph
[main] INFO html2docx.DocxStyleTest -- styleId: 4, name: heading 4, type: paragraph
[main] INFO html2docx.DocxStyleTest -- styleId: a0, name: Default Paragraph Font, type: character
[main] INFO html2docx.DocxStyleTest -- styleId: a1, name: Normal Table, type: table
[main] INFO html2docx.DocxStyleTest -- styleId: a2, name: No List, type: numbering
[main] INFO html2docx.DocxStyleTest -- styleId: a3, name: header, type: paragraph
[main] INFO html2docx.DocxStyleTest -- styleId: a4, name: 頁眉 字符, type: character
[main] INFO html2docx.DocxStyleTest -- styleId: 10, name: 標題 1 字符, type: character
[main] INFO html2docx.DocxStyleTest -- styleId: 20, name: 標題 2 字符, type: character
[main] INFO html2docx.DocxStyleTest -- styleId: 30, name: 標題 3 字符, type: character
[main] INFO html2docx.DocxStyleTest -- styleId: 40, name: 標題 4 字符, type: character
[main] INFO html2docx.DocxStyleTest -- styleId: a5, name: Normal Indent, type: paragraph
[main] INFO html2docx.DocxStyleTest -- styleId: a6, name: Subtitle, type: paragraph
[main] INFO html2docx.DocxStyleTest -- styleId: a7, name: 副標題 字符, type: character
[main] INFO html2docx.DocxStyleTest -- styleId: a8, name: Title, type: paragraph
[main] INFO html2docx.DocxStyleTest -- styleId: a9, name: 標題 字符, type: character
[main] INFO html2docx.DocxStyleTest -- styleId: aa, name: Emphasis, type: character
[main] INFO html2docx.DocxStyleTest -- styleId: ab, name: Hyperlink, type: character
[main] INFO html2docx.DocxStyleTest -- styleId: ac, name: Table Grid, type: table
[main] INFO html2docx.DocxStyleTest -- styleId: ad, name: caption, type: paragraph
[main] INFO html2docx.DocxStyleTest -- styleId: customBodyText, name: customBodyText, type: paragraph
[main] INFO html2docx.DocxStyleTest -- styleId: customBodyText0, name: customBodyText 字符, type: character

While testing I noticed something odd: before any custom style was added, the output was:

[main] INFO html2docx.DocxStyleTest -- docx styles length: 22
[main] INFO html2docx.DocxStyleTest -- styleId: Normal, name: Normal, type: paragraph
[main] INFO html2docx.DocxStyleTest -- styleId: Heading1, name: heading 1, type: paragraph
[main] INFO html2docx.DocxStyleTest -- styleId: Heading2, name: heading 2, type: paragraph
[main] INFO html2docx.DocxStyleTest -- styleId: Heading3, name: heading 3, type: paragraph
[main] INFO html2docx.DocxStyleTest -- styleId: Heading4, name: heading 4, type: paragraph
[main] INFO html2docx.DocxStyleTest -- styleId: DefaultParagraphFont, name: Default Paragraph Font, type: character
[main] INFO html2docx.DocxStyleTest -- styleId: Header, name: header, type: paragraph
[main] INFO html2docx.DocxStyleTest -- styleId: HeaderChar, name: Header Char, type: character
[main] INFO html2docx.DocxStyleTest -- styleId: Heading1Char, name: Heading 1 Char, type: character
[main] INFO html2docx.DocxStyleTest -- styleId: Heading2Char, name: Heading 2 Char, type: character
[main] INFO html2docx.DocxStyleTest -- styleId: Heading3Char, name: Heading 3 Char, type: character
[main] INFO html2docx.DocxStyleTest -- styleId: Heading4Char, name: Heading 4 Char, type: character
[main] INFO html2docx.DocxStyleTest -- styleId: NormalIndent, name: Normal Indent, type: paragraph
[main] INFO html2docx.DocxStyleTest -- styleId: Subtitle, name: Subtitle, type: paragraph
[main] INFO html2docx.DocxStyleTest -- styleId: SubtitleChar, name: Subtitle Char, type: character
[main] INFO html2docx.DocxStyleTest -- styleId: Title, name: Title, type: paragraph
[main] INFO html2docx.DocxStyleTest -- styleId: TitleChar, name: Title Char, type: character
[main] INFO html2docx.DocxStyleTest -- styleId: Emphasis, name: Emphasis, type: character
[main] INFO html2docx.DocxStyleTest -- styleId: Hyperlink, name: Hyperlink, type: character
[main] INFO html2docx.DocxStyleTest -- styleId: TableGrid, name: Table Grid, type: table
[main] INFO html2docx.DocxStyleTest -- styleId: TableNormal, name: Normal Table, type: table
[main] INFO html2docx.DocxStyleTest -- styleId: Caption, name: caption, type: paragraph

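Note that the built-in styleIds differ between the two runs: before the custom style was added, heading 1 had the id "Heading1"; afterwards it became "1", while the style name stayed "heading 1". Possibly the generated document started out with English style ids, and once it was edited and saved in Word (running with a Chinese UI), Word rewrote them into its localized form.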
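Because the styleId can change ("Heading1" vs "1") while the style name stays "heading 1" in both logs, hard-coding class="1" ties the HTML to one particular template. A hedged alternative is to look the styleId up by style name at runtime and write that into the class attribute. `StyleIdResolver` is a hypothetical helper; each row is {styleId, name, type}, the same three fields the docx4j test prints. Matching is case-insensitive here purely as a convenience.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: resolves a Word styleId from its style name. */
final class StyleIdResolver {
    private final Map<String, String> idByName = new HashMap<>();

    /** Each row is {styleId, name, type}; rows without a name are skipped. */
    StyleIdResolver(List<String[]> rows) {
        for (String[] row : rows) {
            if (row[1] != null) {
                // Keep the first id seen for a name; names are compared case-insensitively
                idByName.putIfAbsent(row[1].toLowerCase(Locale.ROOT), row[0]);
            }
        }
    }

    /** Returns the styleId for a name such as "heading 1", or throws if the template lacks it. */
    String idFor(String styleName) {
        String id = idByName.get(styleName.toLowerCase(Locale.ROOT));
        if (id == null) {
            throw new IllegalArgumentException("style not found in template: " + styleName);
        }
        return id;
    }
}
```

Failing loudly on a missing name is deliberate: a silently unmatched class would just fall back to default formatting in the output.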

OK, the Word styles are now defined.

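Now let's apply the custom Word styles via docx4j. The idea: map the class attribute of each HTML tag to a Word styleId.

First, add class attributes to the HTML, as shown below:

[Screenshot: HTML with class attributes added]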

// Also requires: import org.docx4j.convert.in.xhtml.FormattingOption;
private static WordprocessingMLPackage wordMLPackage;

@BeforeAll
@SneakyThrows
public static void init_mainDocumentPart() {
    File templateFile = new File("demo.docx");
    wordMLPackage = WordprocessingMLPackage.load(templateFile);
}

@Test
@SneakyThrows
public void given_doc_template_and_class_when_mapping_custom_style_then_render_doc() {
    // class="1" is the styleId of heading 1 in this template (see the style list above)
    final String html = "<html><head><style>table{border-collapse:collapse;border-spacing:0;width:100%;margin:1em 0;background-color:transparent}table th{background-color:#f7f7f7;border:1px solid #ddd;padding:8px 12px;text-align:left}table td{border:1px solid #ddd;padding:8px 12px}</style></head><body><h2 class=\"1\">嘉文四世</h2><blockquote><p class=\"customBodyText\">德瑪西亞</p></blockquote><p class=\"customBodyText\"><strong>給我找些更強的敵人!</strong></p><table><thead><tr><th>列1</th><th>列2</th></tr></thead><tbody><tr><td>數據1</td><td>數據2</td></tr></tbody></table></body></html>";
    final MainDocumentPart mainDocumentPart = wordMLPackage.getMainDocumentPart();
    XHTMLImporterImpl importer = new XHTMLImporterImpl(wordMLPackage);
    // CLASS_TO_STYLE_ONLY: only the class is honored; style attributes and tags such as <strong>/<em> are ignored, i.e. purely class-driven styling
    // CLASS_PLUS_OTHER: the class provides the base style, and style attributes / inline tags supplement or override it, i.e. "class style + local tweaks"
    // IGNORE_CLASS: class attributes are ignored
    importer.setParagraphFormatting(FormattingOption.CLASS_TO_STYLE_ONLY);
    importer.setRunFormatting(FormattingOption.CLASS_TO_STYLE_ONLY);
    importer.setTableFormatting(FormattingOption.CLASS_TO_STYLE_ONLY);
    // HTML -> OOXML
    final List<Object> docxContent = importer.convert(html, null);
    final List<Object> docxOldContent = mainDocumentPart.getContent();
    // Clear the template's existing content and add the new content
    docxOldContent.clear();
    docxOldContent.addAll(docxContent);
    Docx4J.save(wordMLPackage, new File("newDemo.docx"), Docx4J.FLAG_NONE);
}
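Adding class attributes by hand does not scale once the HTML comes from flexmark. One option, sketched here under assumptions, is to inject them through the `htmlContentProcessor` hook (a `BiFunction<String, String, String>` of HTML and placeholder key) that the utility class in section 7 exposes. `ClassInjector` is hypothetical and regex-based: it only touches opening tags without an existing class attribute, skips self-closing tags, and is not a real HTML parser (a `>` inside a quoted attribute value would confuse it).

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical regex-based helper that adds class attributes (Word styleIds) to HTML tags. */
final class ClassInjector {
    // An opening tag: "<", a tag name, then optional attributes that start with whitespace
    private static final Pattern OPEN_TAG = Pattern.compile("<([a-zA-Z][a-zA-Z0-9]*)(\\s[^<>]*)?>");

    private ClassInjector() {
    }

    static String inject(String html, Map<String, String> classByTag) {
        Matcher m = OPEN_TAG.matcher(html);
        StringBuffer out = new StringBuffer(); // appendReplacement needs a StringBuffer on Java 8
        while (m.find()) {
            String tag = m.group(1);
            String attrs = m.group(2) == null ? "" : m.group(2);
            String cls = classByTag.get(tag.toLowerCase(Locale.ROOT));
            boolean selfClosing = attrs.trim().endsWith("/");
            String replacement = m.group();
            if (cls != null && !selfClosing && !attrs.contains("class=")) {
                replacement = "<" + tag + attrs + " class=\"" + cls + "\">";
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Adapts inject() to the (html, htmlKey) -> html shape; the key is ignored here. */
    static BiFunction<String, String, String> processor(Map<String, String> classByTag) {
        return (html, htmlKey) -> inject(html, classByTag);
    }

    public static void main(String[] args) {
        Map<String, String> classByTag = new HashMap<>();
        classByTag.put("h2", "1");
        classByTag.put("p", "customBodyText");
        System.out.println(inject("<h2>Title</h2><p>text</p><p class=\"x\">kept</p>", classByTag));
    }
}
```

With the `Docs` builder from section 7 this could be wired in as `.htmlContentProcessor(ClassInjector.processor(classByTag))`, together with the CLASS_TO_STYLE_ONLY or CLASS_PLUS_OTHER formatting options.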

Result:

Both the heading 1 mapping and the custom style were applied.

[Screenshot: newDemo.docx with the mapped styles]

6. Placeholder Replacement

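With the features above we can already render HTML into Word nicely. But the project came up with a new requirement: besides Markdown → HTML → Word, the information recognized by the LLM must also be written into the template file. So the final output consists of two parts:

  1. Personal information extracted by the LLM, such as name, job, and address
  2. A description of the person summarized by the LLM (Markdown)

The second part is already handled by docx4j. What remains is writing the basic personal information into Word through placeholders, which is easy: poi-tl can do the placeholder replacement.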

6.1 Maven

<dependency>
  <groupId>com.deepoove</groupId>
  <artifactId>poi-tl</artifactId>
  <version>1.12.0</version>
</dependency>

6.2 Implementation

@Test
public void given_template_doc_and_content_when_replace_then_replace() {
    final Configure templateEngineConfigure = Configure.builder().build();
    File templateFile = new File("demo.docx");
    File outputFile = new File("newDemo.docx");
    Map<String, Object> data = new HashMap<>();
    data.put("user", "嘉文四世");
    data.put("summoner", "張鐵牛");
    data.put("position", "打野");
    data.put("dialogue", "給我找些更強的敵人");
    try (XWPFTemplate template = XWPFTemplate.compile(templateFile, templateEngineConfigure)) {
        template.render(data).writeToFile(outputFile.getAbsolutePath());
    }
    catch (IOException e) {
        log.error("failed to replace template word placeholder", e);
        throw new RuntimeException(e);
    }
}
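poi-tl performs the replacement on the document's text runs, which, as far as I know, is why the placeholder's own formatting survives. The semantics of a plain text tag are simply "replace {{key}} with the value for key". To make that concrete, here is a JDK-only sketch on a plain string. `SimplePlaceholders` is hypothetical and is not poi-tl's implementation: it only recognizes keys made of word characters, and it blanks unknown keys, which is a choice of this sketch (check poi-tl's `Configure` options for how it treats missing data).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration of {{key}} text-tag semantics on a plain string (not poi-tl's code). */
final class SimplePlaceholders {
    // "{{", optional spaces, a key made of word characters, optional spaces, "}}"
    private static final Pattern TAG = Pattern.compile("\\{\\{\\s*(\\w+)\\s*\\}\\}");

    private SimplePlaceholders() {
    }

    static String render(String template, Map<String, ?> data) {
        Matcher m = TAG.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Object value = data.get(m.group(1));
            // Choice made for this sketch: unknown keys render as empty text
            String text = value == null ? "" : value.toString();
            // quoteReplacement keeps "$" and "\" in values literal
            m.appendReplacement(out, Matcher.quoteReplacement(text));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> data = new HashMap<>();
        data.put("user", "Jarvan IV");
        data.put("dialogue", "Find me stronger enemies");
        System.out.println(render("{{user}}: {{dialogue}}", data));
    }
}
```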

The template:

[Screenshot: the placeholder template]

Result:

Not only are the placeholders replaced, their own formatting is preserved as well, which is very convenient.

[Screenshot: the rendered newDemo.docx]

7. Wrapping It in a Utility Class

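To make calling docx4j and poi-tl for Word more convenient, we can wrap them in a utility class. For example, you pass in a map and it automatically replaces the placeholders and renders the Markdown content, ideally through a fluent chain so that one line of code does the job. Here it is: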

import com.deepoove.poi.XWPFTemplate;
import com.deepoove.poi.config.Configure;
import lombok.SneakyThrows;
import lombok.extern.slf4j.Slf4j;
import org.docx4j.convert.in.xhtml.FormattingOption;
import org.docx4j.convert.in.xhtml.XHTMLImporterImpl;
import org.docx4j.openpackaging.exceptions.Docx4JException;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart;
import org.docx4j.wml.Body;

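import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.BiFunction;

/**
 * Word document utility class.
 * <p>
 * Relies on a DocUtils helper class (HTML detection, placeholder matching, temp-file handling, etc.)
 * that is not shown in this article; see the repository linked at the end.
 *
 * @author ludangxin
 * @since 2025/10/14
 */
@Slf4j
public class Docs {
    @SneakyThrows
    public static DocBuilder builder() {
        return new DocBuilder().wordMLPackage(WordprocessingMLPackage.createPackage());
    }

    @SneakyThrows
    public static DocBuilder builder(File file) {
        return new DocBuilder().templateInputStream(Files.newInputStream(file.toPath()))
                               .wordMLPackage(WordprocessingMLPackage.load(file));
    }

    @SneakyThrows
    public static DocBuilder builder(InputStream inputStream) {
        // The template is read twice (by docx4j now, by poi-tl later), but a stream can only be
        // consumed once, so buffer it and give each reader its own copy
        final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        final byte[] chunk = new byte[8192];
        int read;
        while ((read = inputStream.read(chunk)) != -1) {
            buffer.write(chunk, 0, read);
        }
        final byte[] bytes = buffer.toByteArray();
        return new DocBuilder().templateInputStream(new ByteArrayInputStream(bytes))
                               .wordMLPackage(WordprocessingMLPackage.load(new ByteArrayInputStream(bytes)));
    }

    public static DocBuilder builder(String filePath) {
        return builder(new File(filePath));
    }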

    public static class DocBuilder {
        private InputStream templateInputStream;

        private WordprocessingMLPackage wordMLPackage;

        private XHTMLImporterImpl importer;

        private FormattingOption paragraphFormatting;

        private FormattingOption runFormatting;

        private FormattingOption tableFormatting;

        private String staticResourceBaseUri;

        private String[] placeHolderPreSuffix = new String[]{"{{", "}}"};

        private Configure templateEngineConfigure;

        private boolean useHtmlDefaultStyle = true;

        private boolean autoCloseStream = true;

        private String globalCss = "table{border-collapse:collapse;border-spacing:0;width:100%;margin:1em 0;background-color:transparent;}table th{background-color:#f7f7f7;border:1px solid #ddd;padding:8px 12px;text-align:left}table td{border:1px solid #ddd;padding:8px 12px}";

        /**
         * (htmlContent, htmlKey) -> processed htmlContent. htmlKey is the placeholder key the
         * HTML value belongs to, or null when a standalone HTML string is converted.
         */
        private BiFunction<String, String, String> htmlContentProcessor;

        private DocBuilder templateInputStream(InputStream templateInputStream) {
            this.templateInputStream = templateInputStream;
            return this;
        }

        private DocBuilder wordMLPackage(WordprocessingMLPackage wordMLPackage) {
            this.wordMLPackage = wordMLPackage;
            return this;
        }

        public DocBuilder importer(XHTMLImporterImpl importer) {
            this.importer = importer;
            return this;
        }

        public DocBuilder paragraphFormatting(FormattingOption paragraphFormatting) {
            this.paragraphFormatting = paragraphFormatting;
            return this;
        }

        public DocBuilder runFormatting(FormattingOption runFormatting) {
            this.runFormatting = runFormatting;
            return this;
        }

        public DocBuilder tableFormatting(FormattingOption tableFormatting) {
            this.tableFormatting = tableFormatting;
            return this;
        }

        public DocBuilder useHtmlDefaultStyle(boolean useHtmlDefaultStyle) {
            this.useHtmlDefaultStyle = useHtmlDefaultStyle;
            return this;
        }

        public DocBuilder staticResourceBaseUri(String staticResourceBaseUri) {
            this.staticResourceBaseUri = staticResourceBaseUri;
            return this;
        }

        public DocBuilder placeHolderPreSuffix(String placeHolderPrefix, String placeHolderSuffix) {
            this.placeHolderPreSuffix = new String[]{placeHolderPrefix, placeHolderSuffix};
            return this;
        }

        public DocBuilder templateEngineConfigure(Configure templateEngineConfigure) {
            this.templateEngineConfigure = templateEngineConfigure;
            return this;
        }

        public DocBuilder autoCloseStream(boolean autoCloseStream) {
            this.autoCloseStream = autoCloseStream;
            return this;
        }

        public DocBuilder globalCss(String globalCss) {
            this.globalCss = globalCss;
            return this;
        }

        public DocBuilder htmlContentProcessor(BiFunction<String, String, String> htmlContentProcessor) {
            this.htmlContentProcessor = htmlContentProcessor;
            return this;
        }

        public List<Object> buildWordML(String html) {
            return this.buildWordML(html, null);
        }

        public void buildWord(String html, String outputFile) {
            this.buildWord(html, new File(outputFile));
        }

        public void buildWord(String html, File outputFile) {
            try {
                this.getMainContent()
                    .addAll(this.buildWordML(html));
                wordMLPackage.save(outputFile);
            }
            catch (Exception e) {
                log.error("failed to build word file", e);
                throw new RuntimeException(e);
            }
        }

        public void buildWord(String html, OutputStream outputStream) {
            try {
                this.getMainContent()
                    .addAll(this.buildWordML(html));
                wordMLPackage.save(outputStream);
            }
            catch (Exception e) {
                log.error("failed to build word file", e);
                throw new RuntimeException(e);
            }
            finally {
                try {
                    if (autoCloseStream) {
                        outputStream.close();
                    }
                }
                catch (IOException ignored) {
                }
            }
        }

        public void buildWord(Map<String, Object> placeHolderData, OutputStream outputStream) {
            try {
                final int dataType = this.checkPlaceHolderDataType(placeHolderData);
                if (dataType == 1) {
                    // Only plain values: poi-tl replaces the placeholders
                    this.replacePlaceHolder(placeHolderData, outputStream);
                }
                else if (dataType == 2) {
                    // Only HTML values: docx4j replaces the HTML placeholders
                    this.replaceHtmlPlaceHolder(placeHolderData, outputStream);
                }
                else {
                    // Both: render the HTML into a temp file, run poi-tl over that file for the
                    // plain placeholders, then write the result to the stream and delete the temp file
                    final File tempDocFile = DocUtils.createTempDocFile();
                    this.replaceHtmlPlaceHolder(placeHolderData, tempDocFile);
                    this.replacePlaceHolder(placeHolderData, tempDocFile, tempDocFile);
                    DocUtils.writeAndDeleteFile(tempDocFile, outputStream);
                }
            }
            catch (Exception e) {
                log.error("failed to build word file", e);
                throw new RuntimeException(e);
            }
            finally {
                try {
                    if (autoCloseStream) {
                        outputStream.close();
                    }
                }
                catch (IOException ignored) {
                }
            }
        }

        public void buildWord(Map<String, Object> placeHolderData, File outputFile) {
            final int dataType = this.checkPlaceHolderDataType(placeHolderData);
            // Only plain values: poi-tl replaces the placeholders directly
            if (dataType == 1) {
                this.replacePlaceHolder(placeHolderData, outputFile);
            }

            // HTML values present: docx4j replaces the HTML placeholders and saves the file
            if (dataType > 1) {
                this.replaceHtmlPlaceHolder(placeHolderData, outputFile);
            }

            // Mixed values: run poi-tl over the saved file for the remaining plain placeholders
            if (dataType == 3) {
                this.replacePlaceHolder(placeHolderData, outputFile, outputFile);
            }
        }

        private List<Object> buildWordML(String html, String htmlKey) {
            final XHTMLImporterImpl importer = this.getImporterOrDefault();
            try {
                if (globalCss != null && !globalCss.isEmpty()) {
                    html = DocUtils.addHtmlStyles(html, globalCss);
                }

                if (htmlContentProcessor != null) {
                    html = htmlContentProcessor.apply(html, htmlKey);
                }

                return importer.convert(html, staticResourceBaseUri);
            }
            catch (Exception e) {
                log.error("failed to convert HTML to XHTML", e);
                throw new RuntimeException(e);
            }
        }

        private void replaceHtmlPlaceHolder(Map<String, Object> placeHolderData, File outputFile) {
            this.doReplaceHtmlPlaceHolder(placeHolderData);

            try {
                // Save the document, whose HTML placeholders have now been replaced
                wordMLPackage.save(outputFile);
            }
            catch (Docx4JException e) {
                log.error("failed to build word file", e);
                throw new RuntimeException(e);
            }
        }

        private void replaceHtmlPlaceHolder(Map<String, Object> placeHolderData, OutputStream outputStream) {
            this.doReplaceHtmlPlaceHolder(placeHolderData);

            try {
                // Save the document, whose HTML placeholders have now been replaced
                wordMLPackage.save(outputStream);
            }
            catch (Docx4JException e) {
                log.error("failed to build word file", e);
                throw new RuntimeException(e);
            }
            finally {
                try {
                    if (autoCloseStream) {
                        outputStream.close();
                    }
                }
                catch (IOException ignored) {
                }
            }
        }

        private void doReplaceHtmlPlaceHolder(Map<String, Object> placeHolderData) {
            final List<Object> mainContent = this.getMainContent();
            List<Object> newContent = new ArrayList<>();

            for (Object p : mainContent) {
                String text = DocUtils.extractText(p);

                Optional<String> matchedKey = placeHolderData.keySet()
                                                             .stream()
                                                             .filter(key -> DocUtils.matchPlaceHolder(text, key, placeHolderPreSuffix[0], placeHolderPreSuffix[1]))
                                                             .findFirst();

                if (matchedKey.isPresent()) {
                    String key = matchedKey.get();
                    Object value = placeHolderData.get(key);

                    if (DocUtils.isHtml(value)) {
                        final List<Object> wordFragment = this.buildWordML((String) value, key);
                        newContent.addAll(wordFragment);
                    }
                    else {
                        newContent.add(p);
                    }
                }
                else {
                    newContent.add(p);
                }
            }

            // Swap the template body for the rebuilt content
            mainContent.clear();
            mainContent.addAll(newContent);
        }

        private void replacePlaceHolder(Map<String, Object> data, File templateFile, File outputFile) {
            final Configure templateEngineConfigure = this.getTemplateEngineConfigureOrDefault();
            try (XWPFTemplate template = XWPFTemplate.compile(templateFile, templateEngineConfigure)){
                template.render(data)
                        .writeToFile(outputFile.getAbsolutePath());
            }
            catch (IOException e) {
                log.error("failed to replace template word placeholder", e);
                throw new RuntimeException(e);
            }
        }

        public void replacePlaceHolder(Map<String, Object> data, File outputFile) {
            final Configure templateEngineConfigure = this.getTemplateEngineConfigureOrDefault();

            if (templateInputStream == null) {
                throw new NullPointerException("template file can not be null");
            }

            // try-with-resources closes the template and its underlying document
            try (XWPFTemplate template = XWPFTemplate.compile(templateInputStream, templateEngineConfigure)) {
                template.render(data)
                        .writeToFile(outputFile.getAbsolutePath());
            }
            catch (IOException e) {
                log.error("failed to replace template word placeholder", e);
                throw new RuntimeException(e);
            }
        }

        public void replacePlaceHolder(Map<String, Object> data, String outputFileAbsolutePath) {
            final Configure templateEngineConfigure = this.getTemplateEngineConfigureOrDefault();

            if (templateInputStream == null) {
                throw new NullPointerException("template file can not be null");
            }

            // try-with-resources closes the template and its underlying document
            try (XWPFTemplate template = XWPFTemplate.compile(templateInputStream, templateEngineConfigure)) {
                template.render(data)
                        .writeToFile(outputFileAbsolutePath);
            }
            catch (IOException e) {
                log.error("failed to replace template word placeholder", e);
                throw new RuntimeException(e);
            }
        }

        public void replacePlaceHolder(Map<String, Object> data, OutputStream outputStream) {
            final Configure templateEngineConfigure = this.getTemplateEngineConfigureOrDefault();

            if (templateInputStream == null) {
                throw new NullPointerException("template file can not be null");
            }

            // try-with-resources closes the template; the output stream is handled in finally
            try (XWPFTemplate template = XWPFTemplate.compile(templateInputStream, templateEngineConfigure)) {
                template.render(data)
                        .write(outputStream);
            }
            catch (IOException e) {
                log.error("failed to replace template word placeholder", e);
                throw new RuntimeException(e);
            }
            finally {
                try {
                    if (autoCloseStream) {
                        outputStream.close();
                    }
                }
                catch (IOException ignored) {
                }
            }
        }

        private XHTMLImporterImpl getImporterOrDefault() {
            if (importer == null) {
                if (paragraphFormatting != null || runFormatting != null || tableFormatting != null) {
                    XHTMLImporterImpl importer = new XHTMLImporterImpl(wordMLPackage);
                    importer.setParagraphFormatting(paragraphFormatting == null ? FormattingOption.CLASS_PLUS_OTHER : paragraphFormatting);
                    importer.setRunFormatting(runFormatting == null ? FormattingOption.CLASS_PLUS_OTHER : runFormatting);
                    importer.setTableFormatting(tableFormatting == null ? FormattingOption.CLASS_PLUS_OTHER : tableFormatting);

                    return importer;
                }
                else {
                    return this.defaultImporter();
                }
            }
            else {
                return this.importer;
            }
        }

        private Configure getTemplateEngineConfigureOrDefault() {
            if (templateEngineConfigure == null) {
                return this.defaultTemplateEngineConfigure();
            }
            else {
                return this.templateEngineConfigure;
            }
        }

        private Configure defaultTemplateEngineConfigure() {
            return Configure.builder()
                            .buildGramer(placeHolderPreSuffix[0], placeHolderPreSuffix[1])
                            .build();
        }

        private XHTMLImporterImpl defaultImporter() {
            XHTMLImporterImpl importer = new XHTMLImporterImpl(wordMLPackage);
            if (useHtmlDefaultStyle) {
                importer.setParagraphFormatting(FormattingOption.CLASS_PLUS_OTHER);
                importer.setRunFormatting(FormattingOption.CLASS_PLUS_OTHER);
                importer.setTableFormatting(FormattingOption.CLASS_PLUS_OTHER);
            }
            else {
                importer.setParagraphFormatting(FormattingOption.CLASS_TO_STYLE_ONLY);
                importer.setRunFormatting(FormattingOption.CLASS_TO_STYLE_ONLY);
                importer.setTableFormatting(FormattingOption.CLASS_TO_STYLE_ONLY);
            }

            return importer;
        }

        private List<Object> getMainContent() {
            MainDocumentPart mainDocumentPart = wordMLPackage.getMainDocumentPart();
            if (globalCss != null && !globalCss.isEmpty()) {
                mainDocumentPart.getStyleDefinitionsPart()
                                .setCss(globalCss);
            }
            Body body = mainDocumentPart.getJaxbElement()
                                        .getBody();
            return body.getContent();
        }

        /**
         * Classify the placeholder data.
         *
         * @param placeHolderData placeholder data
         * @return 1: no HTML values; 2: only HTML values; 3: both (also returned for an empty map)
         */
        private int checkPlaceHolderDataType(Map<String, Object> placeHolderData) {
            boolean hasHtmlValFlag = false;
            boolean hasCommonValFlag = false;

            for (Object value : placeHolderData.values()) {
                if (DocUtils.isHtml(value)) {
                    hasHtmlValFlag = true;
                }
                else {
                    hasCommonValFlag = true;
                }
            }

            if (!hasHtmlValFlag && hasCommonValFlag) {
                return 1;
            }

            if (hasHtmlValFlag && !hasCommonValFlag) {
                return 2;
            }

            return 3;
        }
    }
}
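The core of `doReplaceHtmlPlaceHolder` is a splice: walk the body, and wherever a paragraph is an HTML placeholder, replace that single paragraph with the (possibly many) blocks produced by the importer. Stripped of docx4j types, the idea can be sketched generically as below. `PlaceholderSplice` is hypothetical, and it assumes a placeholder paragraph contains nothing but the placeholder (`DocUtils.matchPlaceHolder` is not shown, so its exact rule may differ). That restriction is deliberate: a multi-paragraph fragment cannot be inserted into the middle of one paragraph's text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical, docx4j-free sketch of splicing converted fragments in place of placeholder paragraphs. */
final class PlaceholderSplice {
    private PlaceholderSplice() {
    }

    /** True when the paragraph text, ignoring surrounding whitespace, is exactly prefix + key + suffix. */
    static boolean isPlaceholder(String text, String key, String prefix, String suffix) {
        return text != null && text.trim().equals(prefix + key + suffix);
    }

    /** Returns a new body where each placeholder block for a key in fragments becomes that key's blocks. */
    static <T> List<T> splice(List<T> body, Function<T, String> textOf,
                              Map<String, List<T>> fragments, String prefix, String suffix) {
        List<T> result = new ArrayList<>();
        for (T block : body) {
            String text = textOf.apply(block);
            List<T> replacement = null;
            for (Map.Entry<String, List<T>> entry : fragments.entrySet()) {
                if (isPlaceholder(text, entry.getKey(), prefix, suffix)) {
                    replacement = entry.getValue();
                    break;
                }
            }
            if (replacement != null) {
                result.addAll(replacement);
            } else {
                result.add(block);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Strings stand in for docx4j paragraph objects here
        List<String> body = Arrays.asList("Hero: {{user}}", "{{description}}", "The end");
        Map<String, List<String>> fragments = new HashMap<>();
        fragments.put("description", Arrays.asList("<h2>...</h2>", "<ul>...</ul>"));
        System.out.println(splice(body, s -> s, fragments, "{{", "}}"));
    }
}
```

Building a new list and swapping it in, as the original method also does, avoids modifying the body while iterating over it.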

8. Test Examples

import lombok.SneakyThrows;
import org.junit.jupiter.api.BeforeAll;
import org.junit.jupiter.api.Test;

import java.io.File;
import java.io.OutputStream;
import java.nio.file.Files;
import java.util.HashMap;
import java.util.Map;

/**
 * docs test
 *
 * @author ludangxin
 * @since 2025/11/4
 */
public class DocxTest {
    private static final File TEMPLATE_FILE = new File("demo.docx");
    private static final File OUTPUT_FILE = new File("output.docx");
    private static final Map<String, Object> DATA = new HashMap<>();

    @BeforeAll
    public static void given_data() {
        DATA.put("user", "嘉文四世");
        DATA.put("summoner", "張鐵牛");
        DATA.put("position", "打野");
        DATA.put("dialogue", "給我找些更強的敵人");
        final String markdownContent = "- **背景故事**:嘉文四世是德瑪西亞國王嘉文三世的獨生子,其母凱瑟琳女士因難產而死。嘉文在宮廷中長大,接受了良好的德瑪西亞式教育,並結識了趙信,向其學習戰爭藝術。他與蓋倫年齡相仿,結為好兄弟。嘉文曾率軍前往邊境對抗諾克薩斯,卻因戰力分散而戰敗,幸得希瓦娜相救。後來,德瑪西亞國內搜魔人兵團搜捕魔法師引發起義,嘉文三世慘遭弒殺,嘉文四世接掌了議會,之後他登基成為德瑪西亞國王。\n" + "\n" + "- **角色定位**:在遊戲中,嘉文四世的定位是坦克、戰士,他常常需要帶頭衝入敵方陣地,因此相比輸出更加需要增強防禦能力。\n" + "\n" + "- 技能介紹\n" + "  :\n" + "  - **被動技能 - 戰爭律動**:普攻命中時,會對目標造成 8% 當前生命值的額外物理傷害,該效果作用於同一目標的冷卻時間為 6 秒。\n" + "  - **一技能 - 巨龍撞擊**:用長矛穿透路徑上的敵人,對其造成物理傷害,並減少其護甲,持續 3 秒。若長矛觸及 “德邦軍旗”,嘉文四世會被引向軍旗,並擊飛沿途敵人 0.75 秒。\n" + "  - **二技能 - 黃金聖盾**:釋放出一道帝王光環,使周圍敵人減速,持續 2 秒,同時提供一個可以吸收傷害的護盾,持續 5 秒,附近每多一名敵方英雄,吸收傷害增加。\n" + "  - **三技能 - 德邦軍旗**:投擲一柄軍旗,對敵人造成魔法傷害,並將軍旗置於原地 8 秒,使附近隊友獲得攻擊速度加成。在 “德邦軍旗” 附近再次點擊施放該技能,將會朝軍旗施放 “巨龍撞擊”。\n" + "  - **終極技能 - 天崩地裂**:躍向敵方英雄,對目標及其附近的敵人造成物理傷害,並在目標周圍形成環形障礙,持續 3.5 秒,再次點擊施放可使障礙倒塌。\n" + "\n" + "- **皮膚信息**:嘉文四世擁有多款皮膚,包括孤膽英豪、暗星、福牛守護者等。";
        // Markdown -> HTML (Markdowns is the utility class from the previous chapter)
        final String htmlContent = Markdowns.builder(markdownContent)
                                            .buildHtmlContent();
        DATA.put("description", htmlContent);
    }

    @Test
    public void given_template_doc_and_content_when_replace_then_complete() {
        Docs.builder(TEMPLATE_FILE).buildWord(DATA, OUTPUT_FILE);
    }

    @Test
    @SneakyThrows
    public void given_template_doc_and_content_when_replace_and_output_stream_then_complete() {
        final OutputStream fileOutputStream = Files.newOutputStream(OUTPUT_FILE.toPath());
        // Write into the caller's output stream; with autoCloseStream(true) the builder closes it
        Docs.builder(TEMPLATE_FILE).autoCloseStream(true).buildWord(DATA, fileOutputStream);
    }
}
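The DATA map in this test mixes four plain strings with one HTML value (description), so `buildWord` takes the "both" branch (`checkPlaceHolderDataType` returns 3): docx4j first replaces the HTML placeholder, then poi-tl fills in the plain ones. The routing decision can be sketched on its own as below. `PlaceholderRouting` is hypothetical, and so is its `isHtml` heuristic (a string containing something shaped like an HTML tag); `DocUtils.isHtml` may use a different rule. Unlike `checkPlaceHolderDataType`, this sketch treats an empty map as plain-only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical sketch of how placeholder values could be routed to poi-tl, docx4j, or both. */
final class PlaceholderRouting {
    enum Kind { PLAIN_ONLY, HTML_ONLY, MIXED }

    // Assumed heuristic: something shaped like <p>, </table>, <br/> or <h2 class="x">
    private static final Pattern HTML_TAG = Pattern.compile("</?[a-zA-Z][a-zA-Z0-9]*(\\s[^<>]*)?/?>");

    private PlaceholderRouting() {
    }

    static boolean isHtml(Object value) {
        return value instanceof String && HTML_TAG.matcher((String) value).find();
    }

    static Kind classify(Map<String, ?> data) {
        boolean hasHtml = false;
        boolean hasPlain = false;
        for (Object value : data.values()) {
            if (isHtml(value)) {
                hasHtml = true;
            } else {
                hasPlain = true;
            }
        }
        if (hasHtml && hasPlain) {
            return Kind.MIXED; // corresponds to type 3
        }
        // type 2 or type 1; an empty map falls through to PLAIN_ONLY
        return hasHtml ? Kind.HTML_ONLY : Kind.PLAIN_ONLY;
    }

    public static void main(String[] args) {
        Map<String, Object> data = new HashMap<>();
        data.put("user", "Jarvan IV");
        data.put("description", "<ul><li><strong>Role</strong>: tank</li></ul>");
        System.out.println(classify(data));
    }
}
```

An enum makes the three cases self-describing and lets the compiler flag an unhandled branch in a `switch`, which the bare 1/2/3 integers cannot.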

The template:

[Screenshot: the template file]

Result:

[Screenshot: the rendered output.docx]

9. Summary

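In this chapter we used docx4j and poi-tl to render both plain placeholder values and HTML text into Word, showed how to use their features for custom style rendering, and finally wrapped everything in a fluent utility class with matching unit tests. Combined with the previous chapter, content in various forms can now be rendered into Word with a single line of code.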

10. Source Code

All the code used in these tests has been uploaded to GitHub; stars and bookmarks are welcome. Repository: https://github.com/ludangxin/markdown2docx
