
[Tech] Batch-converting images and PDF table images to Excel with Alibaba Cloud OCR: invoice image extraction and general image text extraction

Converts PDF, image, and table files into Excel files or other formats.

First, the image recognition step:

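import cn.hutool.core.date.DatePattern;
import cn.hutool.core.date.DateUtil;
import cn.hutool.core.io.FileUtil;
import cn.hutool.core.lang.Console;
import cn.hutool.json.JSONUtil;
import com.alibaba.fastjson.JSONObject; // fastjson 1.x assumed
import com.aliyun.ocr_api20210707.models.RecognizeInvoiceRequest;
import com.aliyun.ocr_api20210707.models.RecognizeInvoiceResponse;
import com.aliyun.tea.TeaException;
import com.aliyun.teautil.models.RuntimeOptions;
import org.junit.jupiter.api.Test;

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

    @Test
    void request_002() {
        // Folder containing the invoice images/PDFs to recognize
        String fileSource = "C:\\Users\\Administrator\\Desktop\\work\\20221217\\invoice\\pageFiles";
        // Output goes to the excelFile subfolder as 票據_<timestamp>.xlsx (票據 = invoice)
        String fileName = fileSource + "\\excelFile\\" + "票據_" + DateUtil.format(DateUtil.date(), DatePattern.PURE_DATETIME_PATTERN) + ".xlsx";
        long beginTime = System.currentTimeMillis();
        // loopFiles is recursive, so skip .xlsx files produced by earlier runs in excelFile
        List<File> files = FileUtil.loopFiles(fileSource, f -> !f.getName().endsWith(".xlsx"));
        List<InvoiceVO> getList = new ArrayList<>();
        for (File file : files) {
            Console.log("Recognizing file: {}", file.getName());
            // try-with-resources closes the stream even when the request fails
            try (InputStream inputStream = new FileInputStream(file)) {
                // Request parameters: the file content is sent as the request body
                RecognizeInvoiceRequest request = new RecognizeInvoiceRequest();
                request.body = inputStream;
                RuntimeOptions runtime = new RuntimeOptions();
                RecognizeInvoiceResponse response = client().recognizeInvoiceWithOptions(request, runtime);
                Console.log("File {} recognized", file.getName());
                // response.body.data is a JSON string; the invoice fields sit under its "data" key
                JSONObject jsonObject = JSONObject.parseObject(response.body.data);
                String data = jsonObject.getString("data");
                Console.log("data : => {}", data);
                InvoiceVO invoiceData = JSONUtil.toBean(data, InvoiceVO.class);
                getList.add(invoiceData);
            } catch (TeaException error) {
                // Error reported by the Alibaba Cloud SDK; log it and continue with the next file
                Console.log(error.message);
            } catch (Exception _error) {
                // Any other failure (I/O, JSON parsing) is wrapped and logged the same way
                TeaException error = new TeaException(_error.getMessage(), _error);
                Console.log(error.message);
            }
        }
        // Write the results out
        if (!getList.isEmpty()) {
            Console.log("Writing Excel file...");
            toExcel(getList, fileName);
            Console.log("File {} written successfully! Total time: {} s", fileName, (System.currentTimeMillis() - beginTime) / 1000);
        }
    }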
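Because the Excel output is written into the `excelFile` subfolder of the very folder being scanned, the input list has to exclude that folder on repeat runs. As a JDK-only alternative to Hutool's `FileUtil.loopFiles`, here is a minimal sketch that builds the list with `java.nio.file.Files.walk`. The class name `InputFileCollector` is hypothetical, and the accepted extension set is an assumption, not taken from the Alibaba Cloud OCR documentation.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: collects OCR input files, skipping the Excel output folder
final class InputFileCollector {

    // Assumed set of extensions to send to OCR; adjust to what the service accepts
    private static final Set<String> ACCEPTED = Set.of("jpg", "jpeg", "png", "bmp", "pdf");

    /** Recursively lists input files under root, skipping everything inside outputDir. */
    static List<Path> collect(Path root, Path outputDir) throws IOException {
        Path normalizedOutput = outputDir.toAbsolutePath().normalize();
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    // Skip generated .xlsx files and anything else in the output folder
                    .filter(p -> !p.toAbsolutePath().normalize().startsWith(normalizedOutput))
                    .filter(p -> ACCEPTED.contains(extensionOf(p)))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    // Lower-cased extension without the dot, or "" when there is none
    static String extensionOf(Path p) {
        String name = p.getFileName().toString();
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("pageFiles");
        Path output = Files.createDirectories(root.resolve("excelFile"));
        Files.createFile(root.resolve("a.JPG"));
        Files.createFile(root.resolve("b.pdf"));
        Files.createFile(root.resolve("notes.txt"));   // unsupported extension, skipped
        Files.createFile(output.resolve("old.xlsx"));  // earlier output, skipped
        Files.createFile(output.resolve("c.png"));     // inside the output folder, skipped
        // prints a.JPG and b.pdf
        collect(root, output).forEach(p -> System.out.println(p.getFileName()));
    }
}
```

Excluding by folder rather than by `.xlsx` suffix alone also keeps any screenshots saved next to the output out of the OCR batch.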

Next, writing the Excel file:

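import cn.hutool.core.bean.BeanUtil;
import cn.hutool.poi.excel.ExcelUtil;
import cn.hutool.poi.excel.ExcelWriter;

import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

    private void toExcel(List<InvoiceVO> getList, String filePathName) {
        // Leading columns holding invoice-level fields, merged vertically per invoice
        // (assumed: line-item columns come after these and are not merged)
        final int mergedColumnCount = 22;
        // Row ranges to merge: first row -> last row (inclusive); row 0 is the header
        TreeMap<Integer, Integer> treeMap = new TreeMap<>();
        int beforeRow = 1;

        // Rows to write: one row per line item, each carrying its invoice's fields
        List<InvoiceVO> dataList = new ArrayList<>();
        for (InvoiceVO invoiceVO : getList) {
            List<InvoiceDetails> details = invoiceVO.getInvoiceDetails();
            if (details == null || details.isEmpty()) {
                // No line items: write the invoice fields as one row, nothing to merge
                dataList.add(invoiceVO);
                beforeRow++;
                continue;
            }
            for (InvoiceDetails detail : details) {
                InvoiceVO vo = new InvoiceVO();
                // Copy the invoice-level fields first, then overlay this line item's fields
                BeanUtil.copyProperties(invoiceVO, vo);
                BeanUtil.copyProperties(detail, vo);
                dataList.add(vo);
            }
            int detailSize = details.size();
            // A single line item occupies one row, so only multi-row invoices are merged
            if (detailSize > 1) {
                treeMap.put(beforeRow, beforeRow + detailSize - 1);
            }
            beforeRow += detailSize;
        }

        // try-with-resources flushes and closes the writer even if writing fails
        try (ExcelWriter writer = ExcelUtil.getWriter(filePathName)) {
            // Header aliases
            addHeader(writer);
            treeMap.forEach((firstRow, lastRow) -> {
                for (int col = 0; col < mergedColumnCount; col++) {
                    // merge(firstRow, lastRow, firstColumn, lastColumn, content, isSetHeaderStyle):
                    // the placeholder content is overwritten by the data written below
                    writer.merge(firstRow, lastRow, col, col, "合併數據", false);
                }
            });
            // Write only the fields that have a header alias
            writer.setOnlyAlias(true);
            writer.write(dataList, true);
            // Size columns after the data is in place; on an empty sheet there is nothing to measure
            writer.autoSizeColumnAll();
        }
    }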
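The row bookkeeping in `toExcel` (header in row 0, each invoice occupying as many rows as it has line items, merging only when an invoice spans more than one row) can be checked without Excel at all. Below is a JDK-only sketch; `MergePlanner`, `planMerges`, `RowRange`, and the `int[]` of line-item counts are hypothetical stand-ins for the `InvoiceVO` / `InvoiceDetails` lists, and an invoice with no line items is assumed to still occupy one row.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper mirroring the merge-range bookkeeping in toExcel
final class MergePlanner {

    /** Inclusive row range to merge vertically. */
    record RowRange(int firstRow, int lastRow) { }

    /**
     * Given the number of line items per invoice, returns the row ranges to merge.
     * Row 0 is the header, so data starts at row 1. An invoice with 0 or 1 items
     * occupies a single row and needs no merge.
     */
    static List<RowRange> planMerges(int[] itemCounts) {
        List<RowRange> ranges = new ArrayList<>();
        int nextRow = 1; // first data row, right below the header
        for (int count : itemCounts) {
            int rows = Math.max(count, 1); // assumed: an invoice without items still gets one row
            if (rows > 1) {
                ranges.add(new RowRange(nextRow, nextRow + rows - 1));
            }
            nextRow += rows;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Three invoices with 3, 1 and 2 line items
        System.out.println(planMerges(new int[] {3, 1, 2}));
        // prints [RowRange[firstRow=1, lastRow=3], RowRange[firstRow=5, lastRow=6]]
    }
}
```

Each returned range corresponds to one `writer.merge(firstRow, lastRow, col, col, ...)` call per invoice-level column.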

Finally, the result:

[Figure: extraction result]

