Extracting Structured Data From Images Using Spring AI

1. Overview

In this tutorial, we'll look at how to extract structured data from images using OpenAI's chat models and Spring AI.

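OpenAI's chat models can analyze uploaded images and return relevant information about them. They can also return structured output, which we can pipe into other applications for further processing.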

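To illustrate, we'll create a web service that accepts an image uploaded by a client and sends it to OpenAI to count the cars of each requested color in the image. The web service returns the color counts in JSON format.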

2. Spring Boot Configuration

We need to add the following dependencies to our Maven pom.xml: the Spring Boot Web Starter and the Spring AI OpenAI starter:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
    <version>3.4.1</version>
</dependency>
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
    <version>1.0.0-M6</version>
</dependency>

In our Spring Boot application.yml file, we need to provide the OpenAI API key (spring.ai.openai.api-key) and a chat model that is capable of image analysis (spring.ai.openai.chat.options.model).

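Several models support image analysis, such as gpt-4o-mini, gpt-4o, and gpt-4.5-preview. Larger models such as gpt-4o have broader knowledge but cost more, while smaller models such as gpt-4o-mini are cheaper and have lower latency. We can pick the model that best fits our needs.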

Let's use the gpt-4o chat model for our demo:

spring:
  ai:
    openai:
      api-key: "<YOUR-API-KEY>"
      chat:
        options:
          model: "gpt-4o"

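With this configuration in place, Spring Boot's auto-configuration (OpenAiAutoConfiguration) registers the OpenAI chat model bean during application startup, together with a ChatClient.Builder bean that we'll use later to create a ChatClient.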

3. Example Web Service

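With the configuration done, our next step is to create a web service that lets users upload their images and passes them to OpenAI to count the cars of the requested colors in each image.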

3.1. REST Controller

In the REST controller, we accept just two request parameters: the image file and the colors to count in the image:

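import java.io.IOException;
import java.io.InputStream;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.multipart.MultipartFile;

@RestController
@RequestMapping("/image")
public class ImageController {
    @Autowired
    private CarCountService carCountService;

    // POST /image/car-count with multipart parameters "colors" (text) and "file" (image)
    @PostMapping("/car-count")
    public ResponseEntity<?> getCarCounts(@RequestParam("colors") String colors,
      @RequestParam("file") MultipartFile file) {
        // the stream is closed automatically once the synchronous OpenAI call has returned
        try (InputStream inputStream = file.getInputStream()) {
            var carCount = carCountService.getCarCount(inputStream, file.getContentType(), colors);
            return ResponseEntity.ok(carCount);
        } catch (IOException e) {
            return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR).body("Error uploading image");
        }
    }
}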

On success, the endpoint returns a ResponseEntity whose body is a CarCount object.
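The controller forwards the colors parameter to the model as raw text. If we want the prompt to be a little more predictable, we could normalize it first. The sketch below uses a hypothetical helper, ColorParams, which is not part of the original service or of Spring AI; it assumes colors arrive as a comma-separated list.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper (not part of the original article or of Spring AI):
// normalizes the free-text "colors" request parameter before it is sent to the model.
public final class ColorParams {

    private ColorParams() {
    }

    // Splits on commas, trims whitespace, lower-cases each entry, drops blank
    // entries, and keeps only the first occurrence of duplicates, in request order.
    public static List<String> parse(String colors) {
        if (colors == null) {
            return List.of();
        }
        Set<String> unique = new LinkedHashSet<>();
        for (String part : colors.split(",")) {
            String color = part.trim().toLowerCase(Locale.ROOT);
            if (!color.isEmpty()) {
                unique.add(color);
            }
        }
        return new ArrayList<>(unique);
    }
}
```

For example, ColorParams.parse(" Blue, yellow,,BLUE , green") yields [blue, yellow, green], which the controller could pass on as String.join(", ", ...) instead of the raw parameter.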

3.2. POJO

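If we want the chat model to return structured output, we tell OpenAI the expected output format by describing it as a JSON schema in the request. Spring AI simplifies this considerably: all we need to do is define POJO classes.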

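Let's define two POJO classes to hold the colors and their corresponding counts. CarCount stores a list of per-color counts along with the total count, which is the sum of the counts in the list: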

public class CarCount {
    private List<CarColorCount> carColorCounts;
    private int totalCount;

    // constructor, getters and setters
}

CarColorCount stores the color name and its count:

public class CarColorCount {
    private String color;
    private int count;

    // constructor, getters and setters
}
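Since Spring Boot 3 requires Java 17, the same two types could also be written as records; Spring AI's entity() conversion should accept records as well, though it's worth confirming that the generated schema is identical. The sketch below wraps them in a hypothetical holder class, CarCountRecords, and adds a hypothetical isConsistent() check, because totalCount is produced by the model itself and isn't guaranteed to equal the sum of the individual counts.

```java
import java.util.List;

// A sketch of the two POJOs as Java records (assumes Java 16+).
// CarCountRecords and isConsistent() are hypothetical additions for illustration.
public class CarCountRecords {

    public record CarColorCount(String color, int count) {
    }

    public record CarCount(List<CarColorCount> carColorCounts, int totalCount) {

        // Recomputes the total locally from the per-color counts.
        public int sumOfColorCounts() {
            return carColorCounts == null ? 0
              : carColorCounts.stream().mapToInt(CarColorCount::count).sum();
        }

        // The model fills in totalCount on its own, so we check that it
        // actually matches the sum of the individual color counts.
        public boolean isConsistent() {
            return totalCount == sumOfColorCounts();
        }
    }
}
```

A caller could log or reject responses where isConsistent() returns false rather than trusting the model's arithmetic.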

3.3. Service

Now, let's create the core Spring service that sends the image to OpenAI's API for analysis. In CarCountService, we inject a ChatClient.Builder, which we use to build the ChatClient that communicates with OpenAI:

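import java.io.InputStream;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.core.io.InputStreamResource;
import org.springframework.stereotype.Service;
import org.springframework.util.MimeTypeUtils;

@Service
public class CarCountService {
    private final ChatClient chatClient;

    // ChatClient.Builder is provided by Spring AI's auto-configuration
    public CarCountService(ChatClient.Builder chatClientBuilder) {
        this.chatClient = chatClientBuilder.build();
    }

    public CarCount getCarCount(InputStream imageInputStream, String contentType, String colors) {
        return chatClient.prompt()
          // text() sets (rather than appends) the system text, so all instructions go into one string
          .system(systemMessage -> systemMessage
            .text("""
              Count the number of cars in different colors from the image.
              User will provide the image and specify which colors to count in the user prompt.
              Count colors that are specified in the user prompt only.
              Ignore anything in the user prompt that is not a color.
              If there is no color specified in the user prompt, simply return zero in the total count.
              """)
          )
          // user prompt: the requested colors as text plus the uploaded image as media
          .user(userMessage -> userMessage
            .text(colors)
            .media(MimeTypeUtils.parseMimeType(contentType), new InputStreamResource(imageInputStream))
          )
          .call()
          // maps the JSON reply onto a CarCount instance
          .entity(CarCount.class);
    }
}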

The service submits a system prompt and a user prompt to OpenAI.

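The system prompt sets guidelines for the chat model's behavior. It contains a set of instructions that prevent unexpected behavior, such as counting colors the user didn't ask for, so the chat model returns more predictable responses.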
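One detail to watch: in the Spring AI version used here, the system spec's text(String) appears to replace any previously set text rather than append to it, so chaining several text() calls would send only the last instruction. One way to keep the rules readable is to hold them in a list and join them into a single string. SystemPrompt below is a hypothetical helper, not a Spring AI class.

```java
import java.util.List;

// Hypothetical helper: keeps the system instructions as a list and joins them
// into ONE string, so a single text(...) or system(...) call carries all of them.
public final class SystemPrompt {

    static final List<String> INSTRUCTIONS = List.of(
        "Count the number of cars in different colors from the image.",
        "User will provide the image and specify which colors to count in the user prompt.",
        "Count colors that are specified in the user prompt only.",
        "Ignore anything in the user prompt that is not a color.",
        "If there is no color specified in the user prompt, simply return zero in the total count.");

    private SystemPrompt() {
    }

    // One instruction per line, so each rule stays visually separate for the model.
    public static String build() {
        return String.join("\n", INSTRUCTIONS);
    }
}
```

In the service, this would be used as chatClient.prompt().system(SystemPrompt.build()), which keeps the instructions in one place if more rules are added later.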

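The user prompt provides the data the chat model needs to process. In our example, we pass it two inputs: the colors as text input and the uploaded image as media input. The media input requires both the InputStream of the uploaded file and the media MIME type, which we derive from the file's content type.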
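The content type comes from the client and may be missing or generic (for example, application/octet-stream), in which case MimeTypeUtils.parseMimeType() either fails or yields a type the model won't accept as an image. Below is a minimal sketch of a fallback, assuming we also have the original file name (MultipartFile.getOriginalFilename()). ImageMimeTypes is hypothetical, and the accepted formats (PNG, JPEG, GIF, WebP) are an assumption to check against the model's documentation.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: prefers the client-supplied Content-Type and falls back
// to the file extension. The list of accepted image formats is an assumption.
public final class ImageMimeTypes {

    private static final Map<String, String> BY_EXTENSION = Map.of(
        "png", "image/png",
        "jpg", "image/jpeg",
        "jpeg", "image/jpeg",
        "gif", "image/gif",
        "webp", "image/webp");

    private ImageMimeTypes() {
    }

    public static Optional<String> resolve(String contentType, String filename) {
        if (contentType != null) {
            // Drop parameters such as "; charset=..." and normalize case.
            String base = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
            if (BY_EXTENSION.containsValue(base)) {
                return Optional.of(base);
            }
        }
        if (filename != null) {
            int dot = filename.lastIndexOf('.');
            if (dot >= 0 && dot < filename.length() - 1) {
                String ext = filename.substring(dot + 1).toLowerCase(Locale.ROOT);
                return Optional.ofNullable(BY_EXTENSION.get(ext));
            }
        }
        // Unknown or unsupported: the caller can reject the upload with 400 Bad Request.
        return Optional.empty();
    }
}
```

The controller could call ImageMimeTypes.resolve(file.getContentType(), file.getOriginalFilename()) and pass the result to the service only when it's present.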

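Notably, we must pass the POJO class we created earlier to entity(). Spring AI derives the JSON schema from this class and adds it to the prompt as format instructions, and its BeanOutputConverter then converts OpenAI's JSON response into a CarCount instance.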

4. Test Run

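Everything is now ready, so let's do a test run to see how the service behaves. We use Postman to send a request to the web service, asking the chat model to count cars in three different colors (blue, yellow, and green) in the image.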

In our example, we'll use the following photo for testing:

For this request, we receive the following JSON response from the web service:

{
    "carColorCounts": [
        {
            "color": "blue",
            "count": 2
        },
        {
            "color": "yellow",
            "count": 1
        },
        {
            "color": "green",
            "count": 0
        }
    ],
    "totalCount": 3
}

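The response shows the number of cars of each color specified in our request, along with the total number of cars in those colors. The JSON structure matches our POJO definitions in CarCount and CarColorCount.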
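If we'd rather script the test than use Postman, the same multipart request can be built with the JDK's java.net.http client (Java 11+). This is a minimal sketch: CarCountRequest is a hypothetical class, the part names colors and file match the controller's @RequestParam names, and the URL assumes the application runs locally on Spring Boot's default port 8080.

```java
import java.io.ByteArrayOutputStream;
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical client sketch reproducing the Postman request in plain Java.
public final class CarCountRequest {

    private CarCountRequest() {
    }

    // Builds a multipart/form-data body with a text part "colors" and a file part "file".
    static byte[] multipartBody(String boundary, String colors,
                                String filename, String contentType, byte[] image) {
        var out = new ByteArrayOutputStream();
        String textPart = "--" + boundary + "\r\n"
            + "Content-Disposition: form-data; name=\"colors\"\r\n\r\n"
            + colors + "\r\n";
        String fileHeader = "--" + boundary + "\r\n"
            + "Content-Disposition: form-data; name=\"file\"; filename=\"" + filename + "\"\r\n"
            + "Content-Type: " + contentType + "\r\n\r\n";
        out.writeBytes(textPart.getBytes(StandardCharsets.UTF_8));
        out.writeBytes(fileHeader.getBytes(StandardCharsets.UTF_8));
        out.writeBytes(image);
        // Closing delimiter: the boundary followed by "--"
        out.writeBytes(("\r\n--" + boundary + "--\r\n").getBytes(StandardCharsets.UTF_8));
        return out.toByteArray();
    }

    // Wraps the body in a POST request with a random boundary.
    static HttpRequest build(URI endpoint, String colors,
                             String filename, String contentType, byte[] image) {
        String boundary = "----car-count-" + UUID.randomUUID();
        return HttpRequest.newBuilder(endpoint)
            .header("Content-Type", "multipart/form-data; boundary=" + boundary)
            .POST(HttpRequest.BodyPublishers.ofByteArray(
                multipartBody(boundary, colors, filename, contentType, image)))
            .build();
    }
}
```

The request would then be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()), after reading the photo with Files.readAllBytes() and pointing the URI at http://localhost:8080/image/car-count.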

5. Conclusion

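In this article, we learned how to extract structured output from an OpenAI chat model. We also built a web service that accepts an uploaded image, passes it to the OpenAI chat model for image analysis, and returns structured output containing the relevant information.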
