Generating an Avro Schema From a Specific Java Class

06:27 PM · Nov 30, 2025

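1. Introduction

In this tutorial, we’ll discuss different options for generating Avro schemas from existing Java classes. Although this isn’t the standard workflow, this direction of conversion does come up in practice, and existing libraries let us handle it in a simple and readable way.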

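2. What Is Avro?

Before we dive into converting existing classes to schemas, let’s recap what Avro is.

According to the documentation, it’s a data serialization system that serializes and deserializes data according to a predefined schema, which is the heart of the system. The schema itself is expressed in JSON. More information about Avro can be found in the published guide.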

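3. Motivation Behind Generating Avro Schema From Existing Java Classes

The standard workflow with Avro is to define a schema first and then generate classes in the chosen language from it. Although that’s the most popular approach, it’s also possible to go the other way and generate an Avro schema from classes that already exist in the project.

Imagine that we’re working with a legacy system and want to send data through a message broker, and we decide to use Avro as the (de)serialization solution. Going through the code, we realize that we can comply with the new requirements quickly by emitting the data that the existing classes already represent.

Translating Java code into an Avro JSON schema by hand would be time-consuming. Instead, we can use available libraries to do it automatically and save time.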

4. Generating Avro Schema Using Avro Reflection API

The first option allowing us to transform the existing Java class to Avro schema quickly is to use the Avro Reflection API. To use this API, we need to make sure that our project depends on the Avro library:

<dependency>
    <groupId>org.apache.avro</groupId>
    <artifactId>avro</artifactId>
    <version>1.12.0</version>
</dependency>

4.1. Simple Records

Let’s assume we want to use the ReflectData API for a simple Java record:

record SimpleBankAccount(String bankAccountNumber) {
}

We can use ReflectData’s singleton instance to generate an org.apache.avro.Schema object for any given Java class. Then, we can call the toString() method of the Schema instance to get the Avro schema as a JSON String.

For validating the generated string against our expectation, we can use JsonUnit:

@Test
void whenConvertingSimpleRecord_thenAvroSchemaIsCorrect() {
    Schema schema = ReflectData.get().getSchema(SimpleBankAccount.class);
    String jsonSchema = schema.toString();

    assertThatJson(jsonSchema).isEqualTo("""
        {
          "type" : "record",
          "name" : "SimpleBankAccount",
          "namespace" : "com.baeldung.apache.avro.model",
          "fields" : [ {
            "name" : "bankAccountNumber",
            "type" : "string"
          } ]
        }
        """);
}

Even though we used a Java record for simplicity, this will work equally well with a plain Java object.

4.2. Nullable Fields

Let’s add another String field to our Java record. We can mark it optional using the @org.apache.avro.reflect.Nullable annotation:

record BankAccountWithNullableField(
    String bankAccountNumber, 
    @Nullable String reference
) {
}

If we repeat the test, we can expect reference’s nullability to be reflected:

@Test
void whenConvertingRecordWithNullableField_thenAvroSchemaIsCorrect() {
    Schema schema = ReflectData.get().getSchema(BankAccountWithNullableField.class);
    String jsonSchema = schema.toString(true);

    assertThatJson(jsonSchema).isEqualTo("""
        {
          "type" : "record",
          "name" : "BankAccountWithNullableField",
          "namespace" : "com.baeldung.apache.avro.model",
          "fields" : [ {
            "name" : "bankAccountNumber",
            "type" : "string"
          }, {
            "name" : "reference",
            "type" : [ "null", "string" ],
            "default" : null
          } ]
        }
        """);
}

As we can see, applying the @Nullable annotation to the new field turned the type of the reference field in the generated schema into a union of null and string, with null as its default.

4.3. Ignored Fields

The Avro library also gives us the option to ignore certain fields when generating schemas. For example, we may not want to transmit sensitive information over the wire. To achieve this, it’s enough to put the @AvroIgnore annotation on the particular field:

record BankAccountWithIgnoredField(
    String bankAccountNumber, 
    @AvroIgnore String reference
) {
}

Consequently, the generated schema will match the one from our first example.

4.4. Overriding Field Names

By default, fields in generated schemas take their names directly from the Java field names. We can override this with the @AvroName annotation:

record BankAccountWithOverriddenField(
    String bankAccountNumber, 
    @AvroName("bankAccountReference") String reference
) {
}

The schema generated from this version of our record uses bankAccountReference instead of reference:

{
  "type" : "record",
  "name" : "BankAccountWithOverriddenField",
  "namespace" : "com.baeldung.apache.avro.model",
  "fields" : [ {
    "name" : "bankAccountNumber",
    "type" : "string"
  }, {
    "name" : "bankAccountReference",
    "type" : "string"
  } ]
}
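
To make the mechanics behind sections 4.1 to 4.4 more tangible, here is a minimal, JDK-only sketch of how a reflection-based generator could turn fields and annotations into a schema. It is not Avro’s implementation: the annotations MaybeNull, Skip, and Rename, the Account class, and the toSchemaJson() method are hypothetical stand-ins invented for this illustration, and the sketch only supports String fields. Fields are sorted by name purely to keep the output deterministic.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: a toy generator, NOT Avro's ReflectData implementation.
class ReflectionSchemaSketch {

    // Hypothetical stand-ins for the ideas behind @Nullable, @AvroIgnore and @AvroName.
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD)
    @interface MaybeNull { }

    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD)
    @interface Skip { }

    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD)
    @interface Rename { String value(); }

    // Hypothetical model class used only by this sketch.
    static class Account {
        String bankAccountNumber;
        @MaybeNull String reference;
        @Skip String pin;
        @Rename("holder") String holderName;
    }

    static String toSchemaJson(Class<?> type) {
        // TreeMap keeps the schema fields sorted by name for deterministic output.
        Map<String, String> fieldsByName = new TreeMap<>();
        for (Field field : type.getDeclaredFields()) {
            if (field.isSynthetic() || Modifier.isStatic(field.getModifiers())) {
                continue; // only instance state belongs in the schema
            }
            if (field.isAnnotationPresent(Skip.class)) {
                continue; // ignored fields never reach the schema
            }
            if (field.getType() != String.class) {
                throw new IllegalArgumentException("Sketch only supports String fields: " + field);
            }
            Rename rename = field.getAnnotation(Rename.class);
            String name = (rename != null) ? rename.value() : field.getName();
            // A nullable field becomes a union of null and string with a null default.
            String fieldType = field.isAnnotationPresent(MaybeNull.class)
                ? "[\"null\",\"string\"],\"default\":null"
                : "\"string\"";
            fieldsByName.put(name, "{\"name\":\"" + name + "\",\"type\":" + fieldType + "}");
        }
        return "{\"type\":\"record\",\"name\":\"" + type.getSimpleName()
            + "\",\"fields\":[" + String.join(",", fieldsByName.values()) + "]}";
    }

    public static void main(String[] args) {
        System.out.println(toSchemaJson(Account.class));
    }
}
```

In this sketch, pin is dropped, holderName appears as holder, and reference becomes a nullable union, which mirrors the effect of the three Avro annotations shown above. In a real project we would of course rely on ReflectData rather than hand-rolled string building.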

4.5. Fields with Multiple Implementations

Sometimes, a class contains a field whose declared type is an interface or abstract class with several implementations.

Let’s assume AccountReference is an interface with two implementations. We can stick to Java records for brevity:

interface AccountReference {
    String reference();
}

record PersonalBankAccountReference(
    String reference, 
    String holderName
) implements AccountReference {
}

record BusinessBankAccountReference(
    String reference, 
    String businessEntityId
) implements AccountReference {
}

In our BankAccountWithAbstractField, we indicate the supported implementations of the AccountReference field using the @org.apache.avro.reflect.Union annotation:

record BankAccountWithAbstractField(
    String bankAccountNumber,
    @Union({ PersonalBankAccountReference.class, BusinessBankAccountReference.class }) 
    AccountReference reference
) { 
}

As a result, the generated Avro schema will contain a union allowing the assignment of either of these two classes, rather than limiting us to just one:

{
  "type" : "record",
  "name" : "BankAccountWithAbstractField",
  "namespace" : "com.baeldung.apache.avro.model",
  "fields" : [ {
    "name" : "bankAccountNumber",
    "type" : "string"
  }, {
    "name" : "reference",
    "type" : [ {
      "type" : "record",
      "name" : "PersonalBankAccountReference",
      "namespace" : "com.baeldung.apache.avro.model.BankAccountWithAbstractField",
      "fields" : [ {
        "name" : "holderName",
        "type" : "string"
      }, {
        "name" : "reference",
        "type" : "string"
      } ]
    }, {
      "type" : "record",
      "name" : "BusinessBankAccountReference",
      "namespace" : "com.baeldung.apache.avro.model.BankAccountWithAbstractField",
      "fields" : [ {
        "name" : "businessEntityId",
        "type" : "string"
      }, {
        "name" : "reference",
        "type" : "string"
      } ]
    } ]
  } ]
}

4.6. Logical Types

Avro supports logical types. These are primitive types on the schema level but contain additional hints for the code generator telling what class should be used to represent the particular field.

For example, we can leverage the logical types feature if our model uses temporal fields or UUIDs:

record BankAccountWithLogicalTypes(
    String bankAccountNumber, 
    UUID reference, 
    LocalDateTime expiryDate
) {
}

Additionally, we’ll configure our ReflectData instance, adding the Conversion objects we need. We can create our own Conversions or use the ones coming out of the box:

@Test
void whenConvertingRecordWithLogicalTypes_thenAvroSchemaIsCorrect() {
    // A dedicated instance keeps the shared ReflectData.get() singleton unmodified
    ReflectData reflectData = new ReflectData();
    reflectData.addLogicalTypeConversion(new Conversions.UUIDConversion());
    reflectData.addLogicalTypeConversion(new TimeConversions.LocalTimestampMillisConversion());

    String jsonSchema = reflectData.getSchema(BankAccountWithLogicalTypes.class).toString();

    // verify schema
}

Consequently, when we generate and validate the schema, we’ll notice that the new fields will include a logicalType field:

{
  "type" : "record",
  "name" : "BankAccountWithLogicalTypes",
  "namespace" : "com.baeldung.apache.avro.model",
  "fields" : [ {
    "name" : "bankAccountNumber",
    "type" : "string"
  }, {
    "name" : "expiryDate",
    "type" : {
      "type" : "long",
      "logicalType" : "local-timestamp-millis"
    }
  }, {
    "name" : "reference",
    "type" : {
      "type" : "string",
      "logicalType" : "uuid"
    }
  } ]
}
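
The schema above stores expiryDate as a long and reference as a string, with the logicalType attribute describing how to interpret them. As a rough illustration of what such a conversion amounts to, the following JDK-only sketch maps a LocalDateTime to milliseconds counted from 1970-01-01T00:00 on the local timeline, and a UUID to its string form. The class and method names are hypothetical, and this is not the code of Avro’s Conversion classes. ZoneOffset.UTC is used only as a computational device for treating the local date-time as having a zero offset.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.util.UUID;

// Illustrative only: hypothetical helpers, NOT Avro's Conversion implementations.
class LogicalTypeSketch {

    // local-timestamp-millis: milliseconds since 1970-01-01T00:00 with no time zone attached.
    static long toLocalTimestampMillis(LocalDateTime value) {
        return value.toInstant(ZoneOffset.UTC).toEpochMilli();
    }

    static LocalDateTime fromLocalTimestampMillis(long millis) {
        return LocalDateTime.ofInstant(Instant.ofEpochMilli(millis), ZoneOffset.UTC);
    }

    // uuid: the underlying schema type is a plain string.
    static String toUuidString(UUID value) {
        return value.toString();
    }

    static UUID fromUuidString(String value) {
        return UUID.fromString(value);
    }

    public static void main(String[] args) {
        LocalDateTime expiry = LocalDateTime.of(2030, 1, 31, 12, 0);
        long millis = toLocalTimestampMillis(expiry);
        System.out.println(millis + " -> " + fromLocalTimestampMillis(millis));

        UUID reference = UUID.randomUUID();
        System.out.println(toUuidString(reference));
    }
}
```

Note that a millisecond-based representation drops any sub-millisecond part of the LocalDateTime, so a round trip is exact only for values with millisecond precision.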

5. Generating Avro Schema Using Jackson

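Although the Avro Reflection API is useful and should cover different, even complex, needs, it’s always worth knowing the alternatives.

In our case, the alternative to the library we’ve been using is the Jackson Dataformats Binary library, specifically its Avro-related submodule.

First, let’s add the jackson-core and jackson-dataformat-avro dependencies to our pom.xml: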

<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-core</artifactId>
    <version>2.17.2</version>
</dependency>
<dependency>
    <groupId>com.fasterxml.jackson.dataformat</groupId>
    <artifactId>jackson-dataformat-avro</artifactId>
    <version>2.17.2</version>
</dependency>

5.1. Simple Conversions

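Let’s explore what Jackson offers by writing a simple converter. This implementation has the advantage of using a well-known Java API: Jackson is one of the most widely used libraries, whereas using the Avro API directly is more of a niche.

We’ll create AvroMapper and AvroSchemaGenerator instances and use them to retrieve an org.apache.avro.Schema instance.

After that, we simply call the toString() method, as in the previous examples: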

@Test
void whenConvertingRecord_thenAvroSchemaIsCorrect() throws JsonMappingException {
    AvroMapper avroMapper = new AvroMapper();
    AvroSchemaGenerator avroSchemaGenerator = new AvroSchemaGenerator();

    avroMapper.acceptJsonFormatVisitor(SimpleBankAccount.class, avroSchemaGenerator);
    Schema schema = avroSchemaGenerator.getGeneratedSchema().getAvroSchema();
    String jsonSchema = schema.toString();

    assertThatJson(jsonSchema).isEqualTo("""
        {
          "type" : "record",
          "name" : "SimpleBankAccount",
          "namespace" : "com.baeldung.apache.avro.model",
          "fields" : [ {
            "name" : "bankAccountNumber",
            "type" : [ "null", "string" ]
          } ]
        }
        """);
}

5.2. Jackson Annotations

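If we compare the two schemas generated for SimpleBankAccount, we’ll notice one key difference: the schema generated with Jackson marks the bankAccountNumber field as nullable. This is because Jackson works differently from Avro Reflect.

Jackson discovers the properties to include in the schema through the class’s accessors rather than by reading its fields directly, so the class needs to have accessors. Also, keep in mind that by default it assumes fields are nullable. If we don’t want a field to be nullable in the schema, we need to annotate it with @JsonProperty(required = true).

Let’s create a different variant of our record and use this annotation: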

record JacksonBankAccountWithRequiredField(
    @JsonProperty(required = true) String bankAccountNumber
) {
}

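Because every Jackson annotation applied to the original Java class is honored during schema generation, we need to check the result of the conversion carefully.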
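
The point made in this section about accessors can be illustrated with plain JDK reflection. The sketch below compares a field-based view of a class with an accessor-based view. It is only a conceptual model under simplifying assumptions: the LegacyAccount class and both helper methods are hypothetical, the accessor check only recognizes public getX() methods, and neither method reproduces the actual discovery logic of Avro Reflect or Jackson.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative only: a simplified model, NOT the discovery logic of Avro or Jackson.
class PropertyDiscoverySketch {

    // Hypothetical legacy class: one field has a getter, the other does not.
    static class LegacyAccount {
        private String bankAccountNumber;
        private String internalNote; // no accessor

        public String getBankAccountNumber() {
            return bankAccountNumber;
        }
    }

    // Field-based view: every non-static, non-transient declared field counts.
    static List<String> fieldBased(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Field field : type.getDeclaredFields()) {
            int mods = field.getModifiers();
            if (!field.isSynthetic() && !Modifier.isStatic(mods) && !Modifier.isTransient(mods)) {
                names.add(field.getName());
            }
        }
        Collections.sort(names);
        return names;
    }

    // Accessor-based view: only public, no-argument getX() methods count.
    static List<String> accessorBased(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Method method : type.getDeclaredMethods()) {
            String name = method.getName();
            boolean isGetter = !method.isSynthetic()
                && Modifier.isPublic(method.getModifiers())
                && !Modifier.isStatic(method.getModifiers())
                && method.getParameterCount() == 0
                && method.getReturnType() != void.class
                && name.startsWith("get") && name.length() > 3;
            if (isGetter) {
                names.add(Character.toLowerCase(name.charAt(3)) + name.substring(4));
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        System.out.println("field-based:    " + fieldBased(LegacyAccount.class));
        System.out.println("accessor-based: " + accessorBased(LegacyAccount.class));
    }
}
```

Here internalNote shows up only in the field-based view. When switching between the two libraries on a legacy class, a difference like this is one reason why the generated schemas may not match.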

5.3. Logical Types Aware Converter

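Jackson, like Avro Reflection, doesn’t take logical types into account by default, so we need to enable the feature explicitly. Let’s do that with a few small adjustments to the AvroMapper and AvroSchemaGenerator objects: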

@Test
void whenConvertingRecordWithLogicalTypes_thenAvroSchemaIsCorrect() throws JsonMappingException {
    AvroMapper avroMapper = AvroMapper.builder()
        .addModule(new AvroJavaTimeModule())
        .build();

    AvroSchemaGenerator avroSchemaGenerator = new AvroSchemaGenerator()
        .enableLogicalTypes();

    avroMapper.acceptJsonFormatVisitor(BankAccountWithLogicalTypes.class, avroSchemaGenerator);
    Schema schema = avroSchemaGenerator.getGeneratedSchema()
        .getAvroSchema();
    String jsonSchema = schema.toString();

    // verify schema
}

With these modifications, we can see the logical types feature being used for temporal objects in the generated Avro schema.

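6. Conclusion

In this article, we showed different ways of generating Avro schemas from existing Java classes: the standard Avro Reflection API and Jackson with its binary Avro module.

Although Avro’s approach and its API may be less familiar to a wider audience, it appears to be a more predictable solution than Jackson, which may lead to mistakes if it’s already integrated into our main project, since Jackson annotations present on the classes also affect the generated schema.

The examples in this article aren’t an exhaustive demonstration of what Avro or Jackson can do. Please check the code on GitHub for less commonly used features, or refer to the official documentation of both libraries.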
