[Figure: image not preserved]

Figure legend (symbol images not preserved):

[symbol]: positional encoding

[symbol]: triplet encoding

[symbol]: subject encoding

[symbol]: object encoding


Relation Transformer (RelTR) directly predicts a fixed-size set of <subject-predicate-object> triplets.

This entity detection framework is built upon the standard Transformer encoder-decoder architecture.

First, a CNN backbone generates a feature map [symbol].

With the self-attention mechanism, the encoder computes a new feature context [symbol] using the flattened [symbol] and fixed positional encodings [symbol].
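The flatten-then-add-encodings step can be sketched in a few lines. This is only an illustration under stated assumptions: the names `flatten_feature_map` and `sinusoidal_encoding`, the toy sizes `H`, `W`, `d`, and the 1-D sine/cosine scheme are all hypothetical choices made here for brevity, not taken from the paper. DETR-style models use a 2-D variant and add the encodings inside each attention layer rather than once at the input.

```python
import math

def flatten_feature_map(fmap):
    """Flatten an H x W x d nested list into an (H*W) x d list of token vectors."""
    return [pixel for row in fmap for pixel in row]

def sinusoidal_encoding(num_positions, d):
    """Fixed (non-learned) sine/cosine encodings; 1-D variant chosen for brevity."""
    enc = []
    for pos in range(num_positions):
        vec = []
        for i in range(d):
            angle = pos / (10000 ** (2 * (i // 2) / d))
            vec.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        enc.append(vec)
    return enc

# Toy sizes (assumptions): a 2 x 3 feature map with 4 channels.
H, W, d = 2, 3, 4
fmap = [[[float(h * W + w)] * d for w in range(W)] for h in range(H)]

tokens = flatten_feature_map(fmap)        # H*W tokens, each of size d
pos = sinusoidal_encoding(len(tokens), d)  # one fixed encoding per token

# Element-wise sum of each token and its positional encoding.
encoder_input = [[t + p for t, p in zip(tok, pe)] for tok, pe in zip(tokens, pos)]

print(len(encoder_input), len(encoder_input[0]))  # 6 4
```

The point of the sketch is the shape change: the spatial grid becomes a sequence of `H * W` tokens, and the fixed encodings are what let the encoder tell those tokens apart by location.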


The decoder transforms [symbol] entity queries into the entity representations [symbol].


RelTR has an encoder-decoder architecture, which directly predicts [symbol] triplets. Its decoder side has two parts:

The entity decoder, which captures the entity representations from DETR.

The triplet decoder, which has subject and object branches.


Given [symbol], the triplet decoder layer reasons about the feature context [symbol] and entity representations [symbol] from the entity decoder layer to directly output the information of [symbol] triplets, without inferring the possible predicates between all entity pairs.


That is, as I understand it, the inputs to the triplet decoder layer are roughly:

  1. the feature context [symbol]
  2. entity representations [symbol]
  3. subject entity queries [symbol]
  4. object entity queries [symbol]
  5. subject encodings [symbol]
  6. object encodings [symbol]
  7. triplet encodings [symbol]

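Note that the attention mechanism computes, for each element, a weighted sum over all elements, and it has no built-in notion of position or order. Changing the order of the elements in the input sequence does not change the content of the attention layer's output (strictly speaking, the output rows are reordered in the same way, but the set of outputs is unchanged). In other words, self-attention is permutation-equivariant.

For this reason, the triplet encoding, subject encoding, and object encoding are introduced.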
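The order-insensitivity of attention can be checked numerically. The sketch below is a simplified assumption, not RelTR's actual layer: it uses single-head dot-product self-attention with identity projections (Q = K = V = x), and the names `self_attention`, `close`, `add`, and the toy vectors are all hypothetical. It first shows that permuting the inputs only permutes the outputs, and then that adding a distinct encoding per slot breaks that symmetry.

```python
import math

def self_attention(x):
    """Single-head dot-product self-attention with identity projections (Q = K = V = x)."""
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        m = max(scores)                       # subtract max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w / total * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out

def close(a, b, tol=1e-9):
    """True if two equally shaped nested lists match element-wise within tol."""
    return all(abs(p - q) < tol for r, s in zip(a, b) for p, q in zip(r, s))

def add(a, b):
    """Element-wise sum of two equally shaped nested lists."""
    return [[p + q for p, q in zip(r, s)] for r, s in zip(a, b)]

x = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]
perm = [2, 0, 1]
x_perm = [x[i] for i in perm]

# Without encodings: permuting the inputs just permutes the output rows.
y = self_attention(x)
equivariant = close(self_attention(x_perm), [y[i] for i in perm])

# With a distinct encoding per slot: the same permutation now changes the result.
enc = [[0.1, 0.0], [0.0, 0.1], [0.2, 0.2]]
z = self_attention(add(x, enc))
position_sensitive = not close(self_attention(add(x_perm, enc)), [z[i] for i in perm])

print(equivariant, position_sensitive)  # True True
```

In RelTR the added encodings mark roles (subject, object, triplet) rather than sequence positions, but the mechanism illustrated here is the same: an additive encoding gives attention a signal it cannot otherwise recover.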

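Based on my understanding, the computations corresponding to the figure above are as follows (CSA: coupled self-attention, DVA: decoupled visual attention, DEA: decoupled entity attention):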

[Equation image not preserved]


[Equation image not preserved]


[Equation image not preserved]


3.2.5 Final Inference

A complete triplet includes the predicate label and the class labels as well as the bounding box coordinates of the subject and object.
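As a rough illustration of what one such output record holds, the following sketch models a triplet as a small immutable data class. The class name `Triplet`, the field names, the `(cx, cy, w, h)` box layout, and the example labels and numbers are assumptions made for this note, not the paper's or any library's data format.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumed box layout: normalized (center_x, center_y, width, height).
Box = Tuple[float, float, float, float]

@dataclass(frozen=True)
class Triplet:
    """One predicted relation: subject and object entities plus their predicate."""
    subject_label: str
    subject_box: Box
    predicate_label: str
    object_label: str
    object_box: Box

    def as_phrase(self) -> str:
        """Human-readable 'subject predicate object' string."""
        return f"{self.subject_label} {self.predicate_label} {self.object_label}"

# Made-up example values, for illustration only.
t = Triplet("person", (0.40, 0.50, 0.20, 0.60), "riding", "horse", (0.45, 0.65, 0.50, 0.50))
print(t.as_phrase())  # person riding horse
```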


The subject representations [symbol] and object representations [symbol] are taken from the last decoder layer.

We utilize two independent feed-forward networks with the same structure to predict the height, width, and normalized center coordinates of subject and object boxes.
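A minimal sketch of two same-structure but independently parameterized box heads follows. Everything concrete in it is an assumption for illustration: the helper names `make_ffn`, `ffn_forward`, and `to_corners`, the layer sizes, the random bias-free weights, the ReLU hidden layer, and the sigmoid used to keep the outputs in (0, 1). The source only states that two independent feed-forward networks with the same structure predict height, width, and normalized center coordinates.

```python
import math
import random

def make_ffn(d_in, d_hidden, d_out, seed):
    """Create random weights for a two-layer feed-forward network (no biases, for brevity)."""
    rng = random.Random(seed)
    def layer(n_in, n_out):
        return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return {"w1": layer(d_in, d_hidden), "w2": layer(d_hidden, d_out)}

def ffn_forward(params, x):
    """Linear -> ReLU -> linear -> sigmoid, giving four values in (0, 1)."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in params["w1"]]
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in params["w2"]]
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]

def to_corners(box):
    """Convert (cx, cy, w, h) to (x_min, y_min, x_max, y_max)."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

d = 8  # toy representation size (assumption)

# Same structure, separate parameters: one head per branch.
subject_head = make_ffn(d, 16, 4, seed=0)
object_head = make_ffn(d, 16, 4, seed=1)

# Stand-ins for one subject and one object representation vector.
subject_repr = [0.1 * i for i in range(d)]
object_repr = [0.05 * (d - i) for i in range(d)]

subject_box = ffn_forward(subject_head, subject_repr)  # (cx, cy, w, h)
object_box = ffn_forward(object_head, object_repr)     # (cx, cy, w, h)
print(to_corners(subject_box))
print(to_corners(object_box))
```

Keeping the two heads separate lets the subject and object branches learn different box regressors even though they share an architecture.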


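During DVA, attention heat maps are also output. After further processing, they serve as the spatial features [symbol], which take part in predicate classification together with the subject and object representations.

[Equation image not preserved]


[Equation image not preserved]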