[Figure]

Figure legend: positional encoding, triplet encoding, subject encoding, object encoding.

Relation Transformer (RelTR) directly predicts a fixed-size set of <subject-predicate-object> triplets.

This entity detection framework is built upon the standard Transformer encoder-decoder architecture.

First, a CNN backbone generates a feature map.

With the self-attention mechanism, the encoder computes a new feature context using the flattened feature map and fixed positional encodings.
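The flattening step can be illustrated with a minimal sketch. The names `flatten_feature_map` and `sinusoidal_encoding` are hypothetical, the sizes are toy values, and the 1-D sinusoidal form over the flattened index is a simplifying assumption used only to show what "fixed" (non-learned) encodings look like. It is not taken from the notes or from any particular implementation.

```python
import math

def flatten_feature_map(fmap):
    """Flatten an H x W x d nested list into an (H*W) x d list, row by row."""
    return [list(cell) for row in fmap for cell in row]

def sinusoidal_encoding(num_positions, d):
    """Fixed (non-learned) sinusoidal encodings, one d-vector per position.

    Assumption: a 1-D encoding over the flattened index, for illustration only.
    """
    enc = []
    for pos in range(num_positions):
        vec = []
        for i in range(d):
            angle = pos / (10000 ** (2 * (i // 2) / d))
            # Even channels use sine, odd channels use cosine.
            vec.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        enc.append(vec)
    return enc

# Toy sizes (hypothetical): a 2 x 3 feature map with 4 channels.
H, W, d = 2, 3, 4
fmap = [[[float(h * W + w)] * d for w in range(W)] for h in range(H)]

tokens = flatten_feature_map(fmap)            # H*W tokens, each of size d
pos_enc = sinusoidal_encoding(len(tokens), d)

# One common way to combine them: add the encoding to each token.
encoder_input = [[t + p for t, p in zip(tok, pe)]
                 for tok, pe in zip(tokens, pos_enc)]
```

After flattening, the encoder sees a sequence of H*W tokens, and the positional encodings are what tell it where each token sat in the original grid.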

The decoder transforms a fixed number of entity queries into the entity representations.

RelTR has an encoder-decoder architecture, which directly predicts a fixed-size set of triplets. It includes:

The entity decoder, which captures the entity representations from DETR.

The triplet decoder, with the subject and object branches.

Given the coupled subject and object queries, the triplet decoder layer reasons about the feature context and the entity representations from the entity decoder layer to directly output the information of triplets, without inferring the possible predicates between all entity pairs.

That is, as I understand it, the inputs to the triplet decoder layer are roughly:

the feature context
entity representations
subject entity queries
object entity queries
subject encodings
object encodings
triplet encodings

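Note that the attention mechanism computes, for every element, a weighted sum over all elements, and it has no notion of position or order. Changing the order of the elements in the input sequence does not change the content of the attention layer's output (strictly speaking, the order of the outputs changes, but the set of outputs stays the same).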
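The order-independence described above can be checked numerically. This is a minimal sketch under stated assumptions: single-head self-attention with identity projections (Q = K = V = X), toy inputs, and a hypothetical helper `close` for tolerance-based comparison. The same property holds when the projections are learned matrices shared across positions.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(X):
    """Single-head self-attention with identity projections (Q = K = V = X)."""
    d = len(X[0])
    out = []
    for q in X:
        # Scaled dot-product score of this query against every key.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in X]
        weights = softmax(scores)
        # Output is the attention-weighted sum of the values.
        out.append([sum(w * v[j] for w, v in zip(weights, X)) for j in range(d)])
    return out

def close(A, B, tol=1e-9):
    """Element-wise comparison of two lists of vectors within a tolerance."""
    return all(abs(a - b) <= tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))

X = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
perm = [2, 0, 1]                      # a reordering of the input elements
X_perm = [X[i] for i in perm]

Y = self_attention(X)
Y_perm = self_attention(X_perm)

# Permuting the input permutes the output rows in the same way,
# so the set of output vectors is unchanged.
same_up_to_order = close(Y_perm, [Y[i] for i in perm])
same_in_place = close(Y_perm, Y)
```

Here `same_up_to_order` is true while `same_in_place` is false: the outputs are the same vectors in a different order, which is why attention on its own cannot tell which element played which role.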

For this reason, the triplet encoding, subject encoding, and object encoding are introduced.

Based on my understanding, the computations corresponding to the figure above are as follows (CSA, DVA, DEA):

[Equations for CSA, DVA, and DEA]

3.2.5 Final Inference

A complete triplet includes the predicate label and the class labels as well as the bounding box coordinates of the subject and object.
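The contents of a complete triplet can be written down as a small data structure. This is only an illustrative sketch: the class name `Triplet`, its field names, the `(cx, cy, w, h)` box layout, and the example values are all assumptions made for this example, not results or code from the paper.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumed box layout: normalized (center_x, center_y, width, height).
Box = Tuple[float, float, float, float]

@dataclass(frozen=True)
class Triplet:
    """One predicted <subject-predicate-object> triplet."""
    subject_label: str      # class label of the subject
    subject_box: Box        # bounding box of the subject
    predicate_label: str    # predicate (relation) label
    object_label: str       # class label of the object
    object_box: Box         # bounding box of the object

    def as_phrase(self) -> str:
        """Render the triplet in <subject-predicate-object> form."""
        return f"<{self.subject_label}-{self.predicate_label}-{self.object_label}>"

# Made-up values, for illustration only.
example = Triplet(
    subject_label="person",
    subject_box=(0.30, 0.50, 0.20, 0.60),
    predicate_label="riding",
    object_label="horse",
    object_box=(0.35, 0.65, 0.40, 0.50),
)
phrase = example.as_phrase()
```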

The subject representations and object representations from the last decoder layer ...

We utilize two independent feed-forward networks with the same structure to predict the height, width, and normalized center coordinates of subject and object boxes.
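To make the predicted box format concrete, the sketch below converts a center-format box into pixel corner coordinates. It assumes that the width and height are normalized by the image size in the same way as the center coordinates (the text only states this for the center). The function name `cxcywh_to_xyxy` and the image size are hypothetical.

```python
def cxcywh_to_xyxy(box, img_w, img_h):
    """Convert a normalized (cx, cy, w, h) box to pixel (x1, y1, x2, y2).

    Assumption: all four values are normalized to [0, 1] by the image size.
    """
    cx, cy, w, h = box
    x1 = (cx - w / 2) * img_w
    y1 = (cy - h / 2) * img_h
    x2 = (cx + w / 2) * img_w
    y2 = (cy + h / 2) * img_h
    return (x1, y1, x2, y2)

# A box centered in a 640 x 480 image, a quarter as wide and half as tall.
box = (0.5, 0.5, 0.25, 0.5)
pixels = cxcywh_to_xyxy(box, img_w=640, img_h=480)
# pixels == (240.0, 120.0, 400.0, 360.0)
```

The same conversion applies to both the subject box and the object box, since the two feed-forward networks share the same output format.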

During DVA, attention heat maps are produced. After further processing, they serve as spatial features that also take part in predicate classification.