本文為墨天輪數據庫管理服務團隊第96期技術分享,內容原創,作者為技術顧問陳昱,如需轉載請聯繫小墨(VX:modb666)並註明來源。如需查看更多文章可關注【墨天輪】公眾號。
適用範圍
DB: ALL VERSION
OS: ALL PLATFORM
問題概述
數據庫在7號9:00 左右開始卡頓,採集了對應時間段的AWR報告如下:
–node1
–node2
–node1
–node2
有3個EVENT需要注意,分別是
1)其中enq: TX - index contention主要由 apd5vv681f54w 引起。
SQL語句:INSERT INTO A_LM_*****_HOUR ("DATA_DATE", "METER_ID", "ASSET_NO", "MGT_ORG_CODE", "MGT_ORG_NAME", "CUST_ID", "CUST_NO", "CUST_NAME", "CONS_TYPE", "IS_SPECIAL", "UI_ABNOR_TYPE", "ABNOR_DESC_RATE", "ABNOR_DESC_DATA", "ABNOR_DATE", "CREATE_TIME", "IMPO_CONS_TYPE", "IMPO_CONS_SORT", "TMNL_ONLINE_FLAG", "THRES_HOLD") VALUES (:1 , :2 , :3 , :4 , :5 , :6 , :7 , :8 , :9 , :10 , :11 , :12 , :13 , :14 , :15 , :16 , :17 , :18 , :19 )
查詢索引分裂對應的索引
select instance_number,current_obj#,count(*) cnt
from ashtmp
where to_char(sample_time,'yyyy-mm-dd hh24') between '2025-06-04 09' and '2025-06-04 11'
and sql_id = 'apd5vv681f54w'
group by instance_number,current_obj#
order by instance_number,cnt;
11-12236
2104979
31313244769254
413132448144496
可以看見 3132447 和 3132448 最嚴重,下面查詢2個索引對象
select owner,object_name,object_type from dba_objects where object_id in (3132447,3132448);
建議:修改_assm_high_gsp_threshold 和 _assm_low_gsp_threshold 參數,功能:
- 控制在索引分裂過程中,搜索可重用的空閒塊(free block) 時的最大拒絕次數閾值。
- 當空間管理模塊(ASSM)在尋找空閒塊過程中累計的塊拒絕次數達到該閾值時,會跳過繼續搜索,直接擴展新的數據塊,從而減少因反覆掃描高使用率塊帶來的開銷。
alter system set event='43822 trace name context forever,level 1','450502 trace name context forever, level 1',''450503 trace name context forever, level 1' scope=spfile sid='*';
ALTER SYSTEM SET "_assm_low_gsp_threshold"=512 SCOPE=BOTH;
ALTER SYSTEM SET "_assm_high_gsp_threshold"=512 SCOPE=BOTH;
2)其中enq: HW - contention主要由 8dyk8nmwd39aw引起。
SQL語句為:INSERT INTO A_COLL_*****_DAY (MGT_ORG_CODE, TMNL_ASSET_NO, METER_POINT_CNT, POINT_CNT, METER_U_POINT_CNT, U_POINT_CNT, METER_I_POINT_CNT, I_POINT_CNT, METER_P_POINT_CNT, P_POINT_CNT, METER_PF_POINT_CNT, PF_POINT_CNT, METER_PAR_POINT_CNT, PAR_POINT_CNT, METER_I0_POINT_CNT, I0_POINT_CNT, METER_CNT, METER_SUCC_CNT, TMNL_INTE_RATE, U_INST_RATE, I_INST_RATE, P_INST_RATE, PF_INST_RATE, PAR_INST_RATE, I0_INST_RATE, PO_POINT_CNT, IOT_POINT_NO, IOT_POINT_NAME, PRAP_IR_DAY, STAT_DATE) VALUES(:1 , :2 , :3 , :4 , :5 , :6 , :7 , :8 , :9 , :10 , :11 , :12 , :13 , :14 , :15 , :16 , :17 , :18 , :19 , :20 , :21 , :22 , :23 , :24 , :25 , :26 , :27 , :28 , :29 , :30 )
建議:
1) 將A_COLL_TMNL_ALL_INTE_DAY表的next extent 增大,默認是1M,可以修改為10M,減小大量insert帶來的高水位爭用。
2) 手工對改表進行擴容
3)enq: TX – contention等待鏈條如下。
with ash as
(select /*+ materialize*/ *
from ashtmp t
where to_char(sample_time,'yyyy-mm-dd hh24:mi') = '2025-06-04 08:35'
),
chains as
(select instance_number inst_id,
blocking_session blocking_sid,
blocking_session_serial# blocking_serial,
session_id,
session_serial# session_serial,
level lvl,
sys_connect_by_path(instance_number || ' ' || session_id || ',' ||
session_serial# || ' ' || sql_id || ' ' ||
event,
' <-by ') path,
connect_by_isleaf isleaf
from ashtmp
start with event in ('enq: TX - contention')
connect by nocycle(prior blocking_session = session_id
and prior blocking_session_serial# = session_serial#
and prior sample_id = sample_id))
select inst_id,
blocking_sid,
blocking_serial,
lpad(round(ratio_to_report(count(*)) over() * 100) || '%', 5, ' ') "%This",
count(*) ash_time,
path
from chains
where isleaf = 1
group by inst_id, blocking_sid, blocking_serial, path
order by inst_id, ash_time desc;
可以發現enq: TX – contention 基本都被row cache lock 阻塞,對 row cache lock 進行分析發現
select inst_id,to_char(min(sample_time),'yyyy-mm-dd hh24:mi:ss') mintim
,to_char(max(sample_time),'yyyy-mm-dd hh24:mi:ss') maxtim
,event,p1,count(*) cnt
from ashtmp
where to_char(sample_time,'yyyy-mm-dd hh24:mi') between '2025-06-04 09' and '2025-06-04 11'
and event in ('row cache lock')
group by inst_id,event,p1
order by inst_id,mintim,cnt;
DC_SEGMENTS:該緩衝隊列鎖會在段分配的時候發生,觀察持有這個隊列鎖的會話在做什麼。This is likely to be down to segment allocation. Identify what the session holding the enqueue is doing and use errorstacks to diagnose.
DC_TABLESPACES:該緩衝隊列鎖會在申請新的extent時產生,當next extent設置的較小,同時申請的較為頻繁時產生。
select instance_number,user_id,count(*) cnt
from ashtmp
where to_char(sample_time,'yyyy-mm-dd hh24:mi') between '2025-06-04 09' and '2025-06-04 11'
and event in ('row cache lock')
group by instance_number,user_id
order by instance_number,cnt;
1113
09
11717733
10319
1171238
117 就是 SG_MIS
用户對應的 quota tablespace 並沒有做限制。
with ash as
(select /*+ materialize*/ *
from ashtmp t
where to_char(sample_time,'yyyy-mm-dd hh24:mi') = '2025-06-04 08:35'
),
chains as
(select instance_number instance_number,
blocking_session blocking_sid,
blocking_session_serial# blocking_serial,
session_id,
session_serial# session_serial,
level lvl,
sys_connect_by_path(instance_number || ' ' || session_id || ',' ||
session_serial# || ' ' || sql_id || ' ' ||
event,
' <-by ') path,
connect_by_isleaf isleaf
from ashtmp
start with event in ('row cache lock')
connect by nocycle(prior blocking_session = session_id
and prior blocking_session_serial# = session_serial#
and prior sample_id = sample_id))
select instance_number,
blocking_sid,
blocking_serial,
lpad(round(ratio_to_report(count(*)) over() * 100) || '%', 5, ' ') "%This",
count(*) ash_time,
path
from chains
where isleaf = 1
group by instance_number, blocking_sid, blocking_serial, path
order by instance_number, ash_time desc;
以 1 4509,50180 為例進行分析
select to_char(sample_time,'yyyy-mm-dd hh24:mi:ss') tim,session_id,session_serial#,sql_id,event,p1,session_state,blocking_session,
to_char(sql_exec_start,'yyyy-mm-dd hh24:mi:ss') sql_exec_start,sql_exec_id
from ashtmp t
where to_char(sample_time,'yyyy-mm-dd hh24:mi') between '2025-06-04 08:30' and '2025-06-04 08:45'
and instance_number = 1
and session_id = 4509
and session_serial#=50180
order by tim;
可以發現P1也是5,且SQL一直沒有執行完成。
所以,本質上 enq: TX – contention 被 row cache lock 阻塞,raw cache lock 又因為 enq: HW – contention 引起,查詢HW等待最嚴重的SQL
select sql_id,count(*) cnt
from ashtmp t
where to_char(sample_time,'yyyy-mm-dd hh24:mi') between '2025-06-04 08' and '2025-06-04 11'
and event = 'enq: HW - contention'
group by sql_id
order by cnt;
159jx8w4gh1d1q 4
33v43zsn3h19n2 9
4 apd5vv681f54w 19
5 cz87u7n30qvh8 38
68tv7bhm1dvzvv 46
7 chua70mgbmx5j 57
864vb4f762gnaw 131
90w5addar5sg1j 206
10796yx0b9w53cm 520
11 dctqsa2bc5zru 863
12 c5hayahuzby96 1469
135a7451c2ku1j4 2570
143fa019sbh4653 3455
15079s8t9k1jzmu 4015
16 cufvk33jrzrpk 5310
178dyk8nmwd39aw 129508
最嚴重的SQL是 8dyk8nmwd39aw,建議參考2), 同時對於1000以上的也可以適當增加next extent.
問題原因
- 大量插入造成的索引分裂
2)大量插入造成的高水位爭用 - 由於enq: HW contention 導致raw cache lock 從而導致 enq: TX contetion,還是由於ENQ: HW引起
解決方案
1) 索引分裂
alter system set event=‘43822 trace name context forever,level 1’,‘450502 trace name context forever, level 1’,‘‘450503 trace name context forever, level 1’ scope=spfile sid=’*';
ALTER SYSTEM SET “\_assm\_low\_gsp\_threshold”=512 SCOPE=BOTH;
ALTER SYSTEM SET "assm\_high\_gsp\_threshold"=512 SCOPE=BOTH;
2) enq: HW
將A\_COLL\_TMNL*****\_DAY表的next extent 增大,默認是1M,可以修改為10M,減小大量insert帶來的高水位爭用。手工對改表進行擴容。
墨天輪從樂知樂享的數據庫技術社區蓄勢出發,全面升級,提供多類型數據庫管理服務。墨天輪數據庫管理服務旨在為用户構建信賴可託付的數據庫環境,併為數據庫廠商提供中立的生態支持。
墨天輪數據庫服務官網:https://www.modb.pro/service