动态

详情 返回 返回

技術分享 | 某客户業務卡頓分析(Oracle) - 动态 详情

本文為墨天輪數據庫管理服務團隊第96期技術分享,內容原創,作者為技術顧問陳昱,如需轉載請聯繫小墨(VX:modb666)並註明來源。如需查看更多文章可關注【墨天輪】公眾號。

image.png

適用範圍

DB: ALL VERSION

OS: ALL PLATFORM

問題概述

數據庫在7號9:00 左右開始卡頓,採集了對應時間段的AWR報告如下:
–node1
image.png
image.png
image.png
–node2
image.png
image.png
image.png
–node1
image.png
–node2
image.png

有3個EVENT需要注意,分別是

1)其中enq: TX - index contention主要由 apd5vv681f54w 引起。

SQL語句:INSERT INTO A_LM_*****_HOUR ("DATA_DATE", "METER_ID", "ASSET_NO", "MGT_ORG_CODE", "MGT_ORG_NAME", "CUST_ID", "CUST_NO", "CUST_NAME", "CONS_TYPE", "IS_SPECIAL", "UI_ABNOR_TYPE", "ABNOR_DESC_RATE", "ABNOR_DESC_DATA", "ABNOR_DATE", "CREATE_TIME", "IMPO_CONS_TYPE", "IMPO_CONS_SORT", "TMNL_ONLINE_FLAG", "THRES_HOLD") VALUES (:1 , :2 , :3 , :4 , :5 , :6 , :7 , :8 , :9 , :10 , :11 , :12 , :13 , :14 , :15 , :16 , :17 , :18 , :19 )

查詢索引分裂對應的索引

select instance_number,current_obj#,count(*) cnt
from ashtmp
where to_char(sample_time,'yyyy-mm-dd hh24') between '2025-06-04 09' and '2025-06-04 11' 
and sql_id = 'apd5vv681f54w'
group by instance_number,current_obj#
order by instance_number,cnt;
11-12236
2104979
31313244769254
413132448144496

可以看見 3132447 和 3132448 最嚴重,下面查詢2個索引對象

select owner,object_name,object_type from dba_objects where object_id in (3132447,3132448);

image.png

image.png

建議:修改_assm_high_gsp_threshold 和 _assm_low_gsp_threshold 參數,功能:

  • 控制在索引分裂過程中,搜索可重用的空閒塊(free block) 時的最大拒絕次數閾值。
  • 當空間管理模塊(ASSM)在尋找空閒塊過程中累計的塊拒絕次數達到該閾值時,會跳過繼續搜索,直接擴展新的數據塊,從而減少因反覆掃描高使用率塊帶來的開銷。
alter system set event='43822 trace name context forever,level 1','450502 trace name context forever, level 1',''450503 trace name context forever, level 1' scope=spfile sid='*';
ALTER SYSTEM SET "_assm_low_gsp_threshold"=512 SCOPE=BOTH;
ALTER SYSTEM SET "_assm_high_gsp_threshold"=512 SCOPE=BOTH;

2)其中enq: HW - contention主要由 8dyk8nmwd39aw引起。

SQL語句為:INSERT INTO A_COLL_*****_DAY (MGT_ORG_CODE, TMNL_ASSET_NO, METER_POINT_CNT, POINT_CNT, METER_U_POINT_CNT, U_POINT_CNT, METER_I_POINT_CNT, I_POINT_CNT, METER_P_POINT_CNT, P_POINT_CNT, METER_PF_POINT_CNT, PF_POINT_CNT, METER_PAR_POINT_CNT, PAR_POINT_CNT, METER_I0_POINT_CNT, I0_POINT_CNT, METER_CNT, METER_SUCC_CNT, TMNL_INTE_RATE, U_INST_RATE, I_INST_RATE, P_INST_RATE, PF_INST_RATE, PAR_INST_RATE, I0_INST_RATE, PO_POINT_CNT, IOT_POINT_NO, IOT_POINT_NAME, PRAP_IR_DAY, STAT_DATE) VALUES(:1 , :2 , :3 , :4 , :5 , :6 , :7 , :8 , :9 , :10 , :11 , :12 , :13 , :14 , :15 , :16 , :17 , :18 , :19 , :20 , :21 , :22 , :23 , :24 , :25 , :26 , :27 , :28 , :29 , :30 )

建議:
1) 將A_COLL_TMNL_ALL_INTE_DAY表的next extent 增大,默認是1M,可以修改為10M,減小大量insert帶來的高水位爭用。
2) 手工對改表進行擴容

3)enq: TX – contention等待鏈條如下。

with ash as
(select /*+ materialize*/ *
    from ashtmp t
   where to_char(sample_time,'yyyy-mm-dd hh24:mi') = '2025-06-04 08:35'
),
chains as
(select instance_number inst_id,
         blocking_session blocking_sid,
         blocking_session_serial# blocking_serial,
         session_id,
         session_serial# session_serial,
         level lvl,
         sys_connect_by_path(instance_number || ' ' || session_id || ',' ||
                             session_serial# || ' ' || sql_id || ' ' ||
                             event,
                             ' <-by ') path,
         connect_by_isleaf isleaf
   from ashtmp
   start with event in ('enq: TX - contention')
   connect by nocycle(prior blocking_session = session_id
                  and prior blocking_session_serial# = session_serial#
                  and prior sample_id = sample_id))
select inst_id,
       blocking_sid,
       blocking_serial,
       lpad(round(ratio_to_report(count(*)) over() * 100) || '%', 5, ' ') "%This",
       count(*) ash_time,
       path
  from chains
where isleaf = 1
group by inst_id, blocking_sid, blocking_serial, path
order by inst_id, ash_time desc;

image.png

可以發現enq: TX – contention 基本都被row cache lock 阻塞,對 row cache lock 進行分析發現

select inst_id,to_char(min(sample_time),'yyyy-mm-dd hh24:mi:ss') mintim
       ,to_char(max(sample_time),'yyyy-mm-dd hh24:mi:ss') maxtim
       ,event,p1,count(*) cnt
from ashtmp
where to_char(sample_time,'yyyy-mm-dd hh24:mi') between '2025-06-04 09' and '2025-06-04 11' 
and event in ('row cache lock')
group by inst_id,event,p1
order by inst_id,mintim,cnt;

image.png

DC_SEGMENTS:該緩衝隊列鎖會在段分配的時候發生,觀察持有這個隊列鎖的會話在做什麼。This is likely to be down to segment allocation. Identify what the session holding the enqueue is doing and use errorstacks to diagnose.

DC_TABLESPACES:該緩衝隊列鎖會在申請新的extent時產生,當next extent設置的較小,同時申請的較為頻繁時產生。

select instance_number,user_id,count(*) cnt
from ashtmp
where to_char(sample_time,'yyyy-mm-dd hh24:mi') between '2025-06-04 09' and '2025-06-04 11' 
and event in ('row cache lock')
group by instance_number,user_id
order by instance_number,cnt;
1113
09
11717733
10319
1171238
117 就是 SG_MIS

image.png

用户對應的 quota tablespace 並沒有做限制。

with ash as
(select /*+ materialize*/ *
    from ashtmp t
   where to_char(sample_time,'yyyy-mm-dd hh24:mi') = '2025-06-04 08:35'
),
chains as
(select instance_number instance_number,
         blocking_session blocking_sid,
         blocking_session_serial# blocking_serial,
         session_id,
         session_serial# session_serial,
         level lvl,
         sys_connect_by_path(instance_number || ' ' || session_id || ',' ||
                             session_serial# || ' ' || sql_id || ' ' ||
                             event,
                             ' <-by ') path,
         connect_by_isleaf isleaf
   from ashtmp
   start with event in ('row cache lock')
   connect by nocycle(prior blocking_session = session_id
                  and prior blocking_session_serial# = session_serial#
                  and prior sample_id = sample_id))
select instance_number,
       blocking_sid,
       blocking_serial,
       lpad(round(ratio_to_report(count(*)) over() * 100) || '%', 5, ' ') "%This",
       count(*) ash_time,
       path
  from chains
where isleaf = 1
group by instance_number, blocking_sid, blocking_serial, path
order by instance_number, ash_time desc;

image.png

以 1 4509,50180 為例進行分析

select to_char(sample_time,'yyyy-mm-dd hh24:mi:ss') tim,session_id,session_serial#,sql_id,event,p1,session_state,blocking_session,
 to_char(sql_exec_start,'yyyy-mm-dd hh24:mi:ss') sql_exec_start,sql_exec_id
from ashtmp t
where to_char(sample_time,'yyyy-mm-dd hh24:mi') between '2025-06-04 08:30' and '2025-06-04 08:45'
and instance_number = 1
and session_id = 4509
and session_serial#=50180
order by tim;

image.png

image.png

可以發現P1也是5,且SQL一直沒有執行完成。

所以,本質上 enq: TX – contention 被 row cache lock 阻塞,raw cache lock 又因為 enq: HW – contention 引起,查詢HW等待最嚴重的SQL

select sql_id,count(*) cnt
from ashtmp t
where to_char(sample_time,'yyyy-mm-dd hh24:mi') between '2025-06-04 08' and '2025-06-04 11' 
and event = 'enq: HW - contention'
group by sql_id
order by cnt;
159jx8w4gh1d1q    4
33v43zsn3h19n2    9
4    apd5vv681f54w    19
5    cz87u7n30qvh8    38
68tv7bhm1dvzvv    46
7    chua70mgbmx5j    57
864vb4f762gnaw    131
90w5addar5sg1j    206
10796yx0b9w53cm    520
11    dctqsa2bc5zru    863
12    c5hayahuzby96    1469
135a7451c2ku1j4    2570
143fa019sbh4653    3455
15079s8t9k1jzmu    4015
16    cufvk33jrzrpk        5310
178dyk8nmwd39aw    129508

最嚴重的SQL是 8dyk8nmwd39aw,建議參考2), 同時對於1000以上的也可以適當增加next extent.

問題原因

  1. 大量插入造成的索引分裂
      2)大量插入造成的高水位爭用
  2. 由於enq: HW contention 導致raw cache lock 從而導致 enq: TX contetion,還是由於ENQ: HW引起

解決方案

1) 索引分裂

alter system set event=‘43822 trace name context forever,level 1’,‘450502 trace name context forever, level 1’,‘‘450503 trace name context forever, level 1’ scope=spfile sid=’*';

ALTER SYSTEM SET “\_assm\_low\_gsp\_threshold”=512 SCOPE=BOTH;

ALTER SYSTEM SET "assm\_high\_gsp\_threshold"=512 SCOPE=BOTH;

2) enq: HW

將A\_COLL\_TMNL*****\_DAY表的next extent 增大,默認是1M,可以修改為10M,減小大量insert帶來的高水位爭用。手工對改表進行擴容。


墨天輪從樂知樂享的數據庫技術社區蓄勢出發,全面升級,提供多類型數據庫管理服務。墨天輪數據庫管理服務旨在為用户構建信賴可託付的數據庫環境,併為數據庫廠商提供中立的生態支持。
墨天輪數據庫服務官網:https://www.modb.pro/service

user avatar wuyongyin 头像 hdwang 头像 kangkaidafangdezi 头像 alicloudros 头像 danieldx 头像 zhuyundataflux 头像
点赞 6 用户, 点赞了这篇动态!
点赞

Add a new 评论

Some HTML is okay.