本文為墨天輪數據庫管理服務團隊第133期技術分享,內容原創,作者為技術顧問李寧,如需轉載請聯繫小墨(VX:modb666)並註明來源。如需查看更多文章可關注【墨天輪】公眾號。
版本架構:Oracle 19c 雙節點RAC+雙節點DG
摘要:記錄一次TOP SQL由於統計信息缺失引發的AAS和CPU突刺,優化器只能基於統計信息來判斷最優執行路徑。如果統計信息缺失或過期,系統隨時可能出現嚴重性能問題。
一、問題背景
2024年11月29號早上,生產營業系統突然收到CPU佔用率超過閾值的告警。通過監控發現,數據庫側的 AAS(Active Session Average)飆升到95,有一條新上線的SQL消耗異常,導致CPU使用率直線拉昇。
二、問題定位
1、Top SQL定位
從監控(上面圖片)看,左面有一條TOP SQL佔比31%,大概率是元兇,繼續分析分線,該SQL第一次出現時間在 07:15:29,説明是新上線的SQL,從監控中快速定位:
SQL_ID 4cwfmj3fkzrh8
PLAN_HASH_VALUE 2962550836
FIRST_LOAD_TIME 2024-11-29/07:15:29
LAST_LOAD_TIME 2024-11-29 10:35:25
-- sql信息
SELECT SUBSID, CITY, POORDERNUMBER, BIZTYPE, GOODSID, ORDERID,
CUSTOMERPHONE, SERVICECODE, SERVICESTATUS, RECEIVETIME, EFFTIME,
EXPTIME, PROVINCERELATIONID, SERVICECODETYPE, OPPOORDERSOURCE,
INFORMSOURCE, INTIME, DATASOURCE, ORDERNUMBER, APICODE, USERTYPE,
SELLERMEMO FROM OJ_RES_GOOD_SYN
WHERE SUBSID = :1
AND EXPTIME IS NOT NULL
AND EXPTIME >= SYSDATE
AND EXPTIME <= :2
AND CITY = 1117
–查看sql的第一次載入時間
select FIRST\_LOAD\_TIME from v$sql where sql\_id=&1;
2、分析原因
在監控(上面圖片)右側顯示的實時的等待事件,此時佔比最高的是 ON CPU / runqueue,這個等待事件不是select distinct WAIT\_CLASS from v$event\_name;13個類別裏面的,一般可以通過V$session status='ACTIVE' and state!='WAITING'、或者V$ASH中event列為NULL判斷,也是説如果沒有等待事件此時SQL正在使用或等待CPU。到這裏可以斷定問題大概率發生在sql執行流程中的執行階段,可能佔用CPU較高的是sql的排序操作、複雜算子的計算、函數調用等
接着分析SQL的執行計劃
SQL_ID 4cwfmj3fkzrh8, child number 0
-------------------------------------
SELECT SUBSID, CITY, POORDERNUMBER, BIZTYPE, GOODSID, ORDERID,
CUSTOMERPHONE, SERVICECODE, SERVICESTATUS, RECEIVETIME, EFFTIME,
EXPTIME, PROVINCERELATIONID, SERVICECODETYPE, OPPOORDERSOURCE,
INFORMSOURCE, INTIME, DATASOURCE, ORDERNUMBER, APICODE, USERTYPE,
SELLERMEMO FROM OJ_RES_GOOD_SYN WHERE SUBSID = :1 AND
EXPTIME IS NOT NULL AND EXPTIME >= SYSDATE AND EXPTIME <= :2 AND
CITY = 1117
Plan hash value: 881293612
------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | E-Rows |E-Bytes| Cost (%CPU)| E-Time | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 1470 (100)| | | |
|* 1 | FILTER | | | | | | | |
|* 2 | TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED| OJ_RES_GOOD_SYN | 15 | 13680 | 1470 (0)| 00:00:01 | ROWID | ROWID |
|* 3 | INDEX RANGE SCAN | IDX_RIGHTGOOD_SERVICEGOOD | 239K| | 28 (0)| 00:00:01 | | |
------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(:2>=SYSDATE@!)
2 - filter(("EXPTIME"<=:2 AND "SUBSID"=TO_NUMBER(:1) AND "EXPTIME">=SYSDATE@!))
3 - access("CITY"=1117)
索引信息
SYS@PANDAe1>@ind %.OJ_RES_GOOD_SYN
Display indexes where table or index name matches %%.OJ_RES_GOOD_SYN%...
TABLE_OWNER TABLE_NAME INDEX_NAME POS# COLUMN_NAME DSC
-------------------- ------------------------------ ------------------------------ ---- ------------------------------ ----
PANDA OJ_RES_GOOD_SYN IDX_CS_REC_SUBS_RIGHT_SUBSID 1 SUBSID <<<< 正確的
IDX_RIGHTGOOD_CUSTOMERPHONE 1 CUSTOMERPHONE
IDX_RIGHTGOOD_SERVICEGOOD 1 CITY <<<< 錯誤的
2 GOODSID
3 SERVICECODE
INDEX_OWNER TABLE_NAME INDEX_NAME IDXTYPE UNIQ STATUS PART TEMP H LFBLKS NDK NUM_ROWS CLUF LAST_ANALYZED DEGREE VISIBILIT
-------------------- ------------------------------ ------------------------------ ---------- ---- -------- ---- ---- -- ---------- ------------- ---------- ---------- ------------------- ------ ---------
PANDA OJ_RES_GOOD_SYN IDX_CS_REC_SUBS_RIGHT_SUBSID NORMAL NO N/A YES N 3 129071 7690240 34905836 34226961 2024-11-29 10:35:23 1 VISIBLE <<<<<<<<<剛才這個索引是沒有統計信息的
OJ_RES_GOOD_SYN IDX_RIGHTGOOD_CUSTOMERPHONE NORMAL NO N/A YES N 3 148744 6300672 34905829 34425431 2024-11-29 10:35:01 1 VISIBLE
OJ_RES_GOOD_SYN
執行計劃使用了索引 IDX\_RIGHTGOOD\_SERVICEGOOD走了了INDEX RANGE SCAN並且謂詞中只通過CITY字段進行了過濾,這個字段是一個分區鍵,選擇性很差。在where條件後面有一個SUBSID選擇新很好並且存在索引IDX\_CS\_REC\_SUBS\_RIGHT\_SUBSID
select * from (select CITY,count(*) from PANDA.OJ_RES_GOOD_SYN group by CITY order by count(*) desc)where rownum<=10;
CITY COUNT(*)
---------- ----------
1116 24489838
1117 11353009
1118 7236
3 rows selected.
select * from (select SUBSID,count(*) from PANDA.OJ_RES_GOOD_SYN group by SUBSID order by count(*) desc)where rownum<=10;
SUBSID COUNT(*)
---------- ----------
3.1639E+12 2944
3.1641E+12 2855
3.1639E+12 2476
3.1639E+12 2204
3.1421E+12 2194
3.1640E+12 2143
3.1621E+12 2099
3.1641E+12 2015
3.1640E+12 1939
3.1640E+12 1909
10 rows selected.
3、檢查統計信息
執行計劃異常,有好的索引不走,檢查涉及表的統計信息,發現統計信息過期 (STALE\_STATS=YES/NULL忘記了),這意味着優化器可能生成了不合理的執行計劃。
SELECT owner, table_name, partition_name, stale_stats, last_analyzed
FROM dba_tab_statistics
WHERE table_name = UPPER('OJ_RES_GOOD_SYN')
AND owner = 'PANDA';
OWNER TABLE_NAME PARTITION_NAME STALE LAST_ANALYZED
---------- -------------------- -------------------- ----- ----------------
OWNER TABLE_NAME PARTITION_NAME 29-AUG-20
三、處理過程
之前OJ\_RES\_GOOD\_SYN沒有統計信息,優化器無法正確評估成本。收集表統計信息
exec DBMS_STATS.GATHER_TABLE_STATS (
ownname => 'PANDA',
tabname => 'OJ_RES_GOOD_SYN',
cascade => true,
estimate_percent => dbms_stats.auto_sample_size,
method_opt => 'FOR TABLE FOR ALL COLUMNS SIZE REPEAT',
degree => 8,
no_invalidate => false
);
收集後,執行計劃重新生成,正確走上IDX\_CS\_REC\_SUBS\_RIGHT\_SUBSID索引,也走了CITY的分區裁剪
SQL_ID 4cwfmj3fkzrh8, child number 3
-------------------------------------
SELECT SUBSID, CITY, POORDERNUMBER, BIZTYPE, GOODSID, ORDERID,
CUSTOMERPHONE, SERVICECODE, SERVICESTATUS, RECEIVETIME, EFFTIME,
EXPTIME, PROVINCERELATIONID, SERVICECODETYPE, OPPOORDERSOURCE,
INFORMSOURCE, INTIME, DATASOURCE, ORDERNUMBER, APICODE, USERTYPE,
SELLERMEMO FROM OJ_RES_GOOD_SYN WHERE SUBSID = :1 AND
EXPTIME IS NOT NULL AND EXPTIME >= SYSDATE AND EXPTIME <= :2 AND
CITY = 1117
Plan hash value: 2962550836
--------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 134 (100)| | | |
|* 1 | FILTER | | | | | | | |
| 2 | PARTITION RANGE ITERATOR | | 1 | 912 | 134 (0)| 00:00:01 | 131 | 196 |
|* 3 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| OJ_RES_GOOD_SYN | 1 | 912 | 134 (0)| 00:00:01 | 131 | 196 |
|* 4 | INDEX RANGE SCAN | IDX_CS_REC_SUBS_RIGHT_SUBSID | 32 | | 133 (0)| 00:00:01 | 131 | 196 |
--------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(:2>=SYSDATE@!)
3 - filter(("CITY"=1117 AND "EXPTIME"<=:2 AND "EXPTIME">=SYSDATE@!))
4 - access("SUBSID"=TO_NUMBER(:1))
四、效果驗證
統計信息收集後,執行計劃恢復正確,CPU瞬間回落,系統恢復正常。
SQL性能指標對比:
| 計劃 | LIO/Exec(塊次) | CPU/Exec(secs) | Elapsed/Exec(secs) |
|---|---|---|---|
| 壞計劃 881293612 | 2,112,709.98 | 38.84 | 43.73 |
| 好計劃 2962550836 | 173.73 | 0.00 | 0.01 |
Avg Avg
Snapshot Avg LIO Avg PIO CPU (secs) Elapsed (secs)
Time INSTANCE_NUMBER Execs Per Exec Per Exec Per Exec Per Exec
------------ --------------- -------- ------------------- ------------------- ------------------- -------------------
29-NOV 07:30 1 6 2,465,321.50 108,749.83 18.59 49.14
29-NOV 08:00 1 257 8,457,635.70 1,496.14 34.31 36.93
29-NOV 08:30 1 814 0.00 0.00 36.69 36.74
29-NOV 09:00 1 1,329 2,255,022.27 0.00 39.74 39.84
29-NOV 09:30 1 1,706 0.00 0.00 42.84 42.99
29-NOV 10:00 1 1,757 1,434,805.84 0.00 45.29 45.58
29-NOV 10:30 1 102 176,184.53 7.97 54.41 54.88
29-NOV 10:30 1 1,669 173.73 5.38 0.00 0.01
-------- ------------------- ------------------- ------------------- -------------------
avg 1,848,642.95 13,782.42 33.98 38.26
sum 7,640
Per-Plan Execution Statistics Over Time
Avg Avg
Plan Snapshot Avg Rows Avg LIO Avg PIO CPU (secs) Elapsed (secs)
Hash Value Time INSTANCE_NUMBER Execs Per Exec Per Exec Per Exec Per Exec Per Exec
---------- ------------ --------------- -------- ------------------- ------------------- ------------------- ------------------- -------------------
881293612 29-NOV 07:30 1 6 0.00 2,465,321.50 108,749.83 18.59 49.14
29-NOV 08:00 1 257 0.00 8,457,635.70 1,496.14 34.31 36.93
29-NOV 08:30 1 814 0.01 0.00 0.00 36.69 36.74
29-NOV 09:00 1 1,329 0.02 2,255,022.27 0.00 39.74 39.84
29-NOV 09:30 1 1,706 0.01 0.00 0.00 42.84 42.99
29-NOV 10:00 1 1,757 0.01 1,434,805.84 0.00 45.29 45.58
29-NOV 10:30 1 102 0.02 176,184.53 7.97 54.41 54.88
********** -------- ------------------- ------------------- ------------------- ------------------- -------------------
avg 0.01 2,112,709.98 15,750.56 38.84 43.73
sum 5,971
2962550836 29-NOV 10:30 1 1,669 0.01 173.73 5.38 0.00 0.01
********** -------- ------------------- ------------------- ------------------- ------------------- -------------------
avg 0.01 173.73 5.38 0.00 0.01
sum
墨天輪從樂知樂享的數據庫技術社區蓄勢出發,全面升級,提供多類型數據庫管理服務。墨天輪數據庫管理服務旨在為用户構建信賴可託付的數據庫環境,併為數據庫廠商提供中立的生態支持。
墨天輪數據庫服務官網:https://www.modb.pro/service