Stories

Detail Return Return

技術分享 | 一次新上的SQL引發的CPU飆升(Oracle) - Stories Detail

本文為墨天輪數據庫管理服務團隊第133期技術分享,內容原創,作者為技術顧問李寧,如需轉載請聯繫小墨(VX:modb666)並註明來源。如需查看更多文章可關注【墨天輪】公眾號。

版本架構:Oracle 19c 雙節點RAC+雙節點DG

摘要:記錄一次TOP SQL由於統計信息缺失引發的AAS和CPU突刺,優化器只能基於統計信息來判斷最優執行路徑。如果統計信息缺失或過期,系統隨時可能出現嚴重性能問題。

一、問題背景

2024年11月29號早上,生產營業系統突然收到CPU佔用率超過閾值的告警。通過監控發現,數據庫側的 AAS(Active Session Average)飆升到95,有一條新上線的SQL消耗異常,導致CPU使用率直線拉昇。

圖片

二、問題定位

1、Top SQL定位

從監控(上面圖片)看,左面有一條TOP SQL佔比31%,大概率是元兇,繼續分析分線,該SQL第一次出現時間在 07:15:29,説明是新上線的SQL,從監控中快速定位:

SQL_ID        4cwfmj3fkzrh8
PLAN_HASH_VALUE   2962550836
FIRST_LOAD_TIME   2024-11-29/07:15:29
LAST_LOAD_TIME    2024-11-29 10:35:25

-- sql信息
SELECT SUBSID, CITY, POORDERNUMBER, BIZTYPE, GOODSID, ORDERID,
CUSTOMERPHONE, SERVICECODE,  SERVICESTATUS, RECEIVETIME, EFFTIME,
EXPTIME, PROVINCERELATIONID, SERVICECODETYPE, OPPOORDERSOURCE,
INFORMSOURCE, INTIME, DATASOURCE, ORDERNUMBER, APICODE, USERTYPE,
SELLERMEMO FROM OJ_RES_GOOD_SYN
 WHERE SUBSID = :1
   AND EXPTIME IS NOT NULL
   AND EXPTIME >= SYSDATE
   AND EXPTIME <= :2
   AND CITY = 1117
–查看sql的第一次載入時間
select FIRST\_LOAD\_TIME from v$sql where sql\_id=&1;

2、分析原因

在監控(上面圖片)右側顯示的實時的等待事件,此時佔比最高的是 ON CPU / runqueue,這個等待事件不是select distinct WAIT\_CLASS from v$event\_name;13個類別裏面的,一般可以通過V$session status='ACTIVE' and state!='WAITING'、或者V$ASH中event列為NULL判斷,也是説如果沒有等待事件此時SQL正在使用或等待CPU。到這裏可以斷定問題大概率發生在sql執行流程中的執行階段,可能佔用CPU較高的是sql的排序操作、複雜算子的計算、函數調用等

接着分析SQL的執行計劃

SQL_ID  4cwfmj3fkzrh8, child number 0
-------------------------------------
SELECT SUBSID, CITY, POORDERNUMBER, BIZTYPE, GOODSID, ORDERID,
CUSTOMERPHONE, SERVICECODE,  SERVICESTATUS, RECEIVETIME, EFFTIME,
EXPTIME, PROVINCERELATIONID, SERVICECODETYPE, OPPOORDERSOURCE,
INFORMSOURCE, INTIME, DATASOURCE, ORDERNUMBER, APICODE, USERTYPE,
SELLERMEMO FROM OJ_RES_GOOD_SYN  WHERE SUBSID = :1  AND
EXPTIME IS NOT NULL AND EXPTIME >= SYSDATE AND EXPTIME <= :2  AND
CITY = 1117

Plan hash value: 881293612

------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name                      | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                            |                           |        |       |  1470 (100)|          |       |       |
|*  1 |  FILTER                                     |                           |        |       |            |          |       |       |
|*  2 |   TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED| OJ_RES_GOOD_SYN                     |     15 | 13680 |  1470   (0)| 00:00:01 | ROWID | ROWID |
|*  3 |    INDEX RANGE SCAN                         | IDX_RIGHTGOOD_SERVICEGOOD |    239K|       |    28   (0)| 00:00:01 |       |       |
------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(:2>=SYSDATE@!)
   2 - filter(("EXPTIME"<=:2 AND "SUBSID"=TO_NUMBER(:1) AND "EXPTIME">=SYSDATE@!))
   3 - access("CITY"=1117)

索引信息

SYS@PANDAe1>@ind %.OJ_RES_GOOD_SYN
Display indexes where table or index name matches %%.OJ_RES_GOOD_SYN%...


TABLE_OWNER          TABLE_NAME                     INDEX_NAME                     POS# COLUMN_NAME                    DSC
-------------------- ------------------------------ ------------------------------ ---- ------------------------------ ----
PANDA                 OJ_RES_GOOD_SYN                          IDX_CS_REC_SUBS_RIGHT_SUBSID      1 SUBSID        <<<< 正確的
                                                    IDX_RIGHTGOOD_CUSTOMERPHONE       1 CUSTOMERPHONE
                                                    IDX_RIGHTGOOD_SERVICEGOOD         1 CITY        <<<< 錯誤的
                                                                                      2 GOODSID
                                                                                      3 SERVICECODE


INDEX_OWNER          TABLE_NAME                     INDEX_NAME                     IDXTYPE    UNIQ STATUS   PART TEMP  H     LFBLKS           NDK   NUM_ROWS       CLUF LAST_ANALYZED       DEGREE VISIBILIT
-------------------- ------------------------------ ------------------------------ ---------- ---- -------- ---- ---- -- ---------- ------------- ---------- ---------- ------------------- ------ ---------
PANDA                OJ_RES_GOOD_SYN                          IDX_CS_REC_SUBS_RIGHT_SUBSID               NORMAL     NO   N/A      YES  N     3     129071       7690240   34905836   34226961 2024-11-29 10:35:23 1      VISIBLE  <<<<<<<<<剛才這個索引是沒有統計信息的
                     OJ_RES_GOOD_SYN                          IDX_RIGHTGOOD_CUSTOMERPHONE                NORMAL     NO   N/A      YES  N     3     148744       6300672   34905829   34425431 2024-11-29 10:35:01 1      VISIBLE
                     OJ_RES_GOOD_SYN

執行計劃使用了索引 IDX\_RIGHTGOOD\_SERVICEGOOD走了了INDEX RANGE SCAN並且謂詞中只通過CITY字段進行了過濾,這個字段是一個分區鍵,選擇性很差。在where條件後面有一個SUBSID選擇新很好並且存在索引IDX\_CS\_REC\_SUBS\_RIGHT\_SUBSID

select * from (select CITY,count(*) from PANDA.OJ_RES_GOOD_SYN group by CITY order by count(*) desc)where rownum<=10;
    CITY   COUNT(*)
---------- ----------
       1116   24489838
       1117   11353009
       1118       7236

3 rows selected.

select * from (select SUBSID,count(*) from PANDA.OJ_RES_GOOD_SYN group by SUBSID order by count(*) desc)where rownum<=10;

    SUBSID   COUNT(*)
---------- ----------
3.1639E+12       2944
3.1641E+12       2855
3.1639E+12       2476
3.1639E+12       2204
3.1421E+12       2194
3.1640E+12       2143
3.1621E+12       2099
3.1641E+12       2015
3.1640E+12       1939
3.1640E+12       1909

10 rows selected.

3、檢查統計信息

執行計劃異常,有好的索引不走,檢查涉及表的統計信息,發現統計信息過期 (STALE\_STATS=YES/NULL忘記了),這意味着優化器可能生成了不合理的執行計劃。

SELECT owner, table_name, partition_name, stale_stats, last_analyzed
FROM   dba_tab_statistics
WHERE  table_name   = UPPER('OJ_RES_GOOD_SYN')
  AND  owner        = 'PANDA';

OWNER      TABLE_NAME           PARTITION_NAME       STALE LAST_ANALYZED
---------- -------------------- -------------------- ----- ----------------
OWNER       TABLE_NAME                       PARTITION_NAME                  29-AUG-20

三、處理過程

之前OJ\_RES\_GOOD\_SYN沒有統計信息,優化器無法正確評估成本。收集表統計信息

exec DBMS_STATS.GATHER_TABLE_STATS (
    ownname => 'PANDA', 
    tabname => 'OJ_RES_GOOD_SYN',
    cascade => true,
    estimate_percent => dbms_stats.auto_sample_size,
    method_opt => 'FOR TABLE FOR ALL COLUMNS SIZE REPEAT',
    degree => 8,
    no_invalidate => false
);

收集後,執行計劃重新生成,正確走上IDX\_CS\_REC\_SUBS\_RIGHT\_SUBSID索引,也走了CITY的分區裁剪

SQL_ID  4cwfmj3fkzrh8, child number 3
-------------------------------------
SELECT SUBSID, CITY, POORDERNUMBER, BIZTYPE, GOODSID, ORDERID, 
CUSTOMERPHONE, SERVICECODE,  SERVICESTATUS, RECEIVETIME, EFFTIME, 
EXPTIME, PROVINCERELATIONID, SERVICECODETYPE, OPPOORDERSOURCE,  
INFORMSOURCE, INTIME, DATASOURCE, ORDERNUMBER, APICODE, USERTYPE, 
SELLERMEMO FROM OJ_RES_GOOD_SYN  WHERE SUBSID = :1  AND 
EXPTIME IS NOT NULL AND EXPTIME >= SYSDATE AND EXPTIME <= :2  AND 
CITY = 1117

Plan hash value: 2962550836

--------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name                         | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                            |                              |       |       |   134 (100)|          |       |       |
|*  1 |  FILTER                                     |                              |       |       |            |          |       |       |
|   2 |   PARTITION RANGE ITERATOR                  |                              |     1 |   912 |   134   (0)| 00:00:01 |   131 |   196 |
|*  3 |    TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| OJ_RES_GOOD_SYN                        |     1 |   912 |   134   (0)| 00:00:01 |   131 |   196 |
|*  4 |     INDEX RANGE SCAN                        | IDX_CS_REC_SUBS_RIGHT_SUBSID |    32 |       |   133   (0)| 00:00:01 |   131 |   196 |
--------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(:2>=SYSDATE@!)
   3 - filter(("CITY"=1117 AND "EXPTIME"<=:2 AND "EXPTIME">=SYSDATE@!))
   4 - access("SUBSID"=TO_NUMBER(:1))

四、效果驗證

統計信息收集後,執行計劃恢復正確,CPU瞬間回落,系統恢復正常。

SQL性能指標對比:

計劃 LIO/Exec(塊次) CPU/Exec(secs) Elapsed/Exec(secs)
壞計劃 881293612 2,112,709.98 38.84 43.73
好計劃 2962550836 173.73 0.00 0.01
                                                                                              Avg                 Avg
Snapshot                                          Avg LIO             Avg PIO          CPU (secs)      Elapsed (secs)
Time         INSTANCE_NUMBER    Execs            Per Exec            Per Exec            Per Exec            Per Exec
------------ --------------- -------- ------------------- ------------------- ------------------- -------------------
29-NOV 07:30               1        6        2,465,321.50          108,749.83               18.59               49.14
29-NOV 08:00               1      257        8,457,635.70            1,496.14               34.31               36.93
29-NOV 08:30               1      814                0.00                0.00               36.69               36.74
29-NOV 09:00               1    1,329        2,255,022.27                0.00               39.74               39.84
29-NOV 09:30               1    1,706                0.00                0.00               42.84               42.99
29-NOV 10:00               1    1,757        1,434,805.84                0.00               45.29               45.58
29-NOV 10:30               1      102          176,184.53                7.97               54.41               54.88
29-NOV 10:30               1    1,669              173.73                5.38                0.00                0.01
                             -------- ------------------- ------------------- ------------------- -------------------
avg                                          1,848,642.95           13,782.42               33.98               38.26
sum                             7,640

                                                             Per-Plan Execution Statistics Over Time
                                                                                                                             Avg                 Avg
      Plan Snapshot                                         Avg Rows             Avg LIO             Avg PIO          CPU (secs)      Elapsed (secs)
Hash Value Time         INSTANCE_NUMBER    Execs            Per Exec            Per Exec            Per Exec            Per Exec            Per Exec
---------- ------------ --------------- -------- ------------------- ------------------- ------------------- ------------------- -------------------
 881293612 29-NOV 07:30               1        6                0.00        2,465,321.50          108,749.83               18.59               49.14
           29-NOV 08:00               1      257                0.00        8,457,635.70            1,496.14               34.31               36.93
           29-NOV 08:30               1      814                0.01                0.00                0.00               36.69               36.74
           29-NOV 09:00               1    1,329                0.02        2,255,022.27                0.00               39.74               39.84
           29-NOV 09:30               1    1,706                0.01                0.00                0.00               42.84               42.99
           29-NOV 10:00               1    1,757                0.01        1,434,805.84                0.00               45.29               45.58
           29-NOV 10:30               1      102                0.02          176,184.53                7.97               54.41               54.88
**********                              -------- ------------------- ------------------- ------------------- ------------------- -------------------
avg                                                             0.01        2,112,709.98           15,750.56               38.84               43.73
sum                                        5,971

2962550836 29-NOV 10:30               1    1,669                0.01              173.73                5.38                0.00                0.01
**********                              -------- ------------------- ------------------- ------------------- ------------------- -------------------
avg                                                             0.01              173.73                5.38                0.00                0.01
sum 

墨天輪從樂知樂享的數據庫技術社區蓄勢出發,全面升級,提供多類型數據庫管理服務。墨天輪數據庫管理服務旨在為用户構建信賴可託付的數據庫環境,併為數據庫廠商提供中立的生態支持。
墨天輪數據庫服務官網:https://www.modb.pro/service

user avatar feedalyze Avatar xiaolvshikong Avatar xiaoyi_ces Avatar
Favorites 3 users favorite the story!
Favorites

Add a new Comments

Some HTML is okay.