博客 / 詳情

返回

特性介紹 | MySQL select count(*) 、count(1)、count(列) 詳解(1):概念及區別

本文首發於 2020-05-05 21:55:15

一、前言

從接觸MySQL開始斷斷續續的看過一些文章,對count()操作眾説紛紜,其中分歧點主要在於count(1)count(*)哪個效率高,有説count(1)count(*)快的(這種説法更普遍),有説二者一樣快的。個人理解這兩種行為可能適用於的是不同的版本,我只關心較新的MySQL版本是什麼行為,詳見下文。

二、含義

首先,先説明一下常見count()操作及含義:

count(*):計算包括NULL值在內的行數,SQL92定義的標準統計行數的語法。

count(1):計算包括NULL值在內的行數,其中的1是恆真表達式。

count(列名):計算指定列的行數,但不包含NULL值。

三、具體區別

MySQL手冊中相關描述如下:

For transactional storage engines such as InnoDB, storing an exact row count is problematic. Multiple transactions may be occurring at the same time, each of which may affect the count.

InnoDB does not keep an internal count of rows in a table because concurrent transactions might “see” different numbers of rows at the same time. Consequently, SELECT COUNT(*) statements only count rows visible to the current transaction.

Prior to MySQL 5.7.18, InnoDB processes SELECT COUNT(*) statements by scanning the clustered index. As of MySQL 5.7.18, InnoDB processes SELECT COUNT(*) statements by traversing the smallest available secondary index unless an index or optimizer hint directs the optimizer to use a different index. If a secondary index is not present, the clustered index is scanned.

Processing SELECT COUNT(*) statements takes some time if index records are not entirely in the buffer pool. For a faster count, create a counter table and let your application update it according to the inserts and deletes it does. However, this method may not scale well in situations where thousands of concurrent transactions are initiating updates to the same counter table. If an approximate row count is sufficient, use SHOW TABLE STATUS.

InnoDB handles SELECT COUNT(*) and SELECT COUNT(1) operations in the same way. There is no performance difference.

For MyISAM tables, COUNT(*) is optimized to return very quickly if the SELECT retrieves from one table, no other columns are retrieved, and there is no WHERE clause. For example:

mysql> SELECT COUNT(*) FROM student;

This optimization only applies to MyISAM tables, because an exact row count is stored for this storage engine and can be accessed very quickly.COUNT(1) is only subject to the same optimization if the first column is defined as NOT NULL.

官方這段描述要點如下:

  1. InnoDB是事務引擎,支持MVCC,併發事務可能同時“看到”不同的行數,所以,InnoDB不保留表中的行數SELECT COUNT(*)語句只計算當前事務可見的行數。
  2. 在MySQL 5.7.18之前,InnoDB通過掃描聚集索引處理SELECT COUNT(*)語句。從MySQL 5.7.18開始,InnoDB通過遍歷最小的可用二級索引來處理SELECT COUNT(*)語句,除非索引或優化器明確指示使用不同的索引。如果不存在二級索引,則掃描聚集索引。這樣的設計單從 IO 的角度就節省了很多開銷。
  3. InnoDB以同樣的方式處理SELECT COUNT(*)SELECT COUNT(1)操作,沒有性能差異。 因此,建議使用符合SQL標準的count(*)
  4. 對於MyISAM表,由於MyISAM引擎存儲了精確的行數,因此,如果SELECT COUNT(*)語句不包含WHERE子句,則會很快返回。這個很好理解,如果帶了where條件,就需要掃表了。
  5. 如果索引記錄不完全在緩衝池中,則處理SELECT(*)語句需要一些時間。為了更快的計數,您可以創建一個計數器表,並讓您的應用程序按插入和刪除操作更新它。然而,這種方法在同一計數器表中啓動成千上萬個併發事務的情況下,可能無法很好地擴展。如果一個近似的行數足夠,可以使用SHOW TABLE STATUS查詢行數。

到這裏我們明白了 count(*)count(1) 本質上面其實是一樣的,那麼 count(column) 又是怎麼回事呢?

count(column) 也是會遍歷整張表,但是不同的是它會拿到 column 的值以後判斷是否為空,然後再進行累加,那麼如果針對主鍵需要解析內容,如果是二級索引需要再次根據主鍵獲取內容,則要多一次 IO 操作,所以 count(column) 的性能肯定不如前兩者,如果按照效率比較的話:count(*)=count(1)>count(primary key)>count(非主鍵column)

四、建議

基於以上描述,如果要查詢innodb存儲引擎的表的總行數,有如下建議:

  1. 若僅僅是想獲取大概的行數,建議使用show table status或查詢information_schema.tables

    mysql> use db6;
    Reading table information for completion of table and column names
    You can turn off this feature to get a quicker startup with -A
    
    Database changed
    mysql> show tables;
    +---------------+
    | Tables_in_db6 |
    +---------------+
    | t1            |
  2. row in set (0.01 sec)

    mysql> select count(*) from t1;
    count(*)
    2
  3. row in set (0.00 sec)

    mysql> show table status\G
    1. row **

        Name: t1
      Engine: InnoDB
     Version: 10

    Row_format: Dynamic

        Rows: 2

    Avg_row_length: 8192
    Data_length: 16384
    Max_data_length: 0
    Index_length: 0
    Data_free: 0
    Auto_increment: NULL
    Create_time: 2020-04-21 12:00:44
    Update_time: NULL
    Check_time: NULL
    Collation: utf8mb4_general_ci

    Checksum: NULL

    Create_options:

     Comment:
  4. row in set (0.00 sec)

    mysql> select * from information_schema.tables where table_name = 't1'\G
    1. row **
    TABLE_CATALOG: def
    TABLE_SCHEMA: db6
    TABLE_NAME: t1
    TABLE_TYPE: BASE TABLE

      ENGINE: InnoDB
     VERSION: 10

    ROW_FORMAT: Dynamic
    TABLE_ROWS: 2
    AVG_ROW_LENGTH: 8192
    DATA_LENGTH: 16384
    MAX_DATA_LENGTH: 0
    INDEX_LENGTH: 0
    DATA_FREE: 0
    AUTO_INCREMENT: NULL
    CREATE_TIME: 2020-04-21 12:00:44
    UPDATE_TIME: NULL
    CHECK_TIME: NULL
    TABLE_COLLATION: utf8mb4_general_ci

    CHECKSUM: NULL

    CREATE_OPTIONS:
    TABLE_COMMENT:

  5. row in set (0.00 sec)

  6. 反之,如果必須要獲取準確的總行數,建議:

    1) 創建一個計數器表,並讓您的應用程序按插入和刪除操作更新它。
    2) 若業務插入和刪除相對較少,也可以考慮緩存到 redis。

篇幅有限,深入驗證、源碼分析將在下一篇文章中介紹。


歡迎關注我的微信公眾號【數據庫內核】:分享主流開源數據庫和存儲引擎相關技術。

歡迎關注公眾號數據庫內核

標題 網址
GitHub https://dbkernel.github.io
知乎 https://www.zhihu.com/people/...
思否(SegmentFault) https://segmentfault.com/u/db...
掘金 https://juejin.im/user/5e9d3e...
CSDN https://blog.csdn.net/dbkernel
博客園(cnblogs) https://www.cnblogs.com/dbkernel
user avatar
0 位用戶收藏了這個故事!

發佈 評論

Some HTML is okay.