Data Analysis for AI with NumPy, Chapter 7: Array Iteration, Sorting, and Filtering


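Preface

In NumPy, iterating over, sorting, and filtering arrays are three basic data-processing operations. NumPy favors vectorized operations, which avoid explicit Python loops for better performance, but some tasks still require iterating over an array, for example row by row or element by element. This chapter walks through all three operations with practical examples of how to use them efficiently.

I. Array Iteration

⚠️ Rule of thumb: avoid explicit for loops where possible and prefer vectorized operations.

When iteration is unavoidable, NumPy offers several ways to do it:

1. One-dimensional arrays: a plain for loop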

import numpy as np

a = np.array([1, 2, 3])
for x in a:
    print(x)  # 1, 2, 3

2. Multidimensional arrays: iteration runs over the first axis by default (row by row)

b = np.array([[1, 2], [3, 4]])
for row in b:
    print(row)  # [1 2], [3 4]
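
Because a plain loop always walks the first axis, one way to visit columns instead is to loop over the transpose. The snippet below is a minimal sketch of that idea; the variable name `columns` is only illustrative.

```python
import numpy as np

b = np.array([[1, 2], [3, 4]])

# b.T is a transposed view, so its "rows" are the columns of b
columns = [col.tolist() for col in b.T]
print(columns)  # [[1, 3], [2, 4]]
```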

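3. Element-by-element iteration: `np.nditer()`

Works for arrays of any dimensionality. By default it visits elements in memory order, which keeps access efficient for both C-ordered and Fortran-ordered arrays.

c = np.array([[1, 2], [3, 4]])

# Read-only iteration
for x in np.nditer(c):
    print(x, end=' ')  # 1 2 3 4

# Writable iteration (modifies the original array)
with np.nditer(c, op_flags=['readwrite']) as it:
    for x in it:
        x[...] = x * 2
print(c)  # [[2 4] [6 8]]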
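
To make the remark about C and Fortran order concrete, the sketch below passes the `order` argument of `np.nditer` explicitly. It rebuilds the small array from scratch so that it runs on its own; `row_major` and `col_major` are illustrative names.

```python
import numpy as np

c = np.array([[1, 2], [3, 4]])

# Default (order='K') follows the memory layout; for this C-contiguous
# array that means row by row.
row_major = [int(x) for x in np.nditer(c)]

# order='F' forces column-major (Fortran-style) traversal.
col_major = [int(x) for x in np.nditer(c, order='F')]

print(row_major)  # [1, 2, 3, 4]
print(col_major)  # [1, 3, 2, 4]
```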

4. Iterating over several arrays at once (the arrays must be broadcast-compatible)

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

for x, y in np.nditer([a, b]):
    print(x, y)  # 1 10, then 2 20, then 3 30
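
The example above uses two arrays of the same shape. As a sketch of what "broadcast-compatible" allows, the snippet below pairs a (2, 1) column with a length-3 row; `np.nditer` broadcasts both to shape (2, 3). The arrays `col` and `row` are made up for illustration, and `order='C'` is passed so the traversal order is explicit.

```python
import numpy as np

col = np.array([[0], [10]])   # shape (2, 1)
row = np.array([1, 2, 3])     # shape (3,)

# Both operands are broadcast to the common shape (2, 3)
pairs = [(int(x), int(y)) for x, y in np.nditer([col, row], order='C')]
print(pairs)
# [(0, 1), (0, 2), (0, 3), (10, 1), (10, 2), (10, 3)]
```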

5. Getting indices: `np.ndenumerate()`

d = np.array([[10, 20], [30, 40]])
for index, value in np.ndenumerate(d):
    print(index, value)
# (0, 0) 10
# (0, 1) 20
# (1, 0) 30
# (1, 1) 40
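
A common reason to reach for `np.ndenumerate` is to find where certain values sit. The sketch below shows that loop next to a vectorized alternative, `np.argwhere`, which this chapter does not otherwise cover; the `threshold` value is an arbitrary assumption.

```python
import numpy as np

d = np.array([[10, 20], [30, 40]])
threshold = 25  # hypothetical cutoff, chosen only for this example

# Loop version: collect the index of every value above the threshold
loop_hits = [idx for idx, val in np.ndenumerate(d) if val > threshold]

# Vectorized version: argwhere returns one row of indices per match
vec_hits = [tuple(int(i) for i in idx) for idx in np.argwhere(d > threshold)]

print(loop_hits)  # [(1, 0), (1, 1)]
print(vec_hits)   # [(1, 0), (1, 1)]
```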

✅ Best practice: do not process NumPy arrays with for loops unless the logic is too complex to vectorize.

II. Sorting: Review and Advanced Usage

1. Basic sorting

arr = np.array([3, 1, 4, 1, 5])

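# Return a sorted copy; arr itself is unchanged
sorted_arr = np.sort(arr)  # [1 1 3 4 5]

# Get the indices that would sort arr (computed before the in-place sort below)
indices = np.argsort(arr, kind='stable')  # [1 3 0 2 4]

# Sort in place; arr is now [1 1 3 4 5]
arr.sort()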
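
Two frequent follow-ups to the basics are reordering a second array with the indices from `np.argsort` and sorting in descending order. The sketch below shows one way to do each; the `temps` and `cities` data are invented for illustration.

```python
import numpy as np

temps = np.array([22, 18, 25, 18])
cities = np.array(['Oslo', 'Rome', 'Cairo', 'Lima'])

# kind='stable' keeps equal keys in their original relative order
order = np.argsort(temps, kind='stable')
print(order)           # [1 3 0 2]
print(cities[order])   # ['Rome' 'Lima' 'Oslo' 'Cairo']

# Descending order: sort ascending, then reverse the result
descending = np.sort(temps)[::-1]
print(descending)      # [25 22 18 18]
```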

2. Sorting multidimensional arrays (choosing an axis)

mat = np.array([[3, 1], [2, 4]])

# Sort along axis 1 (within each row)
np.sort(mat, axis=1)  # [[1 3], [2 4]]

# Sort along axis 0 (within each column)
np.sort(mat, axis=0)  # [[2 1], [3 4]]
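
Sorting along an axis treats every row or column independently, so it does not keep rows together. If the goal is to reorder whole rows by the values in one column, a possible approach is to `argsort` that column and index with the result, as sketched below on a slightly larger made-up matrix.

```python
import numpy as np

mat = np.array([[3, 1],
                [2, 4],
                [1, 9]])

# Order the rows by their first column while keeping each row intact
row_order = np.argsort(mat[:, 0], kind='stable')
by_first_col = mat[row_order]
print(by_first_col)
# [[1 9]
#  [2 4]
#  [3 1]]
```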

3. Sorting structured arrays (by field)

dt = np.dtype([('name', 'U10'), ('score', 'i4')])
students = np.array([('Alice', 85), ('Bob', 90), ('Charlie', 78)], dtype=dt)

# Sort by score
sorted_students = np.sort(students, order='score')
print(sorted_students)
# [('Charlie', 78) ('Alice', 85) ('Bob', 90)]
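
The `order` argument also accepts a list of fields, which is useful for breaking ties. The sketch below adds a hypothetical fourth student, 'Dave', who ties with Alice, and also shows one way to get a descending ranking, since `np.sort` has no descending option.

```python
import numpy as np

dt = np.dtype([('name', 'U10'), ('score', 'i4')])
students = np.array(
    [('Alice', 85), ('Bob', 90), ('Charlie', 78), ('Dave', 85)],
    dtype=dt,
)

# Ties on 'score' are broken by 'name'
by_score_then_name = np.sort(students, order=['score', 'name'])
print(by_score_then_name['name'])   # ['Charlie' 'Alice' 'Dave' 'Bob']

# Descending by score: argsort the negated field, then index
desc = students[np.argsort(-students['score'], kind='stable')]
print(desc['name'])                 # ['Bob' 'Alice' 'Dave' 'Charlie']
```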

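III. Filtering: Conditional Selection

1. Boolean indexing (the most common approach)

data = np.array([10, 20, 30, 40, 50])

# Single condition
filtered = data[data > 30]  # [40 50]

# Multiple conditions (each comparison needs its own parentheses)
filtered = data[(data > 20) & (data < 50)]  # [30 40]

# Negation (not)
filtered = data[~(data == 30)]  # [10 20 40 50]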
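
The parentheses matter because `&` binds more tightly than the comparison operators: without them, `data > 20 & data < 50` is parsed as a chained comparison around `20 & data` and raises a `ValueError`. The sketch below rounds out the operators with `|` (or) and shows two other common uses of a mask, assignment and counting; the variable names are illustrative.

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50])

# OR: keep values at either extreme
edges = data[(data < 20) | (data > 40)]
print(edges)     # [10 50]

# A mask can also select the elements to assign to
clipped = data.copy()
clipped[clipped > 30] = 30
print(clipped)   # [10 20 30 30 30]

# Count the matches without extracting them
n_large = np.count_nonzero(data > 30)
print(n_large)   # 2
```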

2. Using `np.where()` to get matching indices

# Return the indices that satisfy the condition
indices = np.where(data > 30)  # (array([3, 4]),)
values = data[indices]         # [40 50]

# Three-argument form (like an element-wise if-else)
result = np.where(data > 30, data, -1)  # [-1 -1 -1 40 50]
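
The one-argument form returns a tuple because it yields one index array per dimension. The sketch below shows what that looks like for a 2-D array and how the tuple can be unpacked; `grid` is a made-up example.

```python
import numpy as np

grid = np.array([[5, 40],
                 [35, 10]])

# One index array per axis: row indices and column indices of the matches
rows, cols = np.where(grid > 30)
print(rows)               # [0 1]
print(cols)               # [1 0]
print(grid[rows, cols])   # [40 35]

# The three-argument form works element-wise on any shape
labels = np.where(grid > 30, 'high', 'low')
print(labels)
# [['low' 'high']
#  ['high' 'low']]
```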

3. Fancy indexing

# Select elements at the given positions
positions = [0, 2, 4]
selected = data[positions]  # [10 30 50]
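
One detail worth knowing is that fancy indexing returns a copy, whereas basic slicing returns a view of the original data. The sketch below demonstrates the difference and also shows fancy indexing on a 2-D array, where the index lists are paired up element by element; `table` is an illustrative array.

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50])

picked = data[[0, 2, 4]]   # fancy indexing: a copy
picked[0] = -1
print(data[0])             # 10 (the original is untouched)

view = data[0:3]           # basic slicing: a view
view[0] = -1
print(data[0])             # -1 (the original changed)

# In 2-D, the lists are paired: this picks table[0, 1] and table[2, 3]
table = np.arange(12).reshape(3, 4)
print(table[[0, 2], [1, 3]])   # [ 1 11]
```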

IV. Putting It Together: Iteration + Sorting + Filtering

📌 Scenario: processing a student grade table (structured data)

# Build a structured array: name, math score, English score
dt = np.dtype([('name', 'U10'), ('math', 'f4'), ('english', 'f4')])
scores = np.array([
    ('Alice', 88, 92),
    ('Bob', 75, 85),
    ('Charlie', 95, 88),
    ('Diana', 60, 90)
], dtype=dt)

# 1️⃣ Filter: find students with a math score above 80
good_math = scores[scores['math'] > 80]
print("Strong in math:\n", good_math)

# 2️⃣ Sort: rank by total score, highest first
total = scores['math'] + scores['english']
sorted_indices = np.argsort(-total)  # negate to sort in descending order
ranked = scores[sorted_indices]
print("Ranking:\n", ranked)

# 3️⃣ Iterate: print each student's total (for demonstration only; vectorize in practice)
print("Totals:")
for student in scores:
    print(f"{student['name']}: {student['math'] + student['english']}")

# ✅ More efficient version (no loop):
print("Totals (vectorized):", scores['math'] + scores['english'])

Output:

Strong in math:
 [('Alice', 88., 92.) ('Charlie', 95., 88.)]
Ranking:
 [('Charlie', 95., 88.) ('Alice', 88., 92.) ('Bob', 75., 85.) ('Diana', 60., 90.)]
Totals:
Alice: 180.0
Bob: 160.0
Charlie: 183.0
Diana: 150.0
Totals (vectorized): [180. 160. 183. 150.]

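📌 Scenario: filtering and sorting image pixels (2-D array)

# Simulate a grayscale image (values 0 to 255)
img = np.random.randint(0, 256, size=(4, 4), dtype=np.uint8)
print("Original image:\n", img)

# Filter: keep pixels brighter than 128 and set the rest to 0
bright_pixels = np.where(img > 128, img, 0)
print("Bright regions:\n", bright_pixels)

# Sort: all pixel values in ascending order
flat_sorted = np.sort(img.ravel())
print("All pixels, sorted:", flat_sorted)

# Iterate: count occurrences of each gray level (np.bincount is more efficient in practice)
hist = np.zeros(256, dtype=int)
for pixel in np.nditer(img):
    hist[pixel] += 1
print("Histogram (first 10 bins):", hist[:10])

💡 In real image processing, use np.histogram() or cv2.calcHist() rather than iterating by hand.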
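
As a sketch of the vectorized alternatives mentioned here, the snippet below builds the same histogram three ways and checks that they agree. It uses a seeded `np.random.default_rng` instead of `np.random.randint` only so the run is repeatable; that choice, and the variable names, are assumptions of this example.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed for a repeatable example
img = rng.integers(0, 256, size=(4, 4), dtype=np.uint8)

# Loop version, as in the walkthrough above
hist_loop = np.zeros(256, dtype=int)
for pixel in np.nditer(img):
    hist_loop[int(pixel)] += 1

# np.bincount counts occurrences of each non-negative integer value
hist_bincount = np.bincount(img.ravel(), minlength=256)

# np.histogram with 256 unit-wide bins covering [0, 256]
hist_np, _ = np.histogram(img, bins=256, range=(0, 256))

print(np.array_equal(hist_loop, hist_bincount))  # True
print(np.array_equal(hist_bincount, hist_np))    # True
```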

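V. Performance: Vectorization vs. Explicit Loops

large_arr = np.random.rand(1_000_000)

# ✅ Vectorized (fast)
result_vec = large_arr[large_arr > 0.5]

# ❌ Explicit loop (slow)
result_loop = []
for x in large_arr:
    if x > 0.5:
        result_loop.append(x)

On one million elements, the vectorized version is typically 10 to 100 times faster than the Python loop; the exact ratio depends on the machine and the NumPy version.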
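
To measure the gap on your own machine, one option is the standard-library `timeit` module. The sketch below is a rough benchmark, not a rigorous one; the helper names `filter_vectorized` and `filter_loop` are hypothetical, and the printed numbers will vary from run to run.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
large_arr = rng.random(1_000_000)


def filter_vectorized(arr):
    """Boolean-mask filter, executed in compiled NumPy code."""
    return arr[arr > 0.5]


def filter_loop(arr):
    """Same filter written as an element-by-element Python loop."""
    return [x for x in arr if x > 0.5]


# Take the best of three single runs to reduce noise
t_vec = min(timeit.repeat(lambda: filter_vectorized(large_arr), number=1, repeat=3))
t_loop = min(timeit.repeat(lambda: filter_loop(large_arr), number=1, repeat=3))

print(f"vectorized: {t_vec:.4f} s")
print(f"loop:       {t_loop:.4f} s")
print(f"ratio:      {t_loop / t_vec:.0f}x")
```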

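VI. Summary: Best Practices

| Operation | Recommended | Avoid |
| --- | --- | --- |
| Iteration | `np.nditer`, `np.ndenumerate` (only when necessary) | Plain `for` loops over large arrays |
| Sorting | `np.sort`, `np.argsort`, the `order` argument (structured arrays) | Hand-written sorting algorithms |
| Filtering | Boolean indexing, `np.where` | Checking elements one by one in a loop |

🔑 Key idea: replace Python loops with NumPy built-in functions wherever possible to take advantage of their optimized C implementation.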

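What's Next

This chapter covered iterating over, sorting, and filtering NumPy arrays, along with typical use cases. Part of the code for the Python transition project has been uploaded to Gitee and will be updated gradually as time allows. You are welcome to clone it locally to study and extend it.

Resources

WeChat official account: 咚咚王 Gitee: https://gitee.com/wy18585051844/ai_learning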

