Effective Python 01

Item 1: Know which version of Python you're using

Two major versions are currently in use: Python 2 and Python 3.

$ python --version
Python 2.7.11

$ python -V
Python 2.7.11

$ python -c "import sys; print sys.version"

$ bpython
bpython version 0.15 on top of Python 2.7.11 /usr/local/opt/python/bin/python2.7
>>> import sys
>>> print sys.version_info
sys.version_info(major=2, minor=7, micro=11, releaselevel='final', serial=0)
>>> print sys.version
2.7.11 (default, Dec  5 2015, 14:44:53)
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.1.76)]
>>>
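
The same check can be made from inside a program. The snippet below is a minimal sketch: the helper name is_python3 is made up for illustration, and it relies only on sys.version_info, which the session above already prints.

```python
import sys


def is_python3():
    """Return True when the running interpreter is Python 3 or newer."""
    # sys.version_info compares like a tuple: (major, minor, micro, ...)
    return sys.version_info >= (3,)


major, minor = sys.version_info[:2]
print('Running Python %d.%d' % (major, minor))
```

A check like this is handy at the top of a script that only supports one of the two major versions.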

Item 2: Follow the PEP 8 style guide

PEP 0008 -- Style Guide for Python Code

Use the Pylint tool to check whether code follows PEP 8.

Item 3: Know the differences between bytes, str, and unicode

Declare the source file encoding so that Chinese characters in the code do not raise errors.

# -*- coding: utf-8 -*-

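Set the interpreter's default encoding. When output falls back to ascii, non-ASCII text raises an error. This hack works only in Python 2 and is generally discouraged.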

import sys

reload(sys)
sys.setdefaultencoding('utf-8')

Python 2 has two types for sequences of characters: str (8-bit values) and unicode (Unicode characters).

def to_unicode(unicode_or_str):
    if isinstance(unicode_or_str, str):
        value = unicode_or_str.decode('utf-8')
    else:
        value = unicode_or_str
    return value


def to_str(unicode_or_str):
    if isinstance(unicode_or_str, unicode):
        value = unicode_or_str.encode('utf-8')
    else:
        value = unicode_or_str
    return value

Python 3 has two types for sequences of characters: bytes (8-bit values) and str (Unicode characters).

def to_str(bytes_or_str):
    if isinstance(bytes_or_str, bytes):
        value = bytes_or_str.decode('utf-8')
    else:
        value = bytes_or_str
    return value


def to_bytes(bytes_or_str):
    if isinstance(bytes_or_str, str):
        value = bytes_or_str.encode('utf-8')
    else:
        value = bytes_or_str
    return value

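Pay attention to the types involved before operating on strings:
- Python 3: bytes and str instances cannot be combined with operators such as +.
- Python 2: str and unicode can be combined as long as the str contains only 7-bit ASCII characters.
Use helper functions to make sure the inputs have the same type before operating on them.
To read or write binary data in a file, always open it in a binary mode ('rb' or 'wb').
- In Python 3, open() defaults to text mode and a default text encoding (typically UTF-8), whereas Python 2 defaults to binary. So in Python 3, remember to set the mode when reading or writing binary data.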
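
The points above can be sketched as follows, assuming Python 3. The to_str helper repeats the one shown earlier; the temporary directory and the file name random.bin are illustrative only.

```python
import os
import tempfile


def to_str(bytes_or_str):
    """Return a str, decoding UTF-8 bytes when necessary."""
    if isinstance(bytes_or_str, bytes):
        return bytes_or_str.decode('utf-8')
    return bytes_or_str


# Mixing bytes and str with + raises TypeError in Python 3.
try:
    b'one' + 'two'
    mixing_failed = False
except TypeError:
    mixing_failed = True

# Normalizing both operands first makes the operation well defined.
joined = to_str(b'one') + to_str('two')

# Binary data needs an explicit binary mode ('wb' / 'rb').
payload = os.urandom(10)
path = os.path.join(tempfile.mkdtemp(), 'random.bin')
with open(path, 'wb') as f:
    f.write(payload)
with open(path, 'rb') as f:
    round_trip = f.read()
```

Opening the same file with mode 'w' and writing bytes to it would fail in Python 3, because text mode expects str.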

Item 4: Write helper functions instead of complex expressions

Take parsing a URL query string as an example.

import urlparse

my_values = urlparse.parse_qs('red=5&bloue=0&green=', keep_blank_values=True)
print my_values

{'bloue': ['0'], 'green': [''], 'red': ['5']}

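parse_qs() parses the parameters and their values. After parsing, some parameters have blank values and some are missing entirely, so the .get() method returns a different result in each case.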

print 'Red:     {red}'.format(red=my_values.get('red'))
print 'Green:   {green}'.format(green=my_values.get('green'))
print 'Opacity: {opacity}'.format(opacity=my_values.get('opacity'))

Red:     ['5']
Green:   ['']
Opacity: None

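The code is easier to maintain if a missing or blank parameter defaults to the same value, 0. Because the empty string is falsy, the or operator falls through to 0 in both cases. Rewritten: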

red = my_values.get('red', [''])[0] or 0
green = my_values.get('green', [''])[0] or 0
opacity = my_values.get('opacity', [''])[0] or 0

print 'Red:     {red}'.format(red=red)
print 'Green:   {green}'.format(green=green)
print 'Opacity: {opacity}'.format(opacity=opacity)

Red:     5
Green:   0
Opacity: 0
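
The or trick depends on truthiness, which has one easy-to-miss consequence. A minimal sketch, using a hand-written dict as a stand-in for the parse_qs() result and a hypothetical helper named first_or_zero:

```python
def first_or_zero(values, key):
    """Return the first value stored under key, or 0 if missing or blank."""
    return values.get(key, [''])[0] or 0


# Stand-in for the dict returned by parse_qs(); not real parser output.
parsed = {'red': ['5'], 'green': [''], 'blue': ['0']}

red = first_or_zero(parsed, 'red')          # non-empty string is kept
green = first_or_zero(parsed, 'green')      # '' is falsy, so 0
opacity = first_or_zero(parsed, 'opacity')  # missing key, so 0
blue = first_or_zero(parsed, 'blue')        # '0' is a non-empty string
```

Note that blue stays the string '0' rather than becoming the integer 0, because any non-empty string is truthy. The results therefore mix str and int until an explicit conversion is applied.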

Also note that the parsed values are strings. Convert them when integers are needed.

red = int(my_values.get('red', [''])[0] or 0)
print 'Red:     {red}'.format(red=red)

Red:     5

That expression is hard to read. It can be simplified as follows:

red = my_values.get('red', [''])
red = int(red[0]) if red[0] else 0

print type(red), red

<type 'int'> 5

The inline if/else expression is still hard to read. Spread over multiple lines:

red = my_values.get('red', [''])
if red[0]:
    red = int(red[0])
else:
    red = 0

print type(red), red

<type 'int'> 5

Since more than one parameter needs this logic, write a helper function:

def get_first_int(values, key, default=0):
    found = values.get(key, [''])
    if found[0]:
        found = int(found[0])
    else:
        found = default
    return found

red = get_first_int(my_values, 'red')
print type(red), red

<type 'int'> 5
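
In Python 3, parse_qs lives in urllib.parse instead of urlparse. The sketch below applies the same helper to every parameter under that assumption. The query string is illustrative.

```python
from urllib.parse import parse_qs


def get_first_int(values, key, default=0):
    """Return the first value for key as an int, or default if blank/missing."""
    found = values.get(key, [''])
    if found[0]:
        return int(found[0])
    return default


my_values = parse_qs('red=5&blue=0&green=', keep_blank_values=True)

red = get_first_int(my_values, 'red')
green = get_first_int(my_values, 'green')
opacity = get_first_int(my_values, 'opacity')
blue = get_first_int(my_values, 'blue')
```

Each call site is now a single readable line, and all four results are integers.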

Item 5: Know how to slice sequences

a = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']

assert a[:5] == a[0:5]
assert a[5:] == a[5:len(a)]

When slicing from the start of a list, leave out the 0 to reduce visual noise.
Likewise, when slicing to the end of a list, leave out len(a).

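Be careful with slices of the form somelist[-n:]. b has only one element, yet b[-3:] and b[-1:] give the same result, because slice indexes beyond the list boundaries are silently clamped.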

b = ['a']
print b[-1:]  # ['a']
print b[-3:]  # ['a']
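
A short sketch of how slicing treats out-of-range indexes compared with direct indexing. The variable names are illustrative.

```python
a = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']

# Slicing past either end is silently clamped to the list boundaries.
first_twenty = a[:20]
last_twenty = a[-20:]

# Direct indexing past the end is an error instead.
try:
    a[20]
    index_failed = False
except IndexError:
    index_failed = True

# Pitfall: with n == 0, a[-n:] is a[0:], a copy of the whole list.
n = 0
tail = a[-n:]
```

The last case matters when n is computed at run time: a[-n:] returns the last n items for any n >= 1, but the whole list when n is 0.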

Slicing a list produces a new list. Modifying its elements does not affect the original list.

c = a[4:]
c[1] = 99
print c  # ['e', 99, 'g', 'h']

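When a list is assigned to another name without slicing, both names refer to the same list object, so a change made through either name is visible through the other. Assigning to a[:] replaces the contents in place rather than creating a new list.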

e = a
print "Before a =", a
a[:] = [1, 2, 3]
assert a is e
print "After  a =", a
print "       e =", e

Before a = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
After  a = [1, 2, 3]
       e = [1, 2, 3]
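
Two related behaviors, sketched with illustrative values: slice assignment may change the length of a list, and a bare [:] on the right-hand side makes a copy.

```python
a = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']

# Replace five elements (indexes 2..6) with three: the list shrinks.
a[2:7] = [99, 22, 14]

# A slice with no start or end copies the list into a new object.
b = a[:]
b[0] = 'changed'
```

After this runs, a still starts with 'a' because b is an independent copy.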

Item 6: Avoid using start, end, and stride in a single slice

Lists can be sliced with the syntax alist[start:end:stride].

A common trick is reversing a string:

x = 'python'
print x[::-1]

nohtyp

This works for byte strings and ASCII characters, but it breaks for Unicode characters encoded as a UTF-8 byte string:

w = u'謝謝你'  # unicode literal, so encode() needs no implicit ASCII decode
x = w.encode('utf-8')
print w
print x
y = x[::-1]
print y
z = y.decode('utf-8')
print z

謝謝你
謝謝你
��䝬蝬�
Traceback (most recent call last):
  File "item6.py", line 18, in <module>
    z = y.decode('utf-8')
  File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa0 in position 0: invalid start byte
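
For comparison, here is a sketch of the same experiment assuming Python 3, where string literals are already Unicode. Reversing the text works, while reversing its UTF-8 bytes still produces an undecodable sequence.

```python
w = '謝謝你'

# Reversing the str reverses whole characters.
reversed_text = w[::-1]

# Reversing the encoded bytes scrambles the multi-byte sequences.
x = w.encode('utf-8')
y = x[::-1]
try:
    y.decode('utf-8')
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
```

The failure comes from the byte order: each of these characters takes three bytes in UTF-8, and after reversal the data starts with a continuation byte instead of a lead byte.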

Slices with a negative stride are hard to read. Avoid them where possible.

a = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']

print a[::2]       # ['a', 'c', 'e', 'g']
print a[::-2]      # ['h', 'f', 'd', 'b']
print a[2::2]      # ['c', 'e', 'g']
print a[-2::-2]    # ['g', 'e', 'c', 'a']
print a[-2:2:-2]   # ['g', 'e']
print a[2:2:-2]    # []

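If a stride is needed, prefer a positive stride and split the work into two steps: apply the stride first, then slice the result. The two steps create an extra copy of the data, so the first operation should be the one that shrinks the result the most. If memory is a concern, use the islice function from itertools. Item 46 explains it in detail.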

b = a[::2]
c = b[1:-1]

print b  # ['a', 'c', 'e', 'g']
print c  # ['c', 'e']
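
The notes defer islice to Item 46. As a rough preview only, the sketch below shows start, stop, and step passed to itertools.islice; the chosen indexes are illustrative.

```python
from itertools import islice

a = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']

# Same elements as a[2:7:2], but produced lazily, one at a time.
lazy = islice(a, 2, 7, 2)
picked = list(lazy)

# Unlike slices, islice rejects negative start, stop, or step values.
try:
    list(islice(a, -2, None))
    negative_allowed = True
except ValueError:
    negative_allowed = False
```

Because islice works on any iterable and yields items on demand, it avoids building the intermediate list that a[::2] creates.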

Item 7: Use list comprehensions instead of map and filter

Compute the square of each number with a list comprehension:

a = range(1, 11)
squares = [i ** 2 for i in a]
print squares

[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

Unless a single-argument function is being applied, a list comprehension is cleaner and easier to read than map, which needs a lambda function that adds visual noise.

squares = map(lambda x: x ** 2, a)
print squares

[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

Keep only the squares that are even:

even_squares = [i ** 2 for i in a if (i ** 2) % 2 == 0]
print even_squares

[4, 16, 36, 64, 100]

map and filter can produce the same result, but they are harder to read:

even_squares = filter(lambda x: x % 2 == 0, map(lambda x: x ** 2, a))
print even_squares  # [4, 16, 36, 64, 100]
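
The snippets above are Python 2, where map and filter return lists. In Python 3 they return lazy iterators, so a list() call is needed to see the values. A minimal sketch assuming Python 3:

```python
a = range(1, 11)

# In Python 3, map() returns an iterator, not a list.
lazy_squares = map(lambda x: x ** 2, a)
squares = list(lazy_squares)

# filter() is lazy in the same way.
even_squares = list(filter(lambda x: x % 2 == 0, squares))

# The comprehension gives a list directly in both Python 2 and 3.
comprehension = [x ** 2 for x in a if x % 2 == 0]
```

This is one more point in favor of the comprehension: its result type does not change between the two major versions.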

Sets and dictionaries have their own comprehension syntax, which makes it easier to extract or derive data structures.
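
A small sketch of both forms. The chile_ranks data is made up for illustration.

```python
# Illustrative data: name -> rank.
chile_ranks = {'ghost': 1, 'habanero': 2, 'cayenne': 3}

# Dict comprehension: invert the mapping to rank -> name.
rank_dict = {rank: name for name, rank in chile_ranks.items()}

# Set comprehension: collect the distinct name lengths.
name_lengths = {len(name) for name in rank_dict.values()}
```

The syntax mirrors a list comprehension. Only the surrounding braces and, for dictionaries, the key: value pair differ.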

Item 8: Avoid more than two expressions in list comprehensions

This is readable. The expressions are read from left to right:

matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flat = [x for row in matrix for x in row]
print flat  # [1, 2, 3, 4, 5, 6, 7, 8, 9]

Another readable use is squaring every element of the matrix:

squared = [[x ** 2 for x in row] for row in matrix]
print squared  # [[1, 4, 9], [16, 25, 36], [49, 64, 81]]

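With more than two expressions, a comprehension becomes long and hard to read. Splitting it across lines helps, but rewriting it as a normal for loop is more readable.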

my_lists = [
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]]
]

flat = [x for sublist1 in my_lists
        for sublist2 in sublist1
        for x in sublist2]
print flat  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]

flat = []
for sublist1 in my_lists:
    for sublist2 in sublist1:
        flat.extend(sublist2)
print flat  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]

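List comprehensions support multiple if conditions. Conditions at the same loop level are implicitly combined with and: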

a = range(1, 11)
b = [x for x in a if x > 4 if x % 2 == 0]
print b  # [6, 8, 10]
c = [x for x in a if x > 4 and x % 2 == 0]
print c  # [6, 8, 10]

Conditions can also be written at each level of the comprehension:

matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
filtered = [[x for x in row if x % 3 == 0]
            for row in matrix if sum(row) >= 10]
print filtered  # [[6], [9]]
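
For comparison, the same filtering can be written as a plain loop. This is a sketch of one possible rewrite, not the only one; the name filtered_loop is illustrative.

```python
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

filtered_loop = []
for row in matrix:
    # Outer condition: keep only rows whose sum is at least 10.
    if sum(row) >= 10:
        # Inner condition: keep only cells divisible by 3.
        filtered_loop.append([x for x in row if x % 3 == 0])

# The nested comprehension from the text, for reference.
filtered = [[x for x in row if x % 3 == 0]
            for row in matrix if sum(row) >= 10]
```

The loop is longer, but each condition sits next to the level it applies to, which is easier to follow than the nested form.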

Item 9: Consider generator expressions for large comprehensions

A list comprehension builds a whole new list. For large inputs, this can run out of memory.

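The following example reads every line of a file and returns its length. If the file is huge, or is a stream that never ends, the program may run out of memory.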

value = [len(x) for x in open('/tmp/my_file.txt')]
print value

To solve this kind of problem, Python provides generator expressions.

it = (len(x) for x in open('/tmp/my_file.txt'))
print it  # <generator object <genexpr> at 0x10cf8df00>
print next(it)

To get values out of the generator, call next to fetch them one at a time.
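
A minimal sketch of stepping through a generator expression and chaining a second one onto it. An in-memory list of lines stands in for the file so the example is self-contained; the sample lines are illustrative.

```python
# Stand-in for open('/tmp/my_file.txt'): any iterable of lines works.
lines = ['short\n', 'a longer line\n', 'x\n']

# Nothing is computed until a value is requested.
it = (len(x) for x in lines)
first = next(it)

# Generator expressions compose: this one pulls from `it` on demand.
roots = ((x, x ** 0.5) for x in it)
second = next(roots)
third = next(roots)

# A generator is stateful: once exhausted it yields nothing more.
try:
    next(roots)
    exhausted = False
except StopIteration:
    exhausted = True
```

Each next call advances the whole chain by one item, so only one line's worth of data is held at a time. The trade-off is that a generator cannot be rewound or reused.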
