# Python

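# Scope

  • A Python program is made up of code blocks: modules (module), classes (class), functions (def), and so on.
    Note that statements such as if and for do not form code blocks.
  • A variable defined inside a code block is scoped to that block.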
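
A minimal sketch of the two points above: names bound inside `for`/`if` stay visible in the surrounding block, while names bound inside a function do not leak out. The function name `make_local` is made up for illustration.

```python
# `for` and `if` do not create a new scope: names bound inside them
# belong to the enclosing block (here, the module).
for i in range(3):
    if i == 2:
        found = i

print(i, found)  # both names are still visible after the loop

def make_local():
    # `inner_name` is scoped to the function block only
    inner_name = "only visible inside make_local"
    return inner_name

make_local()
try:
    inner_name  # not defined at module level
except NameError as exc:
    print("NameError:", exc)
```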

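# Name resolution: the LEGB rule

Names are resolved starting from the nearest scope and moving outward: Local, Enclosing, Global, Builtin.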

print, len                ## Builtin (names from the `builtins` module)

global_var = 1            ## Global (i.e. module-scoped)

def outer():
    enclosing_var = 2     ## Enclosing (relative to `inner`)

    def inner():
        local_var = 3     ## Local

        print(f"Local     {local_var:^8}")
        print(f"Enclosing {enclosing_var:^8}")
        print(f"Global    {global_var:^8}")
        print(f"Builtin   {len('four'):^8}")

    inner()

if __name__ == "__main__":  ## `__name__` is a module-level (global) name
    outer()

Output

Local        3
Enclosing    2
Global       1
Builtin      4

# UnboundLocalError

global_var = 1

def f1():
    print(global_var)     ## 1

def f2():
    global_var = 2        ## We now have a local variable 'global_var' within f2
    print(global_var)     ## 2

def f3():
    print(global_var)     ## UnboundLocalError: local variable 'global_var'↵
    global_var = 2        ## referenced before assignment

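Because a local variable in Python may share its name with a global variable, this error comes up fairly often.
If a name-binding statement (such as a = 2) appears anywhere in a code block, every use of the name a in that block refers to the local variable, including uses that come before the binding.

The good news is that static analysis tools (such as Pyright) can detect this kind of error.

Note that global_dict['foo'] = 'bar' is not a name-binding statement and does not cause this error (global_dict = {'foo': 'bar'} is one).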
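
A small sketch of the cases described above, assuming module-level code. The names `counter`, `registry`, `increment`, `register` and `broken_increment` are invented for illustration. It shows the `global` statement as one way to rebind a module-level name, a dict mutation that needs no declaration, and the failing variant.

```python
counter = 0
registry = {}

def increment():
    global counter         # rebind the module-level name, not a new local
    counter += 1

def register(key, value):
    registry[key] = value  # mutates the dict; `registry` itself is not rebound

def broken_increment():
    counter += 1           # binds a local `counter`, but reads it first

increment()
register("foo", "bar")

try:
    broken_increment()
except UnboundLocalError as exc:
    error_message = str(exc)
    print("UnboundLocalError:", error_message)
```

The exact wording of the error message differs between Python versions, so the sketch only prints it.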

See also: Python execution model

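# Paths in Python

# Working directory

Every process has a working directory; when the process accesses a file by a relative path, the path is resolved against this directory.
For Python, the working directory is the directory in which the python command was run, and os.getcwd() returns it.

Test directory: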

~/path-test/
├── subfolder/
│   └── submodule.py
└── main.py

## both main.py and submodule.py
import os

print("working dir:", os.getcwd())
print("file path:  ", __file__)  ## since Python 3.9 this is an absolute path for the script being run

~/path-test$ python main.py
working dir: /home/yu/path-test
file path:   main.py

~/path-test$ python subfolder/submodule.py
working dir: /home/yu/path-test
file path:   subfolder/submodule.py

~/path-test$ cd subfolder
~/path-test/subfolder$ python submodule.py
working dir: /home/yu/path-test/subfolder
file path:   submodule.py

A child process inherits its working directory from its parent process.

Modify submodule.py:

import subprocess

print(subprocess.check_output("tree", shell=True, encoding="utf-8"))

~/path-test$ python subfolder/submodule.py
.
├── main.py
└── subfolder
    └── submodule.py

~/path-test/subfolder$ python submodule.py
.
└── submodule.py
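
As a portable variation on the `tree` example, the following sketch starts a child Python process that reports its own working directory, once inheriting it from the parent and once overriding it with the `cwd` argument of `subprocess`. The temporary directory is only a stand-in for some other folder.

```python
import os
import subprocess
import sys
import tempfile

# A tiny child program that reports its own working directory.
child_code = "import os; print(os.getcwd())"

# Without `cwd`, the child inherits the parent's working directory.
inherited = subprocess.check_output(
    [sys.executable, "-c", child_code], encoding="utf-8"
).strip()

# With `cwd`, the child starts in the given directory instead.
with tempfile.TemporaryDirectory() as tmp:
    overridden = subprocess.check_output(
        [sys.executable, "-c", child_code], encoding="utf-8", cwd=tmp
    ).strip()
    # Compare while the directory still exists (handles symlinked temp dirs).
    same_dir = os.path.samefile(overridden, tmp)

print("inherited: ", inherited)
print("overridden:", overridden)
```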

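# Module search path: sys.path

sys.path is the list of paths Python searches when importing a module. By default it is initialized from several locations under the Python installation and from the PYTHONPATH environment variable. A program can also read and modify it.

When Python starts, it also adds one more path to sys.path, depending on how it was launched:

  1. python /path/to/script.py: adds the directory containing script.py, i.e. /path/to/, not the current working directory
  2. python -m module: adds the current working directory, i.e. .
  3. python (REPL) and python -c code: adds the empty string, which also means the current working directory

In case 1, running a Python script that lives in a subdirectory easily leads to ModuleNotFoundError, because the current working directory is not in sys.path.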
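
One possible workaround for case 1 is to add the wanted directory to sys.path before importing. The sketch below is only an illustration: the helper `ensure_on_sys_path` and the module `helper_mod` are hypothetical, and a temporary directory stands in for the project root.

```python
import importlib
import sys
import tempfile
from pathlib import Path

def ensure_on_sys_path(directory):
    """Prepend `directory` to sys.path unless it is already listed."""
    directory = str(Path(directory).resolve())
    if directory not in sys.path:
        sys.path.insert(0, directory)
    return directory

# Demo with a throwaway directory standing in for the project root.
with tempfile.TemporaryDirectory() as root:
    Path(root, "helper_mod.py").write_text("VALUE = 42\n", encoding="utf-8")
    added = ensure_on_sys_path(root)
    importlib.invalidate_caches()  # the file was created after startup
    helper_mod = importlib.import_module("helper_mod")
    print(helper_mod.VALUE)
    sys.path.remove(added)         # undo the change made for the demo
```

In a real script under subfolder/, the call would probably look like `ensure_on_sys_path(Path(__file__).resolve().parent.parent)`. Running the script as `python -m subfolder.submodule` from the project root (case 2) avoids touching sys.path at all.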

https://docs.python.org/3/library/sys.html#sys.path

TODO

absolute and relative imports

https://stackoverflow.com/a/43859946/8682688
https://docs.python.org/3/tutorial/modules.html#the-module-search-path
https://www.pythonforthelab.com/blog/complete-guide-to-imports-in-python-absolute-relative-and-more/

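# Explicit line joining

A backslash \ at the end of a line joins several physical lines into one logical line. It does not continue a comment, and inside a string literal (including """/''' multi-line strings) a backslash followed by a newline is an escape sequence that is simply dropped from the string.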

is_valid_date = 1900 < year < 2100 and 1 <= month <= 12 \
                and 1 <= day <= 31

a_str = "this is a long\
         string"
## 'this is a long         string'

b_str = "this is a long "\
        "string"
## 'this is a long string'
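
For comparison, here is a sketch of the same two statements using implicit line joining inside parentheses, which needs no trailing backslash. The values of `year`, `month`, `day` and the name `c_str` are made up so that the snippet runs on its own.

```python
year, month, day = 2022, 2, 17

# Parentheses let the expression span several lines without a backslash.
is_valid_date = (
    1900 < year < 2100
    and 1 <= month <= 12
    and 1 <= day <= 31
)

# Adjacent string literals inside the parentheses are concatenated.
c_str = (
    "this is a long "
    "string"
)

print(is_valid_date, repr(c_str))
```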

Lexical analysis (Python documentation) > Explicit line joining

# String formatting

## Basics             ## use "·" to visualize whitespace
"{} {}".format(1, 2)  ## "1·2"
f"{1} {2}"            ## "1·2"

## Each field can also specify an optional set of "format specifiers",
## which goes after the colon ":"

## The default syntax
## [[fill]align][sign][#][0][min_width][.precision][type]
##        │                                         └ e.g., b e f %
##        └ e.g., < > ^

## Padding and alignment
a = "test"
f"{a:10}"             ## "test······" (width 10)
f"{a:<10}"            ## "test······"
f"{a:>10}"            ## "······test"
f"{a:^10}"            ## "···test···"
f"{a:_<10}"           ## "test______" (fill "_")

## Floats
b = 0.5
f"{b:5}"              ## "··0.5" (`float` and `int` are right aligned by default)
f"{b:<5}"             ## "0.5··"
f"{b:05}"             ## "000.5" (zero-padding)

f"{b:.3f}"            ## "0.500"
f"{0.6666:.3f}"       ## "0.667" (rounded)
f"{b:.3e}"            ## "5.000e-01"
f"{b:.2%}"            ## "50.00%"
f"{b:6.2f}"           ## "··0.50"

## Explicit type conversion
f"{a!s}"              ## equals f"{str(a)}"
f"{a!r:10}"           ## equals f"{repr(a):10}"

f"{b:5}"              ## "··0.5"
f"{b!s:5}"            ## "0.5··"

## The `datetime` class provides its own format specification
from datetime import datetime
c = datetime.now()
f"{c:%Y-%m-%d}"       ## e.g. "2022-02-17" (same codes as the `strftime()` method)

## Self-documenting expressions with `=` (New in Python 3.8)
from math import cos, radians
theta = 30
f"{theta=}, {cos(radians(theta))=:.3f}"  ## "theta=30, cos(radians(theta))=0.866"

Format specifiers may also contain evaluated expressions.

width     = 8
precision = 2
value     = 12.3456
f"{value:{width}.{precision}f}"
## "···12.35"

## Named fields with `str.format()`
"{first} {last}".format(first="Hello", last="world!")  ## "Hello world!"

data = {"first": "Hello", "last": "world!"}
"{first} {last}".format(**data)                        ## "Hello world!"

PyFormat (intuitive examples)
Python strftime reference
PEP 3101 -- Standard Format Specifiers
What's New in Python 3.8

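# Regular expressions

Regular expressions themselves are not covered here; see regex101.

The first thing worth mentioning is the Python raw string, in which a backslash \ does not start an escape sequence but is a literal \.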

"\\d" == r"\d"  # True

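# String substitution: re.sub

Most of the time we just want to do a (regex) string replacement:
re.sub(pattern, repl, string, count=0, flags=0)

  • pattern: the pattern to be replaced
  • repl: the replacement, either a string (backreferences such as \2 are supported) or a function
  • string: the string to perform the replacement on

(A non-zero count sets the maximum number of replacements; flags are the regular expression flags.)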

re.sub(r"(.*) > (.*)", r"\2 < \1", "a > b")
# 'b < a'
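
The example above uses a string for repl. As a sketch of the other option, repl can be a function that receives each match object and returns the replacement text. The helper name `double` and the sample sentence are invented for illustration.

```python
import re

def double(match):
    """Return the matched integer doubled, as a string."""
    return str(int(match.group(0)) * 2)

text = "3 apples and 10 pears"

all_doubled = re.sub(r"\d+", double, text)
print(all_doubled)
# '6 apples and 20 pears'

# count=1 replaces only the first occurrence
first_doubled = re.sub(r"\d+", double, text, count=1)
print(first_doubled)
# '6 apples and 10 pears'
```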

# Matching and extracting

re.match and re.search both take the arguments (pattern, string, flags=0) and return either None or a match object.

m = re.match(r"(\w+) (\w+)", "Isaac Newton, physicist")
m.group(0)     # 'Isaac Newton'       The entire match
m.group(1)     # 'Isaac'              The first parenthesized subgroup.
m.group(2)     # 'Newton'             The second parenthesized subgroup.
m.group(1, 2)  # ('Isaac', 'Newton')  Multiple arguments give us a tuple.
m.groups()     # ('Isaac', 'Newton')  All subgroups in a tuple.

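re.match requires the pattern to match at the beginning of string, while re.search allows it to match anywhere in the string.

To use a regular expression many times, call re.compile() to build a pattern object; it offers the same functions as methods (sub/match/search), plus some finer-grained features.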
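
A short sketch of the difference between match and search, using a compiled pattern object. The variable names and the sample text are chosen for illustration only.

```python
import re

name_re = re.compile(r"(\w+) (\w+)")  # compile once, reuse below
text = "physicist: Isaac Newton"

# match() only tries at the start of the string: "physicist" is followed
# by ":", not by a space, so there is no match.
m_match = name_re.match(text)
print(m_match)
# None

# search() scans forward and finds the first position that matches.
m_search = name_re.search(text)
print(m_search.groups())
# ('Isaac', 'Newton')

# The pattern object also has sub(), with the same backreference syntax.
swapped = name_re.sub(r"\2, \1", text)
print(swapped)
# 'physicist: Newton, Isaac'
```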

# Other: splitting strings

re.split(pattern, string, maxsplit=0, flags=0)
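
A few calls that illustrate the signature above. The separator pattern and the input strings are arbitrary examples.

```python
import re

parts = re.split(r"[,;]\s*", "a, b;c,  d")
print(parts)
# ['a', 'b', 'c', 'd']

# maxsplit=1 stops after the first split
head_tail = re.split(r"[,;]\s*", "a, b;c,  d", maxsplit=1)
print(head_tail)
# ['a', 'b;c,  d']

# A capturing group keeps the separators in the result
with_seps = re.split(r"([,;])\s*", "a, b;c")
print(with_seps)
# ['a', ',', 'b', ';', 'c']
```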

Regular expression operations - Python documentation

Last updated: 4/17/2023, 12:31:33 PM