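Python RegEx

A RegEx, or regular expression, is a sequence of characters that forms a search pattern.

RegEx can be used to check whether a string contains the specified search pattern.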

RegEx Module

Python has a built-in package called re, which can be used to work with regular expressions.

Import the re module:

import re

Once you have imported the re module, you can start using regular expressions:

Search the string to see whether it starts with "The" and ends with "Spain":

import re

txt = "The rain in Spain"
# x is a Match object if txt starts with "The" and ends with "Spain", otherwise None
x = re.search(r"^The.*Spain$", txt)
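The result of the search above can be inspected as in the following minimal sketch. The variable names `matched` and `y`, and the second test string, are illustrative assumptions and not part of the original example.

```python
import re

txt = "The rain in Spain"

# re.search returns a Match object on success and None otherwise
x = re.search(r"^The.*Spain$", txt)
matched = x is not None

# A string that does not end with "Spain" produces no match
y = re.search(r"^The.*Spain$", "The rain in Spain stays")

print(matched)  # True
print(y)        # None
```

Because a Match object is truthy and None is falsy, `if x:` is a common way to branch on the result.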

The re module offers a set of functions that let us search a string for a match:

Function	Description
findall	Returns a list containing all matches
search	Returns a Match object if there is a match anywhere in the string
split	Returns a list where the string has been split at each match
sub	Replaces one or many matches with a string
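As a rough illustration of the four functions in the table, the sketch below applies each of them to the same sample string. The variable names and the replacement character "9" are arbitrary choices made for this example.

```python
import re

txt = "The rain in Spain"

# findall: every non-overlapping match, as a list of strings
all_ai = re.findall(r"ai", txt)

# search: the first match anywhere in the string (or None)
first_space = re.search(r"\s", txt)

# split: cut the string at each match
parts = re.split(r"\s", txt)

# sub: replace each match with the given string
replaced = re.sub(r"\s", "9", txt)

print(all_ai)               # ['ai', 'ai']
print(first_space.start())  # 3
print(parts)                # ['The', 'rain', 'in', 'Spain']
print(replaced)             # The9rain9in9Spain
```

Both `split` and `sub` accept an optional count (`maxsplit` and `count` respectively) to limit how many matches are processed.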

Metacharacters are characters with a special meaning:

Character	Description	Example
[]	A set of characters	"[a-m]"
\	Signals a special sequence (can also be used to escape special characters)	r"\d"
.	Any character (except newline character)	"he..o"
^	Starts with	"^hello"
$	Ends with	"planet$"
*	Zero or more occurrences	"he.*o"
+	One or more occurrences	"he.+o"
?	Zero or one occurrence	"he.?o"
{}	Exactly the specified number of occurrences	"he.{2}o"
|	Either or	"falls|stays"
()	Capture and group
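The sketch below tries several of the metacharacters from the table against a sample string. The sample strings and variable names are assumptions chosen for illustration; the grouping pattern on the last line is likewise only an example of how `()` might be used.

```python
import re

txt = "hello planet"

in_range = re.findall(r"[a-m]", txt)     # characters between a and m
any_two = re.findall(r"he..o", txt)      # "he", any two characters, "o"
at_start = re.findall(r"^hello", txt)    # only matches at the start
at_end = re.findall(r"planet$", txt)     # only matches at the end
optional = re.findall(r"he.?o", txt)     # at most one character between "he" and "o"
either = re.findall(r"falls|stays", "The rain in Spain falls mainly in the plain!")

# Parentheses capture the parts of the match they enclose
groups = re.search(r"(\w+) (\w+)", txt).groups()

print(in_range)  # ['h', 'e', 'l', 'l', 'l', 'a', 'e']
print(any_two)   # ['hello']
print(optional)  # []
print(either)    # ['falls']
print(groups)    # ('hello', 'planet')
```

`he.?o` finds nothing here because "hello" has two characters between "he" and "o", while `?` allows at most one.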

A special sequence is a \ followed by one of the characters in the list below, and has a special meaning:

Character	Description	Example
\A	Returns a match if the specified characters are at the beginning of the string	r"\AThe"
\b	Returns a match where the specified characters are at the beginning or at the end of a word (the r prefix makes the pattern a raw string, so \b is not read as a backspace character)	r"\bain" or r"ain\b"
\B	Returns a match where the specified characters are present, but NOT at the beginning (or at the end) of a word (the r prefix again marks a raw string)	r"\Bain" or r"ain\B"
\d	Returns a match where the string contains digits (numbers from 0-9)	r"\d"
\D	Returns a match where the string contains a character that is NOT a digit	r"\D"
\s	Returns a match where the string contains a white space character	r"\s"
\S	Returns a match where the string contains a character that is NOT white space	r"\S"
\w	Returns a match where the string contains any word characters (letters from a to z and A to Z, digits from 0-9, and the underscore _ character)	r"\w"
\W	Returns a match where the string contains a character that is NOT a word character	r"\W"
\Z	Returns a match if the specified characters are at the end of the string	r"Spain\Z"
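A minimal sketch of a few special sequences in use follows. The sample string (with a trailing year added so that `\d` has something to match) and the variable names are assumptions made for this example. All patterns are written as raw strings so that Python passes the backslashes through to the re module unchanged.

```python
import re

txt = "The rain in Spain 2024"

starts = re.findall(r"\AThe", txt)       # "The" at the very start of the string
word_start = re.findall(r"\bain", txt)   # "ain" at the start of a word
word_end = re.findall(r"ain\b", txt)     # "ain" at the end of a word
inside = re.findall(r"\Bain", txt)       # "ain" NOT at the start of a word
digits = re.findall(r"\d", txt)          # single digits
words = re.findall(r"\w+", txt)          # runs of word characters
ends = re.findall(r"2024\Z", txt)        # "2024" at the very end of the string

print(starts)      # ['The']
print(word_start)  # []
print(word_end)    # ['ain', 'ain']
print(inside)      # ['ain', 'ain']
print(digits)      # ['2', '0', '2', '4']
print(words)       # ['The', 'rain', 'in', 'Spain', '2024']
print(ends)        # ['2024']
```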

A set is a group of characters inside a pair of square brackets [], with a special meaning:

Set	Description
[arn]	Returns a match where one of the specified characters (`a`, `r`, or `n`) is present
[a-n]	Returns a match for any lower case character, alphabetically between `a` and `n`
[^arn]	Returns a match for any character EXCEPT `a`, `r`, and `n`
[0123]	Returns a match where any of the specified digits (`0`, `1`, `2`, or `3`) are present
[0-9]	Returns a match for any digit between `0` and `9`
[0-5][0-9]	Returns a match for any two-digit number from `00` to `59`
[a-zA-Z]	Returns a match for any character alphabetically between `a` and `z`, lower case OR upper case
[+]	In sets, `+`, `*`, `.`, `|`, `()`, `$`, `{}` have no special meaning, so `[+]` means: return a match for any `+` character in the string
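The sets in the table can be tried out with a short sketch like the one below. The sample strings and variable names are assumptions chosen so that each set has something to match.

```python
import re

# Any one of the listed characters
listed = re.findall(r"[arn]", "The rain in Spain")

# Any character except the listed ones
excluded = re.findall(r"[^arn]", "rain")

# Two-digit numbers from 00 to 59
minutes = re.findall(r"[0-5][0-9]", "8 times before 11:45 AM")

# Letters only, lower case or upper case
letters = re.findall(r"[a-zA-Z]", "R2-d2")

# Inside a set, + is an ordinary character
pluses = re.findall(r"[+]", "1+1=2, 2+2=4")

print(listed)    # ['r', 'a', 'n', 'n', 'a', 'n']
print(excluded)  # ['i']
print(minutes)   # ['11', '45']
print(letters)   # ['R', 'd']
print(pluses)    # ['+', '+']
```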