Python re.match() usage examples

(Editor: jimmy  Date: 2026/7/16)

While learning to write web crawlers in Python, I ran into a problem. The book gives the following example:

import re

line = 'Cats are smarter than dogs'
matchObj = re.match(r'(.*)are(.*…

import re

line = 'Cats are smarter than dogs'
# three groups; the ? after the second .* makes that quantifier lazy
matchObj = re.match(r'(.*)are(.*?)(.*)', line)

if matchObj:
    print("matchObj.group():", matchObj.group())
    print("matchObj.group(1):", matchObj.group(1))
    print("matchObj.group(2):", matchObj.group(2))
    print("matchObj.group(3):", matchObj.group(3))
else:
    print('No match!\n')




The result is:

matchObj.group(): Cats are smarter than dogs

matchObj.group(1): Cats 

matchObj.group(2): 

matchObj.group(3):  smarter than dogs



So the second group matched the empty string. After removing that ?, the result becomes:

matchObj.group(): Cats are smarter than dogs

matchObj.group(1): Cats 

matchObj.group(2):  smarter than dogs

matchObj.group(3): 
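
As a minimal sketch of the two runs above, the snippet below puts the pattern with the ? and the pattern without it side by side. The variable names lazy and greedy are illustrative only and do not come from the book.

```python
import re

line = 'Cats are smarter than dogs'

# Lazy second group: (.*?) takes as few characters as possible,
# so the greedy third group (.*) absorbs the rest of the line.
lazy = re.match(r'(.*)are(.*?)(.*)', line)

# Greedy second group: (.*) takes everything it can first,
# leaving the third group with the empty string.
greedy = re.match(r'(.*)are(.*)(.*)', line)

print(lazy.groups())    # ('Cats ', '', ' smarter than dogs')
print(greedy.groups())  # ('Cats ', ' smarter than dogs', '')
```

In both cases the whole line matches; the ? only changes how the text after are is divided between groups 2 and 3.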



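Does this mean that ? most likely defaults to zero repetitions? And if so, what is the ? actually for?
On reflection, that claim is not rigorous. A ? placed directly after a quantifier such as * makes it lazy: it matches as few characters as possible, and because the following (.*) can take the rest of the line, zero characters are enough here. Trying a standalone ?, placed after the second group: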

import re

line = 'Cats are smarter than dogs'
# the ? follows the second group, so the whole group is optional
matchObj = re.match(r'(.*) are(.*)?', line)

if matchObj:
    print("matchObj.group():", matchObj.group())
    print("matchObj.group(1):", matchObj.group(1))
    print("matchObj.group(2):", matchObj.group(2))




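With this pattern, group 2 again captures the text after are. But after a small change, moving the ? inside the second pair of parentheses, nothing is captured, and group(0) stops at Cats are. The second group does not actually fail here: (.*?) is lazy and nothing follows it in the pattern, so it succeeds with the empty string.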
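
The difference between the two placements of ? can be sketched as follows. The names outside and inside are illustrative only.

```python
import re

line = 'Cats are smarter than dogs'

# ? after the group: the group as a whole is optional,
# but the .* inside it is still greedy.
outside = re.match(r'(.*) are(.*)?', line)

# ? inside the group: .*? is lazy, and nothing after it
# in the pattern forces it to consume any characters.
inside = re.match(r'(.*) are(.*?)', line)

print(repr(outside.group(0)), repr(outside.group(2)))
# 'Cats are smarter than dogs' ' smarter than dogs'
print(repr(inside.group(0)), repr(inside.group(2)))
# 'Cats are' ''
```

Because re.match only requires a match at the start of the string, not a match of the whole string, the second pattern is free to stop right after are.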
Oddly, if the code above is changed to:


import re

line = 'Cats are smarter than dogs'
matchObj = re.match(r'(.*) are (.*)+', line)

if matchObj:
    print("matchObj.group():", matchObj.group())
    print("matchObj.group(1):", matchObj.group(1))
    print("matchObj.group(2):", matchObj.group(2))




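That is, only the ? is changed to +. The whole line still matches, but group(2) is empty. A likely reason is that the group first consumes the rest of the line, then + repeats it once more at the end of the string, where it matches the empty string, and a repeated group keeps only its last iteration.
If the + is moved inside the second pair of parentheses, the pattern raises an error (multiple repeat) on Python versions before 3.11. From Python 3.11 onward, *+ is parsed as a possessive quantifier and the pattern compiles.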
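
A hedged sketch of the two + placements follows. The helper try_match is hypothetical, written only for this illustration; it returns either the match object or the re.error that was raised, since the second pattern behaves differently depending on the Python version.

```python
import re

line = 'Cats are smarter than dogs'

def try_match(pattern, text):
    """Return the match object, or the re.error raised for the pattern."""
    try:
        return re.match(pattern, text)
    except re.error as exc:
        return exc

# + after the group: compiles, and the whole line matches.
plus_outside = try_match(r'(.*) are (.*)+', line)

# + inside the group: an error before Python 3.11,
# a possessive quantifier (and a normal match) from 3.11 onward.
plus_inside = try_match(r'(.*) are (.*+)', line)

print(plus_outside.group(0))
print(repr(plus_inside))
```

No expected output is shown for the second print because it depends on the interpreter version.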
So is it fair to say that .* …

import re

line = 'Cats are smarter than dogs'
matchObj = re.match(r'(.*) are (.*r).*', line)

if matchObj:
    print("matchObj.group():", matchObj.group())
    print("matchObj.group(1):", matchObj.group(1))
    print("matchObj.group(2):", matchObj.group(2))
    #print("matchObj.group(3):", matchObj.group(3))
else:
    print('No match!\n')




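For generality, I tried replacing the r with a space, but the result was 'smarter than ', because the greedy .* runs all the way to the last space in the line. I then replaced . with [a-zA-Z], which matches any single letter, and this extracted just smarter (followed by the one space that sits inside the group). The code is: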


import re

line = 'Cats are smarter than dogs'
matchObj = re.match(r'(.*) are ([a-zA-Z]* ).*', line)

if matchObj:
    print("matchObj.group():", matchObj.group())
    print("matchObj.group(1):", matchObj.group(1))
    print("matchObj.group(2):", matchObj.group(2))
    #print("matchObj.group(3):", matchObj.group(3))
else:
    print('No match!\n')
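
To compare the attempts above in one place, here is a small sketch that runs each pattern and collects group(2). The labels and the results dictionary are illustrative. The last pattern is an alternative that is not taken from the experiments above: it puts a lazy (.*?) before a literal space, which is one situation where the ? is useful, since the lazy group stops at the first space instead of the last one.

```python
import re

line = 'Cats are smarter than dogs'

patterns = {
    'greedy, ends at last r':     r'(.*) are (.*r).*',
    'greedy, ends at last space': r'(.*) are (.* ).*',
    'letters plus one space':     r'(.*) are ([a-zA-Z]* ).*',
    'lazy, ends at first space':  r'(.*) are (.*?) .*',
}

# Collect what the second group captures for each pattern.
results = {}
for label, pattern in patterns.items():
    m = re.match(pattern, line)
    results[label] = m.group(2)
    print(label, '->', repr(m.group(2)))

# greedy, ends at last r -> 'smarter'
# greedy, ends at last space -> 'smarter than '
# letters plus one space -> 'smarter '
# lazy, ends at first space -> 'smarter'
```

The lazy variant does not depend on which letters appear in the word, and it keeps the delimiter space outside the captured group.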
