# Python BeautifulSoup Tutorial
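`decode_contents()` returns the contents of a tag as a Unicode string (a `str` in Python 3), that is, everything between the tag's opening and closing tags.
Note that the tag itself, including its attributes, is not part of the result. Child tags, however, are kept as HTML, so nested elements come back with their markup, as the second example shows.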
Code:
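```python
from bs4 import BeautifulSoup

html_content = '''
<div id="content" data="你好">测试01</div>
<div>测试03</div>
'''

soup = BeautifulSoup(html_content, 'html.parser')
# Select the first element whose id is "content"
content_div = soup.select_one("#content")
# Print the element's inner HTML; its own <div> tag is not included
print(content_div.decode_contents())
```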
Output:

```text
测试01
```
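To see what "contents only" means in practice, here is a minimal sketch comparing `decode_contents()` with `str(tag)`, which renders the whole element including its opening tag and attributes, and with `encode_contents()`, which returns the same inner markup as bytes (UTF-8 by default). The variable names `inner`, `outer` and `raw` are illustrative.

```python
from bs4 import BeautifulSoup

html_content = '''
<div id="content" data="你好">测试01</div>
<div>测试03</div>
'''

soup = BeautifulSoup(html_content, 'html.parser')
content_div = soup.select_one("#content")

inner = content_div.decode_contents()  # inner HTML as str; the outer <div> is excluded
outer = str(content_div)               # the whole element, outer <div> and attributes included
raw = content_div.encode_contents()    # inner HTML as bytes, UTF-8 encoded by default

print(type(inner).__name__, repr(inner))
print(outer)
print(type(raw).__name__, raw)
```

Only `outer` contains the `id="content"` attribute; `inner` and `raw` hold the same text, once as `str` and once as `bytes`.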
A nested example:

```python
from bs4 import BeautifulSoup

html_content = '''
<div id="content" data="你好">
<p>测试01</p>
<span>测试02</span>
</div>
<div>测试03</div>
'''

soup = BeautifulSoup(html_content, 'html.parser')
content_div = soup.select_one("#content")
# The child <p> and <span> tags are kept in the returned string
print(content_div.decode_contents())
```
Output:

```text
<p>测试01</p>
<span>测试02</span>
```
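If you need only the text with no tags at all, `get_text()` is the usual choice rather than `decode_contents()`. A minimal sketch on the same nested markup contrasting the two; passing a separator and `strip=True` to `get_text()` trims the whitespace around each text fragment and drops fragments that are empty after stripping. The variable names are illustrative.

```python
from bs4 import BeautifulSoup

html_content = '''
<div id="content" data="你好">
<p>测试01</p>
<span>测试02</span>
</div>
<div>测试03</div>
'''

soup = BeautifulSoup(html_content, 'html.parser')
content_div = soup.select_one("#content")

markup = content_div.decode_contents()          # child tags kept as HTML, newlines included
text = content_div.get_text()                   # text nodes only, tags dropped
words = content_div.get_text(" ", strip=True)   # stripped fragments joined with a space

print(repr(markup))
print(repr(text))
print(words)
```

Here `words` is `测试01 测试02`, while `markup` still contains the `<p>` and `<span>` tags.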
(End of article)