The Dormouse's story
\n", "\n", "Once upon a time there were three little sisters; and their names were\n", "Elsie,\n", "Lacie and\n", "Tillie;\n", "and they lived at the bottom of a well.
\n", "\n", "...
\n" ] } ], "source": [ "print(content.text)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "ExecuteTime": { "end_time": "2021-05-15T01:45:54.229935Z", "start_time": "2021-05-15T01:45:54.221219Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "'utf-8'" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "content.encoding" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Beautiful Soup\n", "> Beautiful Soup is a Python library designed for quick turnaround projects like screen-scraping. Three features make it powerful:\n", "\n", "- Beautiful Soup provides a few simple methods and Pythonic idioms for navigating, searching, and modifying a parse tree. It doesn't take much code to write an application.\n", "- Beautiful Soup automatically converts incoming documents to Unicode and outgoing documents to UTF-8. You don't have to think about encodings, unless the document doesn't specify an encoding and Beautiful Soup can't detect one. Then you just have to specify the original encoding.\n", "- Beautiful Soup sits on top of popular Python parsers like `lxml` and `html5lib`.\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Install beautifulsoup4\n", "\n", "Open your terminal/cmd\n", "\n", "The Dormouse's story
\n", "Once upon a time there were three little sisters; and their names were\n", "Elsie,\n", "Lacie and\n", "Tillie;\n", "and they lived at the bottom of a well.
\n", "...
" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "url = 'http://socratesacademy.github.io/bigdata/data/test.html'\n", "content = requests.get(url)\n", "content = content.text\n", "soup = BeautifulSoup(content, 'html.parser') \n", "soup" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "ExecuteTime": { "end_time": "2021-05-15T01:48:59.451986Z", "start_time": "2021-05-15T01:48:59.448334Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", " \n", "\n", " \n", " The Dormouse's story\n", " \n", "
\n", "\n", " Once upon a time there were three little sisters; and their names were\n", " \n", " Elsie\n", " \n", " ,\n", " \n", " Lacie\n", " \n", " and\n", " \n", " Tillie\n", " \n", " ;\n", "and they lived at the bottom of a well.\n", "
\n", "\n", " ...\n", "
\n", " \n", "\n" ] } ], "source": [ "print(soup.prettify())" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "- html\n", "  - head\n", "    - title\n", "  - body\n", "    - p (class = 'title', 'story')\n", "      - a (class = 'sister')\n", "        - href/id" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## The select method\n", "\n", "In CSS selectors:\n", "\n", "- tag names are written as-is, with no prefix\n", "- class names are prefixed with a dot (`.`)\n", "- ids are prefixed with `#`\n", "\n", "`soup.select()` takes a selector written this way and returns the matching elements as a list." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Three steps to get a selector for `select()`:\n", "\n", "- Inspect\n", "- Copy\n", "  - Copy Selector\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- Select the title `The Dormouse's story` with the mouse, right-click, and choose Inspect\n", "- Move the mouse to the highlighted source code\n", "- Right-click and choose Copy --> Copy Selector\n", "\n", "The copied selector is `body > p.title > b`\n" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "ExecuteTime": { "end_time": "2021-05-15T02:02:06.793459Z", "start_time": "2021-05-15T02:02:06.789378Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "\"The Dormouse's story\"" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "soup.select('body > p.title > b')[0].text" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### The select method: searching by tag name" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "ExecuteTime": { "end_time": "2021-05-15T02:03:52.198927Z", "start_time": "2021-05-15T02:03:52.194697Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "\"The Dormouse's story\"" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "soup.select('title')[0].text" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "ExecuteTime": { "end_time": 
"2021-05-15T02:04:14.141466Z", "start_time": "2021-05-15T02:04:14.137294Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "[Elsie,\n", " Lacie,\n", " Tillie]" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "soup.select('a')" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "ExecuteTime": { "end_time": "2021-05-15T02:04:23.787514Z", "start_time": "2021-05-15T02:04:23.783236Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "[The Dormouse's story]" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "soup.select('b')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### The select method: searching by class name" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "ExecuteTime": { "end_time": "2021-05-15T02:04:44.844325Z", "start_time": "2021-05-15T02:04:44.840200Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "[The Dormouse's story
]" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "soup.select('.title')" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "ExecuteTime": { "end_time": "2021-05-15T02:04:52.867866Z", "start_time": "2021-05-15T02:04:52.863451Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "[Elsie,\n", " Lacie,\n", " Tillie]" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "soup.select('.sister')" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "ExecuteTime": { "end_time": "2021-05-15T02:05:47.218607Z", "start_time": "2021-05-15T02:05:47.214047Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "[Once upon a time there were three little sisters; and their names were\n", " Elsie,\n", " Lacie and\n", " Tillie;\n", " and they lived at the bottom of a well.
,\n", "...
]" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "soup.select('.story')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### The select method: searching by id" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "ExecuteTime": { "end_time": "2021-05-15T02:06:00.122987Z", "start_time": "2021-05-15T02:06:00.118356Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "[Elsie]" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "soup.select('#link1')" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "ExecuteTime": { "end_time": "2021-05-15T02:06:34.890111Z", "start_time": "2021-05-15T02:06:34.886086Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'http://example.com/elsie'" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "soup.select('#link1')[0]['href']" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### The select method: combined selectors\n", "\n", "Tag names, class names, and ids can be combined in one selector.\n", "\n", "- For example, `p #link1` finds the element with id `link1` inside a `p` tag\n" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "ExecuteTime": { "end_time": "2021-05-15T02:07:24.429115Z", "start_time": "2021-05-15T02:07:24.425148Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "[Elsie]" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "soup.select('p #link1')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### The select method: attribute and child selectors\n", "\n", "- An attribute filter goes in square brackets attached to the tag, e.g. `a[class='sister']`. The attribute and the tag belong to the same node, so there must be no space between them.\n", "- The greater-than sign `>` selects direct children, e.g. `body > p` matches the `p` tags directly under `body`.\n" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "ExecuteTime": { "end_time": "2021-05-15T02:07:56.662539Z", "start_time": "2021-05-15T02:07:56.658377Z" 
}, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "[The Dormouse's story
,\n", "Once upon a time there were three little sisters; and their names were\n", " Elsie,\n", " Lacie and\n", " Tillie;\n", " and they lived at the bottom of a well.
,\n", "...
]" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "soup.select(\"body > p\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## The find_all method\n", "\n", "`find_all()` searches by tag name and attributes and returns a list of matching tags. Calling the soup object directly, as in `soup('p')`, is shorthand for `soup.find_all('p')`." ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "ExecuteTime": { "end_time": "2021-05-15T02:08:47.621921Z", "start_time": "2021-05-15T02:08:47.617950Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "[The Dormouse's story
,\n", "Once upon a time there were three little sisters; and their names were\n", " Elsie,\n", " Lacie and\n", " Tillie;\n", " and they lived at the bottom of a well.
,\n", "...
]" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# equivalent shorthand: soup('p')\n", "soup.find_all('p')" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "ExecuteTime": { "end_time": "2021-05-15T02:09:09.820472Z", "start_time": "2021-05-15T02:09:09.816375Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "[\"The Dormouse's story\",\n", " 'Once upon a time there were three little sisters; and their names were\\nElsie,\\nLacie and\\nTillie;\\nand they lived at the bottom of a well.',\n", " '...']" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[i.text for i in soup('p')]" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "ExecuteTime": { "end_time": "2021-05-15T02:09:36.730551Z", "start_time": "2021-05-15T02:09:36.727221Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The Dormouse's story\n", "Once upon a time there were three little sisters; and their names were\n", "Elsie,\n", "Lacie and\n", "Tillie;\n", "and they lived at the bottom of a well.\n", "...\n" ] } ], "source": [ "for i in soup('p'):\n", "    print(i.text)" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "ExecuteTime": { "end_time": "2021-05-15T02:09:51.874753Z", "start_time": "2021-05-15T02:09:51.870515Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "html\n", "head\n", "title\n", "body\n", "p\n", "b\n", "p\n", "a\n", "a\n", "a\n", "p\n" ] } ], "source": [ "for tag in soup.find_all(True):  # True matches every tag\n", "    print(tag.name)" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "ExecuteTime": { "end_time": "2021-05-15T02:10:01.561490Z", "start_time": "2021-05-15T02:10:01.557739Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "[The Dormouse's story
\n", "Once upon a time there were three little sisters; and their names were\n", " Elsie,\n", " Lacie and\n", " Tillie;\n", " and they lived at the bottom of a well.
\n", "...
]" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "soup('body')  # returns a list; soup.body returns the body tag itself" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "ExecuteTime": { "end_time": "2021-05-15T02:10:13.554405Z", "start_time": "2021-05-15T02:10:13.550566Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "[The Dormouse's story
,\n", "Once upon a time there were three little sisters; and their names were\n", " Elsie,\n", " Lacie and\n", " Tillie;\n", " and they lived at the bottom of a well.
,\n", "...
]" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "soup('p')" ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "The Dormouse's story
" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "soup.p" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "ExecuteTime": { "end_time": "2021-05-15T02:10:46.279073Z", "start_time": "2021-05-15T02:10:46.275793Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "'title'" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "soup.title.name" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "ExecuteTime": { "end_time": "2021-05-15T02:10:53.369269Z", "start_time": "2021-05-15T02:10:53.365521Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "\"The Dormouse's story\"" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "soup.title.string" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "ExecuteTime": { "end_time": "2021-05-15T02:10:59.969036Z", "start_time": "2021-05-15T02:10:59.965370Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "\"The Dormouse's story\"" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "soup.title.text\n", "# Prefer .text: .string returns None when a tag has more than one child" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "ExecuteTime": { "end_time": "2020-06-06T02:18:16.349550Z", "start_time": "2020-06-06T02:18:16.340669Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'head'" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "soup.title.parent.name" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "ExecuteTime": { "end_time": "2021-05-15T02:11:21.857429Z", "start_time": "2021-05-15T02:11:21.853450Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "The Dormouse's story
" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "soup.p" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "ExecuteTime": { "end_time": "2021-05-15T02:11:32.544277Z", "start_time": "2021-05-15T02:11:32.540376Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "['title']" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "soup.p['class']" ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "ExecuteTime": { "end_time": "2021-05-15T02:13:12.189910Z", "start_time": "2021-05-15T02:13:12.185675Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "[Once upon a time there were three little sisters; and their names were\n", " Elsie,\n", " Lacie and\n", " Tillie;\n", " and they lived at the bottom of a well.
,\n", "...
]" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "soup.find_all('p', {'class': 'story'})" ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "ExecuteTime": { "end_time": "2021-05-15T02:13:24.314037Z", "start_time": "2021-05-15T02:13:24.311706Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "# keyword form (class_ avoids Python's class keyword): soup.find_all('p', class_='title')" ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "ExecuteTime": { "end_time": "2021-05-15T02:13:32.002962Z", "start_time": "2021-05-15T02:13:31.998838Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "[Elsie,\n", " Lacie,\n", " Tillie]" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "soup.find_all('a', {'class': 'sister'})" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "ExecuteTime": { "end_time": "2018-04-28T02:08:27.252239Z", "start_time": "2018-04-28T02:08:27.247016Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "[Elsie,\n", " Lacie,\n", " Tillie]" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "soup.find_all('p', {'class': 'story'})[0].find_all('a')" ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "ExecuteTime": { "end_time": "2021-05-15T02:14:06.047586Z", "start_time": "2021-05-15T02:14:06.043761Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "Elsie" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "soup.a" ] }, { "cell_type": "code", "execution_count": 46, "metadata": { "ExecuteTime": { "end_time": "2021-05-15T02:14:10.817296Z", "start_time": "2021-05-15T02:14:10.813436Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "[Elsie,\n", " Lacie,\n", " Tillie]" ] }, "execution_count": 46, "metadata": 
{}, "output_type": "execute_result" } ], "source": [ "soup('a')" ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "ExecuteTime": { "end_time": "2021-05-15T02:14:38.104275Z", "start_time": "2021-05-15T02:14:38.100394Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "Elsie" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "soup.find(id=\"link1\")" ] }, { "cell_type": "code", "execution_count": 49, "metadata": { "ExecuteTime": { "end_time": "2021-05-15T02:14:41.664424Z", "start_time": "2021-05-15T02:14:41.660615Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "[Elsie,\n", " Lacie,\n", " Tillie]" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "soup.find_all('a')" ] }, { "cell_type": "code", "execution_count": 50, "metadata": { "ExecuteTime": { "end_time": "2021-05-15T02:14:45.672192Z", "start_time": "2021-05-15T02:14:45.667941Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "[Elsie,\n", " Lacie,\n", " Tillie]" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "soup.find_all('a', {'class': 'sister'})  # compare with soup.find_all('a')" ] }, { "cell_type": "code", "execution_count": 51, "metadata": { "ExecuteTime": { "end_time": "2021-05-15T02:14:48.888543Z", "start_time": "2021-05-15T02:14:48.884296Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "Elsie" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "soup.find_all('a', {'class': 'sister'})[0]" ] }, { "cell_type": "code", "execution_count": 57, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'Elsie'" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ 
"soup.find_all('a', {'class': 'sister'})[0].text" ] }, { "cell_type": "code", "execution_count": 58, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'http://example.com/elsie'" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "soup.find_all('a', {'class': 'sister'})[0]['href']" ] }, { "cell_type": "code", "execution_count": 59, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'link1'" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "soup.find_all('a', {'class': 'sister'})[0]['id']" ] }, { "cell_type": "code", "execution_count": 74, "metadata": { "ExecuteTime": { "end_time": "2020-11-03T03:42:28.907584Z", "start_time": "2020-11-03T03:42:28.903704Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "[The Dormouse's story,\n", " Elsie,\n", " Lacie,\n", " Tillie]" ] }, "execution_count": 74, "metadata": {}, "output_type": "execute_result" } ], "source": [ "soup.find_all([\"a\", \"b\"])" ] }, { "cell_type": "code", "execution_count": 75, "metadata": { "ExecuteTime": { "end_time": "2020-11-03T03:43:23.483217Z", "start_time": "2020-11-03T03:43:23.480006Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The Dormouse's story\n", "\n", "The Dormouse's story\n", "Once upon a time there were three little sisters; and their names were\n", "Elsie,\n", "Lacie and\n", "Tillie;\n", "and they lived at the bottom of a well.\n", "...\n" ] } ], "source": [ "print(soup.get_text())" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "![image.png](./images/end.png)" ] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { 
"name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" }, "latex_envs": { "LaTeX_envs_menu_present": true, "autoclose": false, "autocomplete": true, "bibliofile": "biblio.bib", "cite_by": "apalike", "current_citInitial": 1, "eqLabelWithNumbers": true, "eqNumInitial": 0, "hotkeys": { "equation": "Ctrl-E", "itemize": "Ctrl-I" }, "labels_anchors": false, "latex_user_defs": false, "report_style_numbering": false, "user_envs_cfg": false }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": false, "sideBar": false, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": { "height": "100px", "left": "1287.36px", "top": "0px", "width": "130.656px" }, "toc_section_display": false, "toc_window_display": true } }, "nbformat": 4, "nbformat_minor": 1 }
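The sketch below reviews the notebook's `select()` and `find_all()` calls side by side without a network request. It makes two assumptions. First, it uses an inline copy of the "three sisters" document from the Beautiful Soup documentation in place of `test.html`, whose exact contents are not reproduced here. Second, it assumes `beautifulsoup4` is installed. It checks that a CSS selector and the matching `find_all()` filter return the same tags. Attribute filters are passed either as a dict, `{"class": "sister"}` (with a colon; `{"class", "sister"}` would be a Python set), or through the `class_` keyword. The sketch also shows an attribute selector and the `>` child combinator.

```python
from bs4 import BeautifulSoup

# Assumption: this inline HTML mirrors the "three sisters" example from the
# Beautiful Soup documentation and stands in for test.html.
HTML_DOC = """<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
</body></html>"""

soup = BeautifulSoup(HTML_DOC, "html.parser")

# Tag name, class (.) and id (#) selectors via select(), which returns a list
titles = soup.select("title")
sisters_css = soup.select("p.story a.sister")
first_link = soup.select("#link1")[0]

# The same searches with find_all()/find(): attribute filters go in a dict
# (note the colon) or, for class, in the class_ keyword argument
sisters_dict = soup.find_all("a", {"class": "sister"})
sisters_kw = soup.find_all("a", class_="sister")
same_link = soup.find(id="link1")

# Attribute selector: the brackets attach to the tag with no space between
with_href = [a.text for a in soup.select("a[href]")]

# Child combinator: <b> tags that are direct children of p.title
bold_title = soup.select_one("p.title > b").text

# Collect id -> href pairs for every sister link
links = {a["id"]: a["href"] for a in sisters_dict}

print(titles[0].text)
print(with_href)
print(bold_title)
print(links)
```

Both methods hand back the same `Tag` objects from the parse tree, so the choice between them is mostly about readability. A selector copied with the browser's Copy Selector drops straight into `select()`, while `find_all()` is convenient when the filters are built from Python values.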