Web scraping in Python (Part 1): Getting started

5-10
Created By: Data School
Published At: 2017-08-11
Search Query: Python Get Started
Tags: python web scraping data science tutorial beautiful soup
This is part 1 of an introductory web scraping tutorial. In this video, you'll learn what web scraping is and why it's useful. You'll also learn three essential facts about HTML that will help you get started with web scraping.

Watch the 4-video series: https://www.youtube.com/playlist?list=PL5-da3qGB5IDbOi0g5WFh1YPDNzXw4LNL

== RESOURCES ==
Download the Jupyter notebook: https://github.com/justmarkham/trump-lies
New York Times article: https://www.nytimes.com/interactive/2017/06/23/opinion/trumps-lies.html

== DATA SCHOOL VIDEOS ==
Machine learning with scikit-learn: https://www.youtube.com/watch?v=elojMnjn4kk&list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A&index=1
Data analysis with pandas: https://www.youtube.com/watch?v=yzIMircGU5I&list=PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y&index=1
Version control with Git: https://www.youtube.com/watch?v=xKVlZ3wFVKA&list=PL5-da3qGB5IBLMp7LtN8Nc3Efd4hJq0kD&index=1

== SUBSCRIBE FOR MORE VIDEOS ==
https://www.youtube.com/user/dataschool?sub_confirmation=1

== JOIN THE DATA SCHOOL COMMUNITY ==
Newsletter: http://www.dataschool.io/subscribe/
Twitter: https://twitter.com/justmarkham
Facebook: https://www.facebook.com/DataScienceSchool/
Patreon: https://www.patreon.com/dataschool
Comments with generated responses
@prajwalvenugopal4566
2017-12-31

Which attributes can be passed to the find/find_all functions?

@dataschool

There's no set list of attributes. Any attribute that is used in an HTML tag can be found by find/find_all.

Hope that helps!
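To illustrate the point that any attribute can be searched, here is a minimal sketch. It assumes Beautiful Soup 4 is installed (pip install beautifulsoup4) and uses Python's built-in "html.parser"; the HTML snippet and the data-rank attribute are made up for illustration. It shows an ordinary attribute as a keyword argument, True to match any tag that has the attribute, and the attrs dictionary for hyphenated names like data-* that cannot be written as Python keyword arguments.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet, invented for this example
html = """
<ul>
  <li><a href="/one" data-rank="1" title="First">One</a></li>
  <li><a href="/two" data-rank="2">Two</a></li>
  <li><a id="top">No link</a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# Any attribute name can be used as a keyword argument, e.g. title
titled = soup.find_all("a", title="First")

# True matches tags that have the attribute at all, whatever its value
with_href = soup.find_all("a", href=True)

# Hyphenated names such as data-rank are not valid Python keywords,
# so they are passed in the attrs dictionary instead
rank_two = soup.find_all("a", attrs={"data-rank": "2"})

print([a.get_text() for a in titled])     # ['One']
print([a.get_text() for a in with_href])  # ['One', 'Two']
print([a.get_text() for a in rank_two])   # ['Two']
```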

VidTutorAssistant (GPT-4)

Great question @prajwalvenugopal4566! The 'find' and 'find_all' functions in Beautiful Soup are quite versatile and can accept a number of different attributes. This allows you to specify exactly what HTML tags you're looking to extract from a webpage.

Commonly, you pass the tag name as the first argument ('name'), such as 'p' for a paragraph or 'a' for a hyperlink. Attributes like 'id', 'class_', and 'href' can also be passed as keyword arguments; they match the tag's ID, its CSS class, and its hyperlink reference, respectively. Note the trailing underscore in 'class_': 'class' is a reserved word in Python.
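The arguments described above can be seen together in this short sketch. It assumes Beautiful Soup 4 with the built-in "html.parser"; the page fragment is hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment, invented for this example
html = """
<div id="main">
  <p class="intro">Welcome</p>
  <p>Plain paragraph</p>
  <a href="https://example.com" class="external">Example</a>
  <a href="/about">About</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# name: the tag name, passed as the first positional argument
paragraphs = soup.find_all("p")

# id: matches the tag's id attribute
main_div = soup.find(id="main")

# class_: trailing underscore because class is a reserved word in Python
intro = soup.find_all("p", class_="intro")

# href: matches the link target exactly
about_link = soup.find("a", href="/about")

print(len(paragraphs))        # 2
print(main_div.name)          # div
print(intro[0].get_text())    # Welcome
print(about_link.get_text())  # About
```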

In find/find_all, attributes can be passed as keyword arguments (e.g. class_='story') or in a dictionary. A plain string as the second argument is a shortcut for a CSS class, so soup.find_all('p', 'story') and soup.find_all('p', {'class': 'story'}) both find all paragraph tags with the CSS class 'story'.
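As a quick check that these spellings are equivalent, the sketch below runs all three forms on the same hypothetical markup (Beautiful Soup 4 and "html.parser" assumed) and compares the text of the matched tags.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one "title" paragraph and two "story" paragraphs
html = """
<p class="title">Headline</p>
<p class="story">Once upon a time...</p>
<p class="story">...they lived at the bottom of a well.</p>
"""

soup = BeautifulSoup(html, "html.parser")

# A plain string as the second argument is treated as a CSS class
by_string = soup.find_all("p", "story")

# The same filter spelled out as a dictionary
by_dict = soup.find_all("p", {"class": "story"})

# The same filter as a keyword argument
by_keyword = soup.find_all("p", class_="story")

def texts(results):
    """Return the text of each matched tag, for easy comparison."""
    return [tag.get_text() for tag in results]

print(texts(by_string) == texts(by_dict) == texts(by_keyword))  # True
print(len(by_string))                                            # 2
```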

Remember that 'find' returns only the first match (or None if nothing matches), whereas 'find_all' returns a ResultSet containing all matches. Keep exploring, and don't hesitate to ask any more questions you have. Happy coding!
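The difference in return values between find and find_all can be seen in this minimal sketch (Beautiful Soup 4 and "html.parser" assumed; the two-paragraph snippet is made up). Note the no-match case: find gives None, while find_all gives an empty ResultSet, which is a subclass of Python's list.

```python
from bs4 import BeautifulSoup

# Hypothetical two-paragraph snippet
html = "<p>first</p><p>second</p>"
soup = BeautifulSoup(html, "html.parser")

first = soup.find("p")               # a single Tag: the first match
every = soup.find_all("p")           # a ResultSet (list subclass) of all matches
missing = soup.find("table")         # None when nothing matches
none_found = soup.find_all("table")  # an empty ResultSet when nothing matches

print(first.get_text())               # first
print([p.get_text() for p in every])  # ['first', 'second']
print(missing)                        # None
print(len(none_found))                # 0
```

Because find can return None, it's worth checking the result before calling methods like get_text() on it.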