Web Scraping in Python

 


Web scraping in Python is a way to automatically extract data from websites. It’s useful when you want to gather information from a web page, like a list of products, prices, or articles, without manually copying and pasting.


How It Works


1. Requesting the Web Page: First, you send a request to the website's server to access the page you want to scrape. You can use a library like requests in Python to do this.



2. Parsing the Content: Once you have the page’s HTML, you need to find the specific information you want inside it. Libraries such as BeautifulSoup or lxml parse the HTML and let you search for elements by tag name or by attributes such as class and id.



3. Extracting Data: After locating the required information, you extract it and save it in your desired format, such as a CSV file or a database.
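To make steps 2 and 3 concrete, here is a minimal sketch that parses a small HTML string instead of a live page, so it runs without a network connection. The HTML snippet, the class names (product, name, price), the data-sku attribute and the function name extract_products are all made up for illustration; a real page will have its own structure.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a page you have already downloaded.
SAMPLE_HTML = """
<html><body>
  <ul id="products">
    <li class="product" data-sku="A1">
      <span class="name">Keyboard</span> <span class="price">$25</span>
    </li>
    <li class="product" data-sku="B2">
      <span class="name">Mouse</span> <span class="price">$10</span>
    </li>
  </ul>
</body></html>
"""


def extract_products(html):
    """Return one dict per <li class="product"> element in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    products = []
    for item in soup.find_all("li", class_="product"):
        products.append({
            "sku": item["data-sku"],  # read an attribute value
            "name": item.find("span", class_="name").get_text(strip=True),
            "price": item.find("span", class_="price").get_text(strip=True),
        })
    return products


products = extract_products(SAMPLE_HTML)
for product in products:
    print(product)
```

Searching by tag plus class, as done here, is usually enough for simple pages. If an element might be missing, check the result of find() for None before calling get_text().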




Real-World Example


Let’s say you want to check the latest news headlines from a website:


1. Send a Request: You send a request to the news website to load its HTML.



2. Find the Headlines: Use BeautifulSoup to locate all the headlines, which are usually in tags like <h2> or <h3> under a specific class or ID.



3. Save the Headlines: Once found, extract the text of each headline and save it, maybe in a list or a CSV file.
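The last step, saving the headlines, could look something like the sketch below. It uses only Python's built-in csv module. The function name save_headlines, the column names and the file name are assumptions chosen for this example.

```python
import csv


def save_headlines(headlines, path):
    """Write the headlines to a CSV file with a rank column and a headline column."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["rank", "headline"])
        for rank, text in enumerate(headlines, 1):
            writer.writerow([rank, text])


# Example usage with made-up headlines:
# save_headlines(["First story", "Second story"], "headlines.csv")
```

The csv module takes care of quoting, so a headline that contains a comma still ends up in a single column.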




Example Code


Here’s a small Python script to get headlines from a hypothetical news site:


import requests
from bs4 import BeautifulSoup

# Step 1: Send a request to the website (hypothetical URL)
url = "https://example-news-site.com"
response = requests.get(url, timeout=10)
response.raise_for_status()  # Stop early on an HTTP error such as 404 or 500

# Step 2: Parse the HTML content
soup = BeautifulSoup(response.content, "html.parser")

# Step 3: Extract headlines from <h2 class="headline"> tags
headlines = []
for headline in soup.find_all("h2", class_="headline"):
    headlines.append(headline.get_text(strip=True))

# Step 4: Print or save the headlines
for i, headline in enumerate(headlines, 1):
    print(f"{i}. {headline}")


In this code:


1. We request the HTML of the page.


2. Then, using BeautifulSoup, we find all headlines in <h2> tags with the class "headline".


3. Finally, we print the extracted headlines.
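If you scrape more than one page, it can help to wrap the request step in a small function. The sketch below is one possible way to do that, not the only one. The name fetch_html, the User-Agent string and the get parameter are assumptions made for this example; get exists only so the function can be tried out without a real network call.

```python
import requests

# Hypothetical client name; a descriptive User-Agent tells the site who is calling.
USER_AGENT = "headline-scraper-example/0.1"


def fetch_html(url, timeout=10.0, get=requests.get):
    """Download a page and return its HTML as text.

    Raises requests.HTTPError if the server answers with a 4xx or 5xx status.
    """
    response = get(url, headers={"User-Agent": USER_AGENT}, timeout=timeout)
    response.raise_for_status()
    return response.text
```

You would call it as fetch_html("https://example-news-site.com") and pass the result to BeautifulSoup. Wrapping the call in try/except requests.RequestException lets the script report connection problems and timeouts instead of crashing. Before scraping a real site, check its terms of use and robots.txt.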




