How to use BeautifulSoup to scrape emails or phone numbers from a website

Scraping data with BeautifulSoup can be a fun experience if you know exactly what you are doing, and frustrating, I mean very frustrating, if you don't know what you are typing at every stroke of the keyboard.

This guide assumes you have located a website that contains emails you would really love to scrape.

With that said, let's dive in.

First of all, before using BeautifulSoup you have to install it:

pip install beautifulsoup4
or 
pip install bs4

Install requests to download the page behind each url, and lxml to parse the HTML that comes back.

pip install lxml
pip install requests

Once you are done, import BeautifulSoup and requests to start working with them.

# import libraries
from bs4 import BeautifulSoup
import requests

If you are looking to scrape multiple urls, you can set up your urls like this:

urls = ['https://www.url1.com', 'https://www.url2.com']  # add more urls as needed, each one starting with http:// or https://

And use a for loop to iterate through the urls list like below:

for url in urls:
    source = requests.get(url).text
    soup = BeautifulSoup(source, 'lxml')
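When looping over several urls, one dead link or timeout will stop the whole run unless you catch it. Below is a minimal sketch of the same loop with basic error handling. The function name fetch_soups, the get parameter (there only so the download step can be swapped out), the 10 second timeout and the 'html.parser' default are all assumptions of mine; pass parser='lxml' to match the rest of this guide.

```python
import requests
from bs4 import BeautifulSoup


def fetch_soups(urls, get=requests.get, parser="html.parser"):
    """Return a dict mapping each url that downloaded fine to its soup object."""
    soups = {}
    for url in urls:
        try:
            response = get(url, timeout=10)  # timeout value is an arbitrary choice
            response.raise_for_status()      # turn 4xx/5xx answers into exceptions
        except requests.RequestException as error:
            # Skip the broken url and carry on with the rest of the list.
            print(f"Skipping {url}: {error}")
            continue
        soups[url] = BeautifulSoup(response.text, parser)
    return soups


# Example call (hypothetical urls, needs a network connection):
# soups = fetch_soups(['https://www.url1.com', 'https://www.url2.com'], parser='lxml')
```

Keeping one soup per url in a dict means the scraping step can run after all downloads finish, instead of only the last page surviving the loop.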

If it's just a single url then the below code should suffice, without indenting.

source = requests.get('https://www.url.com').text  # the url must include http:// or https://

Next, create your soup object so BeautifulSoup can parse the HTML you pulled from your url(s).

soup = BeautifulSoup(source, 'lxml')
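If you want to see what the soup object gives you before pointing it at a live site, you can feed it a small HTML string. This is only a sketch: sample_html is made up for illustration, and it uses Python's built-in 'html.parser' so it runs even without lxml installed.

```python
from bs4 import BeautifulSoup

# Made-up page used only for illustration.
sample_html = """
<html>
  <head><title>Contact us</title></head>
  <body>
    <div class="contact"><a href="mailto:hello@example.com">Email us</a></div>
  </body>
</html>
"""

soup = BeautifulSoup(sample_html, "html.parser")

page_title = soup.title.string   # text inside the <title> tag
first_link = soup.find("a")      # first <a> tag in the document

print(page_title)                # Contact us
print(first_link["href"])        # mailto:hello@example.com
```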

Now create a variable that uses your BeautifulSoup object to search for the html tag where the email is located, e.g. a 'div' with a particular 'class'. In other words, point BeautifulSoup in the direction you want it to 'hunt down the kill'.

email_location = soup.find_all('div', class_='relative w-full inline-flex items-center flex-row')
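One thing to watch with a long class string like the one above: BeautifulSoup compares it against the exact value of the class attribute, so the same classes in a different order will not match. The sketch below shows the difference on a made-up snippet and uses select() with a CSS selector as an order-independent alternative. The sample HTML and variable names are assumptions for illustration.

```python
from bs4 import BeautifulSoup

# Two divs carry the same classes in a different order; the third is unrelated.
sample_html = """
<div class="relative w-full inline-flex items-center flex-row"><a href="mailto:a@example.com">A</a></div>
<div class="flex-row relative w-full inline-flex items-center"><a href="mailto:b@example.com">B</a></div>
<div class="footer">no email here</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Matches only the div whose class attribute is exactly this string.
exact = soup.find_all("div", class_="relative w-full inline-flex items-center flex-row")

# Matches every div that has all five classes, in any order.
any_order = soup.select("div.relative.w-full.inline-flex.items-center.flex-row")

print(len(exact), len(any_order))  # 1 2
```

If the site generates its class lists automatically, the select() form is usually the safer bet.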

At this point, if the emails are on one page, you will have to create a for loop to loop through the target area. If they are spread over several pages with similar target areas, build a list of page links and loop through those instead. Inside the target area you should be searching for 'a' tags and their 'href' attribute (e.g. a 'mailto:' link) or, if the email is encrypted, a 'data-cfemail' attribute. (Check out my post on scraping encrypted emails for your digest if needed). Happy scraping!
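To make that last step concrete, here is a rough sketch of a loop over a soup object that collects emails and phone numbers. Everything named here (extract_contacts, decode_cfemail, EMAIL_PATTERN) is my own helper, not part of BeautifulSoup. It assumes emails sit in 'mailto:' links or plain text, phone numbers sit in 'tel:' links, and any 'data-cfemail' value follows the common Cloudflare scheme where the first hex byte is an XOR key for the rest. The regex is deliberately simple and will miss unusual addresses.

```python
import re
from bs4 import BeautifulSoup

# Simple pattern for emails written out in visible text (an assumption, not exhaustive).
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def decode_cfemail(encoded):
    """Decode a data-cfemail hex string, assuming the first byte is the XOR key."""
    key = int(encoded[:2], 16)
    return "".join(
        chr(int(encoded[i:i + 2], 16) ^ key) for i in range(2, len(encoded), 2)
    )


def extract_contacts(soup):
    """Return (emails, phones) found in a soup object, each as a sorted list."""
    emails, phones = set(), set()

    # Links: mailto: for emails, tel: for phone numbers.
    for link in soup.find_all("a", href=True):
        href = link["href"]
        if href.startswith("mailto:"):
            # Drop any "?subject=..." part after the address.
            emails.add(href[len("mailto:"):].split("?")[0])
        elif href.startswith("tel:"):
            phones.add(href[len("tel:"):])

    # Obfuscated emails stored in a data-cfemail attribute.
    for tag in soup.find_all(attrs={"data-cfemail": True}):
        emails.add(decode_cfemail(tag["data-cfemail"]))

    # Emails typed directly into the page text.
    emails.update(EMAIL_PATTERN.findall(soup.get_text(" ")))

    return sorted(emails), sorted(phones)
```

To narrow the hunt to the target area, you could call extract_contacts on each element returned by find_all instead of on the whole soup.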

If you need assistance with your projects, feel free to email me at info@airgad.com or WhatsApp Jesse. Stay safe!