Web-Crawler is an open source system for managing and extracting data across multiple host sites, providing basic mechanisms for file management, logging, and compression of application data.


web.project.web_crawler module

A web crawler designed to find products, and their ratings, that are developed for and targeted at seniors.

class project.web_crawler.WebCrawler(url=None, about=None, sub_url=None, page=None, data=None, clean=False)[source]

Bases: object

Web Crawler
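
Example (a minimal instantiation sketch; the URL shown here is a placeholder, not part of the project):
>>> from project.web_crawler import WebCrawler
>>> crawler = WebCrawler(url="https://www.example.com")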

cleanup()[source]

Cleans up CSV files in the current directory and saves them to the csv folder. Returns:

self.clean: bool - file cleaned.
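Example (a usage sketch, assuming an existing WebCrawler instance named crawler):
>>> cleaned = crawler.cleanup()
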
compress()[source]

Compresses files received from the web crawler.
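
Example (a usage sketch, assuming crawler is an existing WebCrawler instance that has already produced output files):
>>> crawler.compress()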

csv_to_database()[source]

Exports extracted CSV data to an SQL database. Returns:

self.clean: bool - file cleaned.
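Example (a usage sketch, assuming crawler has already extracted CSV files):
>>> loaded = crawler.csv_to_database()
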
data_extract()[source]

Extracts the URL page data and parses the information with BeautifulSoup.
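
Example (a usage sketch, assuming crawler was constructed with a valid url; the call order shown is an assumption):
>>> crawler.data_extract()
>>> page_data = crawler.get_data()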

get_data()[source]

Get the data that the webcrawler is parsing. Returns:

self.data: string - page data.
Example:
>>> example_data = crawler.get_data()
												
get_description()[source]

Get the description of the product located within the targeted web page. Returns:

self.about: string - description of product.
Example:
>>> example_description = crawler.get_description()
												
get_nav_categories()[source]

Get the categories parsed within the webcrawler. Returns:

self.categories: list - list of categories within the navigation bar.
Example:
>>> example_categories = crawler.get_nav_categories()
												
get_nav_catlinks()[source]

Get the category links within the webcrawler. Returns:

self.catlinks: list - list of category links within the navigation bar.
Example:
>>> example_catlinks = crawler.get_nav_catlinks()
												
get_page()[source]

Gets the page that the webcrawler is parsing data from. Returns:

self.page: string - the page of the url.
Example:
>>> example_page = crawler.get_page()
												
get_sub_url()[source]

Gets the sub-URL that the webcrawler will be accessing. Returns:

sub_url: string - the sub-URL.
Example:
>>> example_sub_url = crawler.get_sub_url()
												
get_url()[source]

Gets the url that the webcrawler will be accessing. Returns:

url: string - the url.
Example:
>>> example_url = crawler.get_url()
												
log_cleanup()[source]

Cleans up log files in the current directory and saves them to the log folder. Returns:

self.clean: bool - file cleaned.
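Example (a usage sketch, assuming log files exist in the current directory):
>>> cleaned = crawler.log_cleanup()
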
open_log()[source]

Creates a log file for the web crawler.
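
Example (a usage sketch; crawler is an existing WebCrawler instance):
>>> crawler.open_log()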

sub_data_extract()[source]

Extracts the sub-URL page data and parses the information with BeautifulSoup.
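
Example (a usage sketch, assuming crawler was constructed with a sub_url; the call shown is an assumption about typical use):
>>> crawler.sub_data_extract()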