web.project package

Submodules

web.project.csv_database module

web.project.prod_extract module

web.project.web_crawler module

Web crawler designed to find products, and their ratings, that are developed for and targeted at seniors.

class project.web_crawler.WebCrawler(url=None, about=None, sub_url=None, page=None, data=None, clean=False)[source]

Bases: object

Web Crawler
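Example (a minimal instantiation sketch; the URL shown is a placeholder, not a real target):
>>> from project.web_crawler import WebCrawler
>>> crawler = WebCrawler(url='https://example.com')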

cleanup()[source]

Cleans up CSV files in the current directory and saves them to the csv folder. Returns:

self.clean: bool - file cleaned.
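Example (illustrative sketch; assumes crawler is an existing WebCrawler instance, as in the examples below):
>>> cleaned = crawler.cleanup()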

compress()[source]

Compresses the files retrieved by the web crawler.
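Example (illustrative sketch, using the same crawler instance):
>>> crawler.compress()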

csv_to_database()[source]

Exports the extracted CSV data to an SQL database. Returns:

self.clean: bool - file cleaned.
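Example (illustrative sketch):
>>> crawler.csv_to_database()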

data_extract()[source]

Extracts the URL page data and parses the information with BeautifulSoup.
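Example (illustrative sketch):
>>> crawler.data_extract()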

get_data()[source]

Gets the data that the web crawler is parsing. Returns:

self.data: string - page data.
Example:
>>> example_data = crawler.get_data()

get_description()[source]

Gets the description of the product located within the targeted web page. Returns:

self.about: string - description of product.
Example:
>>> example_description = crawler.get_description()

get_nav_categories()[source]

Gets the categories parsed by the web crawler. Returns:

self.categories: list - list of categories within the navigation bar.
Example:
>>> example_categories = crawler.get_nav_categories()

get_nav_catlinks()[source]

Gets the category links parsed by the web crawler. Returns:

self.catlinks: list - list of category links within the navigation bar.
Example:
>>> example_catlinks = crawler.get_nav_catlinks()

get_page()[source]

Gets the page that the web crawler is parsing data from. Returns:

self.page: string - the page at the URL.
Example:
>>> example_page = crawler.get_page()

get_sub_url()[source]

Gets the sub-URL that the web crawler will be accessing. Returns:

self.sub_url: string - the sub-URL.
Example:
>>> example_sub_url = crawler.get_sub_url()

get_url()[source]

Gets the URL that the web crawler will be accessing. Returns:

self.url: string - the URL.
Example:
>>> example_url = crawler.get_url()

log_cleanup()[source]

Cleans up log files in the current directory and saves them to the log folder. Returns:

self.clean: bool - file cleaned.
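Example (illustrative sketch):
>>> cleaned = crawler.log_cleanup()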

open_log()[source]

Creates a log file for the web crawler.
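Example (illustrative sketch):
>>> crawler.open_log()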

sub_data_extract()[source]

Extracts the sub-URL page data and parses the information with BeautifulSoup.
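Example (illustrative sketch):
>>> crawler.sub_data_extract()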

Module contents