Files
phpBB-forum-scraper/README.md
2018-08-19 15:47:44 -04:00

929 B

phpBB Forum Scraper

Python-based scraper for phpBB forums.

Code requires:

1. Python scraping library, <a href="http://scrapy.org/" target="_blank">Scrapy</a>.

2. Python HTML parsing library, <a href="ttps://www.crummy.com/software/BeautifulSoup/bs4/doc/" target="_blank">BeautifulSoup</a>.

Scraper Output

Scrapes the following information from forum posts:

1. Username

2. User post count

3. Post date & time

4. Post text

5. Quoted text

allowed_domains = [''] start_urls = [''] username = '' password = '' form_login = False

Edit phpBB.py and specify:

1. `allowed_domains`

2. `start_urls`

3. `username` & `password`

4. `forum_login=False` or `forum_login=True`

Instructions:

From within /phpBB_scraper/:

scrapy crawl phpBB to launch the crawler.

scrapy crawl phpBB -o posts.csv to launch the crawler and save results to CSV.