phpBB-forum-scraper/README.md
David Ascienzo bf2fe60993 Updated README.md
2018-08-19 15:51:11 -04:00

phpBB Forum Scraper

Python-based scraper for phpBB forums.

Code requires:

1. Python scraping library, [Scrapy](http://scrapy.org/).

2. Python HTML parsing library, [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/).

Scraper Output

Scrapes the following information from forum posts:

1. Username

2. User post count

3. Post date & time

4. Post text

5. Quoted text

Edit `phpBB.py` and specify:

1. `allowed_domains`

2. `start_urls`

3. `username` & `password`

4. `forum_login=False` or `forum_login=True`
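
As a rough illustration of the kind of values these settings take, here is a minimal sketch. Every value below is a hypothetical placeholder, and the helper `url_in_allowed_domains` is not part of this repository; it only mirrors the usual Scrapy convention that `allowed_domains` holds bare domain names (no scheme or path) and that subdomains of a listed domain are accepted. The meaning of `forum_login` is assumed here to be "log in with `username` and `password` before scraping".

```python
from urllib.parse import urlparse

# Hypothetical placeholder values; substitute your own forum's details.
allowed_domains = ["forum.example.com"]  # bare domain names, no scheme or path
start_urls = ["https://forum.example.com/viewforum.php?f=1"]
username = "my_user"      # assumed to matter only when forum_login is True
password = "my_password"
forum_login = False       # assumed: set to True if the forum requires a login


def url_in_allowed_domains(url, domains):
    """Return True if the URL's host equals, or is a subdomain of, a listed domain."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in domains)


# Sanity check: every start URL should fall inside the allowed domains,
# otherwise the crawler would likely filter out its own starting pages.
for url in start_urls:
    assert url_in_allowed_domains(url, allowed_domains), url
```

A check like this catches the common mistake of putting a full URL such as `https://forum.example.com/` into `allowed_domains` instead of the bare domain.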

Instructions:

From within `/phpBB_scraper/`:

`scrapy crawl phpBB` to launch the crawler.

`scrapy crawl phpBB -o posts.csv` to launch the crawler and save results to CSV.
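
Once a crawl has written `posts.csv`, the results can be read back with Python's standard `csv` module. The sketch below is only illustrative: the column names (`Username`, `PostCount`, `PostTime`, `PostText`, `QuoteText`) are assumptions, so check the header row of your own output file, and the in-memory sample stands in for a real export.

```python
import csv
import io


def load_posts(csv_file):
    """Read scraped posts from an open CSV file into a list of dicts."""
    return list(csv.DictReader(csv_file))


# In-memory sample with assumed column names; a real posts.csv may differ.
sample = io.StringIO(
    "Username,PostCount,PostTime,PostText,QuoteText\n"
    "alice,42,2018-08-19 15:51,Hello world,\n"
)
posts = load_posts(sample)

for post in posts:
    # Each row is a dict keyed by the CSV header names.
    print(post["Username"], post["PostTime"], post["PostText"])
```

To load a real export, open the file with `open("posts.csv", newline="", encoding="utf-8")` and pass the file object to `load_posts`.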