Python-based scraper for phpBB forums.
Code requires:
1. Python scraping library, <a href="http://scrapy.org/" target="_blank">Scrapy</a>. 2. Python HTML parsing library, <a href="ttps://www.crummy.com/software/BeautifulSoup/bs4/doc/" target="_blank">BeautifulSoup</a>.
Scrapes the following information from forum posts:
1. Username 2. User post count 3. Post date & time 4. Post text 5. Quoted text
allowed_domains = [''] start_urls = [''] username = '' password = '' form_login = False
Edit phpBB.py and specify:
phpBB.py
1. `allowed_domains` 2. `start_urls` 3. `username` & `password` 4. `forum_login=False` or `forum_login=True`
From within /phpBB_scraper/:
/phpBB_scraper/
scrapy crawl phpBB to launch the crawler.
scrapy crawl phpBB
scrapy crawl phpBB -o posts.csv to launch the crawler and save results to CSV.
scrapy crawl phpBB -o posts.csv