Python-based scraper for phpBB forums.
Code requires:
Python scraping library, Scrapy.
Python HTML parsing library, BeautifulSoup.
Scrapes the following information from forum posts:
1. Username
2. User post count
3. Post date & time
4. Post text
5. Quoted text
Edit phpBB.py and specify:
allowed_domains
start_urls
username & password
forum_login=True to log in with the username and password above, or forum_login=False to scrape without logging in
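As a rough illustration of the settings listed above, the sketch below shows placeholder values and a small sanity check that every start URL falls under one of the allowed domains. The domain, URL, and credentials are hypothetical, and the helper urls_outside_allowed_domains is not part of the project; in phpBB.py these names presumably live inside the spider itself.

```python
from urllib.parse import urlparse

# Hypothetical placeholder values; replace them with your forum's details.
allowed_domains = ["forum.example.com"]
start_urls = ["https://forum.example.com/index.php"]
username = "my_user"
password = "my_password"
forum_login = True  # set to False if the forum can be read without logging in


def urls_outside_allowed_domains(urls, domains):
    """Return the URLs whose host is neither one of `domains` nor a subdomain of one.

    Illustrative helper only (an assumption, not project code).
    """
    outside = []
    for url in urls:
        host = urlparse(url).hostname or ""
        if not any(host == d or host.endswith("." + d) for d in domains):
            outside.append(url)
    return outside
```

If the helper returns a non-empty list, those start URLs would likely be filtered out as off-site, so it may be worth fixing allowed_domains before launching the crawler.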
From within /phpBB_scraper/:
scrapy crawl phpBB to launch the crawler.
scrapy crawl phpBB -o posts.csv to launch the crawler and save results to CSV.
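Once the crawl has written posts.csv, the file can be inspected with Python's standard csv module. The following is a minimal sketch: load_posts is a hypothetical helper, and the column names in the export depend on the fields the spider emits, so they are read from the header row rather than assumed.

```python
import csv


def load_posts(path):
    """Read the exported CSV into a list of dicts keyed by the header row."""
    # newline="" lets the csv module handle line breaks inside quoted post text.
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

For example, rows = load_posts("posts.csv") returns one dict per scraped post, and list(rows[0].keys()) shows which columns the export actually contains.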
NOTE: Please adjust settings.py to throttle your requests.
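One possible way to throttle requests is sketched below as a settings.py fragment. The setting names are standard Scrapy settings, but the values are illustrative assumptions rather than the project's defaults; tune them to what the target forum can tolerate.

```python
# settings.py (fragment): illustrative values, not the project's defaults.

# Respect the forum's robots.txt rules.
ROBOTSTXT_OBEY = True

# Wait roughly this many seconds between requests to the same site.
DOWNLOAD_DELAY = 2.0

# Send only one request at a time to any single domain.
CONCURRENT_REQUESTS_PER_DOMAIN = 1

# Let Scrapy adapt the delay to the server's observed response times.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 2.0
AUTOTHROTTLE_MAX_DELAY = 30.0
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
```

With AutoThrottle enabled, DOWNLOAD_DELAY acts as the minimum delay, so the crawler should never go faster than the value set here.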