mirror of
https://github.com/NohamR/phpBB-forum-scraper.git
synced 2026-02-22 02:25:43 +00:00
771 B
771 B
phpBB Forum Scraper
Python-based scraper for phpBB forums.
Code requires:
1. Python scraping library, [Scrapy.]: http://scrapy.org/
2. Python HTML parsing library, [BeautifulSoup.]: https://www.crummy.com/software/BeautifulSoup/bs4/doc/
Scraper Output
Scrapes the following information from forum posts:
1. Username
2. User post count
3. Post date & time
4. Post text
5. Quoted text
Edit phpBB.py and specify:
1. `allowed_domains`
2. `start_urls`
3. `username` & `password`
4. `forum_login=False` or `forum_login=True`
Instructions:
From within /phpBB_scraper/:
scrapy crawl phpBB to launch the crawler.
scrapy crawl phpBB -o posts.csv to launch the crawler and save results to CSV.