mirror of
https://github.com/NohamR/phpBB-forum-scraper.git
phpBB Forum Scraper
Python-based scraper for phpBB forums.
Code requires:
- Python scraping library, Scrapy.
- Python HTML parsing library, BeautifulSoup.
Scraper Output
Scrapes the following information from forum posts:
1. Username
2. User post count
3. Post date & time
4. Post text
5. Quoted text
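To make the output shape concrete, here is a minimal sketch of one scraped post as a record with the five fields listed above, serialized to CSV. The `Post` class, its field names, and the sample values are assumptions for illustration; the scraper's actual column names may differ.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class Post:
    # Field names are hypothetical; the scraper's real column names may differ.
    username: str
    post_count: int
    post_time: str
    post_text: str
    quote_text: str = ""  # left empty when the post quotes nothing


def posts_to_csv(posts):
    """Serialize Post records to CSV text, one row per post."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(Post)])
    writer.writeheader()
    for post in posts:
        writer.writerow(asdict(post))
    return buffer.getvalue()


# Made-up sample data, not real scraper output.
example = Post(
    username="alice",
    post_count=42,
    post_time="2020-01-31 18:05",
    post_text="I agree with this.",
    quote_text="Has anyone tried the new release?",
)
csv_text = posts_to_csv([example])
```

Keeping quoted text in its own column, separate from the post text, lets later analysis tell what a user wrote from what they were replying to.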
Edit phpBB.py and specify:
- allowed_domains
- start_urls
- username & password
- forum_login=False or forum_login=True
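As a rough illustration of what these settings might look like, the sketch below uses placeholder values and adds a small consistency check. The `check_settings` helper and all example values are hypothetical and not part of the repository. It relies on Scrapy's convention that `allowed_domains` holds bare domain names rather than full URLs.

```python
from urllib.parse import urlparse

# Placeholder values; replace them with your forum's details.
allowed_domains = ["forum.example.com"]  # bare host names, no scheme or path
start_urls = ["https://forum.example.com/viewforum.php?f=1"]
username = "my_user"      # only needed when forum_login is True
password = "my_password"
forum_login = False       # set to True if the forum requires a login


def check_settings(allowed_domains, start_urls, username, password, forum_login):
    """Return a list of problems; an empty list means the values look consistent."""
    problems = []
    for domain in allowed_domains:
        if "://" in domain or "/" in domain:
            problems.append(f"allowed_domains entry {domain!r} should be a bare domain")
    for url in start_urls:
        host = urlparse(url).hostname or ""
        # A start URL is in scope if its host equals, or is a subdomain of, an allowed domain.
        if not any(host == d or host.endswith("." + d) for d in allowed_domains):
            problems.append(f"start URL {url!r} is outside allowed_domains")
    if forum_login and not (username and password):
        problems.append("forum_login is True but username or password is empty")
    return problems
```

Calling `check_settings(allowed_domains, start_urls, username, password, forum_login)` before a crawl can catch a start URL that Scrapy would otherwise filter out as off-site.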
Instructions:
From within /phpBB_scraper/:
scrapy crawl phpBB to launch the crawler.
scrapy crawl phpBB -o posts.csv to launch the crawler and save results to CSV.
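Once the crawl has written posts.csv, a quick sanity check is to list the columns and count the rows. This is a minimal sketch using only the standard library; `summarize_csv` is a hypothetical helper, and it assumes the file is UTF-8 with a header row.

```python
import csv


def summarize_csv(path):
    """Return (column_names, row_count) for a CSV file with a header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        row_count = sum(1 for _ in reader)
        return reader.fieldnames, row_count


# Example usage after a crawl:
# columns, count = summarize_csv("posts.csv")
# print(columns, count)
```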