David Ascienzo 76df7378f3 Updated README.md
2018-08-19 15:53:45 -04:00


# phpBB Forum Scraper
Python-based scraper for phpBB forums.
The code requires:
1. Python scraping library, [Scrapy](http://scrapy.org/).
2. Python HTML parsing library, [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/).
## Scraper Output
The scraper collects the following information from each forum post:
1. Username
2. User post count
3. Post date & time
4. Post text
5. Quoted text
## Configuration
Before running the crawler, edit `phpBB.py` and specify:
1. `allowed_domains`
2. `start_urls`
3. `username` & `password`
4. `forum_login` (`True` or `False`)
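
As a rough illustration of the settings listed above, the sketch below shows placeholder values and a quick sanity check that every start URL falls under `allowed_domains`. The domain, URL, and credentials are made up, `check_settings` and `url_in_allowed_domains` are hypothetical helpers that are not part of this repository, and the comment on `forum_login` is an assumption about what the flag does. The subdomain matching is meant to mirror how Scrapy treats `allowed_domains`.

```python
from urllib.parse import urlparse

# Placeholder values (assumptions): replace them with the details of your forum.
allowed_domains = ["forum.example.com"]
start_urls = ["https://forum.example.com/viewforum.php?f=2"]
username = "my_user"
password = "my_password"
forum_login = True  # assumed meaning: log in with the credentials above before crawling


def url_in_allowed_domains(url, domains):
    """Return True if the URL's host equals, or is a subdomain of, one of the domains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d.lower() or host.endswith("." + d.lower()) for d in domains)


def check_settings(start_urls, allowed_domains):
    """Hypothetical helper: return the start URLs that fall outside allowed_domains."""
    return [u for u in start_urls if not url_in_allowed_domains(u, allowed_domains)]


if __name__ == "__main__":
    stray = check_settings(start_urls, allowed_domains)
    print("Start URLs outside allowed_domains:", stray)
```

An empty list means every start URL is covered. A mismatch between the two settings is worth ruling out first if a crawl finishes without collecting any posts.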
## Instructions
From within `/phpBB_scraper/`, run one of the following:
1. `scrapy crawl phpBB` to launch the crawler.
2. `scrapy crawl phpBB -o posts.csv` to launch the crawler and save the results to CSV.
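
Once a crawl has written `posts.csv`, the results can be inspected with Python's standard `csv` module. This is a minimal sketch: `load_posts` is a hypothetical helper, and the code makes no assumption about the column names because it reads them from the header row of the file.

```python
import csv


def load_posts(path):
    """Read the exported CSV into a list of dicts keyed by the header row."""
    # newline="" lets the csv module handle line breaks inside quoted post text.
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))
```

Calling `load_posts("posts.csv")` returns one dictionary per scraped post. Taking `list(rows[0].keys())` on a non-empty result shows which columns the spider actually exported.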