Uploading revised phpBB forum scraping code.

2026-02-22 02:25:43 +00:00 · 2018-08-19 15:47:44 -04:00
commit 01fcfb586b
14 changed files with 324 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,46 @@
+# phpBB Forum Scraper
+Python-based scraper for phpBB forums.
+
+Code requires: 
+
+1. Python scraping library, <a href="http://scrapy.org/" target="_blank">Scrapy</a>.
+
+2. Python HTML parsing library, <a href="https://www.crummy.com/software/BeautifulSoup/bs4/doc/" target="_blank">BeautifulSoup</a>.
+
+
+## Scraper Output
+Scrapes the following information from forum posts:
+
+1. Username
+
+2. User post count
+
+3. Post date & time
+
+4. Post text
+
+5. Quoted text
+
+
+## Configuration
+Edit `phpBB.py` and specify:
+
+1. `allowed_domains`
+2. `start_urls`
+3. `username` & `password`
+4. `form_login=False` or `form_login=True`
+
+These fields appear in `phpBB.py` as:
+
+    allowed_domains = ['']
+    start_urls = ['']
+    username = ''
+    password = ''
+    form_login = False
+
+## Instructions:
+From within `/phpBB_scraper/`:
+
+`scrapy crawl phpBB` to launch the crawler.
+
+`scrapy crawl phpBB -o posts.csv` to launch the crawler and save results to CSV.
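
A quick way to sanity-check the CSV written by `scrapy crawl phpBB -o posts.csv` is to count posts per user. The sketch below is only an illustration using the standard library: the column name `Username` is an assumption (the README lists the scraped fields but not the exact CSV headers), and `posts_per_user` is a hypothetical helper, not part of this commit.

```python
import csv
import io
from collections import Counter

# Hypothetical column name: the real header in posts.csv depends on the
# item fields defined in the spider, so adjust it to match your file.
USERNAME_COLUMN = "Username"


def posts_per_user(csv_file, username_column=USERNAME_COLUMN):
    """Count scraped posts per username from an open CSV file object."""
    reader = csv.DictReader(csv_file)
    counts = Counter()
    for row in reader:
        # Rows with a missing or empty username are skipped.
        name = (row.get(username_column) or "").strip()
        if name:
            counts[name] += 1
    return counts


if __name__ == "__main__":
    # Tiny in-memory stand-in for posts.csv (made-up rows, not real output).
    sample = io.StringIO(
        "Username,PostText\n"
        "alice,Hello\n"
        "bob,Hi there\n"
        "alice,Welcome back\n"
    )
    for user, n in posts_per_user(sample).most_common():
        print(user, n)
```

For a real run, open the file with `open("posts.csv", newline="", encoding="utf-8")` and pass the file object to `posts_per_user`, changing `username_column` if the header differs.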