Web Scraper & Data Automation for SAT Question Library (Short-Term Project, Long-Term Potential)
I'm looking for a developer or data engineer to scrape SAT-style practice questions from publicly available sources (College Board and Khan Academy) and organize them into a clean, structured spreadsheet. There is potential for long-term work building AI-powered workflows for tutors and students.
This project is the first phase of a bigger vision: helping students and tutors auto-generate homework, analyze test results, and create custom practice based on standards and difficulty levels.
Initial Scope (Week 1 Target):
Scrape question data from the following sites:
https://satsuitequestionbank.collegeboard.org/
https://www.khanacademy.org/test-prep/v2-sat-math
https://www.khanacademy.org/test-prep/sat-reading-and-writing
Data should include:
Question text
Answer choices
Correct answer
Difficulty level (if available)
Subject/category/standard (as listed on site)
Direct link to the source (where available)
Output should be a clean spreadsheet or database format, such as Google Sheets or CSV (I'm most familiar with Google Sheets).
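To make the expected output concrete, here is a minimal sketch of one possible row layout, written in Python using only the standard library. The column names, the four-choice layout, and the sample question are assumptions for illustration, not a required schema; questions without answer choices or a listed difficulty would simply leave those cells blank.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class QuestionRecord:
    """One spreadsheet row per question (hypothetical column names)."""
    source: str              # e.g. "College Board" or "Khan Academy"
    question_text: str
    choice_a: str
    choice_b: str
    choice_c: str
    choice_d: str
    correct_answer: str
    difficulty: str = ""     # blank when the site does not list one
    category: str = ""       # subject/category/standard as listed on site
    source_url: str = ""     # direct link, where available


def write_csv(records, out):
    """Write records as CSV with a header row; the csv module handles quoting."""
    names = [f.name for f in fields(QuestionRecord)]
    writer = csv.DictWriter(out, fieldnames=names)
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))


# A made-up sample row, not taken from either site.
sample = QuestionRecord(
    source="Example",
    question_text="If 2x + 3 = 11, what is x?",
    choice_a="2", choice_b="3", choice_c="4", choice_d="5",
    correct_answer="C",
)

buffer = io.StringIO()
write_csv([sample], buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

A CSV in this shape can be imported directly into Google Sheets, and commas inside question text are quoted automatically so they do not break the columns.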
Future/Optional Scope (Not required for this task, but helpful context):
Tagging scraped questions to our assessment standards and scoring system
Connecting questions to external practice links (e.g., IXL/Khan Academy by standard)
Auto-generating practice sets for students based on assessment performance
Triggering scraping or homework generation via Google Sheets buttons or scripts
Using this question library as the foundation to train AI models (for custom question generation, next-step recommendations, and tutor workflows)
✅ What I'm Looking For:
Proven experience with web scraping (Python preferred, but open to other tools)
Familiarity with Google Sheets automation or Airtable integrations is a plus
Clean, readable data formatting — I plan to build systems on top of this!
Ability to document the scraping script in case I want to re-run it later (e.g., next year)
Bonus: Understanding of K-12 or standardized test prep platforms
Timeline:
Ideal turnaround: 3–7 days
One-time scrape to start, but may need occasional re-runs or expansion
Budget:
Fixed
Please include a quote for:
Initial scrape and data cleanup
Optional: script/handoff for future use
To Apply:
Please include:
A short note on how you’d approach this scrape (mention any tools or languages)
Links to past scraping or automation projects (especially education-related if possible)
Whether you’d be interested in follow-up work on tutoring workflows, AI training data, or homework automation
Looking forward to building something great together.