Scrapes records from FOIA document review PDFs to CSVs with extracted information

kiranw/FOIA_doc_review


FOIA Doc Review Scraper

Scrapes records from FOIA document review PDFs to CSVs with extracted information


🎯 TODO • Improvements

  • Group page ranges together into a single row
  • Clean up exemption parsing (when parentheses are off)
  • Text summarization?
  • Infer document title
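
The first TODO item (grouping page ranges into a single row) could be approached roughly as follows. This is a minimal sketch, not the repository's code: the row layout (a dict with an integer `page` key plus other extracted fields such as `exemptions`) and the function name `group_page_ranges` are assumptions made for illustration.

```python
def group_page_ranges(rows):
    """Merge rows for consecutive pages whose other fields are identical.

    `rows` is a list of dicts. The "page" key (an int) is a hypothetical
    schema choice; all remaining keys are compared as-is.
    Returns new dicts with a "pages" string such as "3" or "1-2".
    """
    runs = []
    for row in sorted(rows, key=lambda r: r["page"]):
        fields = {k: v for k, v in row.items() if k != "page"}
        last = runs[-1] if runs else None
        # Extend the current run only if the page is adjacent and the
        # extracted fields match exactly.
        if last and last["fields"] == fields and row["page"] == last["end"] + 1:
            last["end"] = row["page"]
        else:
            runs.append({"start": row["page"], "end": row["page"], "fields": fields})

    grouped = []
    for run in runs:
        if run["start"] == run["end"]:
            pages = str(run["start"])
        else:
            pages = f'{run["start"]}-{run["end"]}'
        grouped.append({"pages": pages, **run["fields"]})
    return grouped
```

Exact matching on the remaining fields is a deliberately strict choice. With noisy OCR output, a looser comparison (for example, on normalized exemption codes only) may group more pages, at the risk of merging rows that differ.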

📦 Dependencies

  • Python 3
  • ImageMagick
  • Pillow
  • Pytesseract
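
The README does not show how these dependencies are combined, so the following is only a rough illustration of one way they could fit together: ImageMagick renders PDF pages to images, Pillow opens them, and Pytesseract extracts the text, which is then written to a CSV. The function names, the `page`/`text` columns, and the output file naming are all assumptions, and the `magick` command assumes ImageMagick 7 (ImageMagick 6 uses `convert`) with PDF support available.

```python
import csv
import subprocess
from pathlib import Path


def pdf_to_page_images(pdf_path, out_dir, dpi=300):
    """Render each PDF page to a PNG using the ImageMagick CLI (assumed: `magick`)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["magick", "-density", str(dpi), str(pdf_path), str(out_dir / "page-%03d.png")],
        check=True,
    )
    # Zero-padded names keep lexicographic order equal to page order.
    return sorted(out_dir.glob("page-*.png"))


def ocr_page(image_path):
    """Return the OCR text of one page image."""
    # Imported lazily so the CSV helper below works without the OCR stack.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img)


def write_rows_csv(rows, fileobj, fieldnames=("page", "text")):
    """Write extracted rows (dicts) to an open text file as CSV."""
    writer = csv.DictWriter(fileobj, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
```

A caller would loop over the images returned by `pdf_to_page_images`, build one dict per page from `ocr_page`, and pass the list to `write_rows_csv` with a file opened using `newline=""`.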

✨ To run

  • Navigate to this folder in the terminal, then start Python's built-in web server:
python3 -m http.server

Then open http://localhost:8000/ in your browser. Port 8000 is the default for http.server; the server's startup log prints the address and port actually in use.
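
If port 8000 is already taken, or the server needs to be started from a script, the same standard-library module can be driven from Python. This is a minimal sketch under assumptions: `start_server` is a hypothetical helper, not part of this repository, and it serves whatever directory it is given in a background thread.

```python
import functools
import http.server
import threading


def start_server(directory=".", port=8000, host="127.0.0.1"):
    """Serve `directory` over HTTP in a daemon thread and return the server.

    Passing port=0 lets the operating system choose a free port; read the
    chosen one from `server.server_address`.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(directory)
    )
    server = http.server.ThreadingHTTPServer((host, port), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

Call `server.shutdown()` followed by `server.server_close()` to stop it. Binding to 127.0.0.1 keeps the files reachable only from the local machine, which is usually what a local review tool wants.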
