PDF Flow analyzes scanned PDF files using user-defined templates to extract in-document information, then uses that information to rename the files, save them to project-specific directories, and e-mail them to project-specific distribution lists.
Note: PDF Flow is currently only compatible with Windows due to hardcoded configuration storage settings. Linux compatibility is planned for a future update.
Before getting started, make sure the following dependencies are installed on your system. Once installed, you can either add them to your system PATH manually or specify their locations in the PDF Flow settings tab after opening the program. With either option, use the Test buttons in the settings tab to confirm they are properly detected.
- Poppler - Used for transforming PDFs to images.
- Tesseract OCR - The OCR engine for text extraction.
Note: If you add the dependencies to PATH while PDF Flow or a command prompt is already open, restart it so the updated PATH is picked up.
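As a rough way to confirm the PATH setup outside of PDF Flow, the check could be sketched in Python as below. This is only an illustration, not how the Test buttons are implemented. The executable names are assumptions: `pdftoppm` is one of the tools Poppler ships, and `tesseract` is the Tesseract command-line program. `find_dependencies` is a hypothetical helper name.

```python
import shutil

# Assumed executable names: Poppler ships pdftoppm, Tesseract's CLI is tesseract.
REQUIRED_TOOLS = {"Poppler": "pdftoppm", "Tesseract OCR": "tesseract"}


def find_dependencies(tools=REQUIRED_TOOLS):
    """Map each dependency name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(exe) for name, exe in tools.items()}


if __name__ == "__main__":
    for name, path in find_dependencies().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

Run it from a freshly opened command prompt so that recent PATH changes are visible to the script.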
- Visit the PDF Flow website to download the latest .msi installer for Windows.
- Run the downloaded .msi file and follow the on-screen instructions to install the program.
- Once installed, you can launch the program from the Start Menu or desktop shortcut.
If you prefer to build from source you can follow these steps:
- Clone the GitHub repository:
  git clone https://github.com/bgorman87/PDF-Flow.git
  cd PDF-Flow
- Create and activate a virtual environment (recommended):
  python -m venv venv
  venv\Scripts\activate
- Install the required Python packages:
  pip install -r requirements.txt
- Note: On Windows, if you compile an executable yourself, you may notice command prompt windows briefly appearing and disappearing during file processing. When running directly from source, you may not see this, depending on your environment. This happens because the pdf2image package dependency runs Poppler through these prompts, which I cannot change programmatically. If you prefer not to see these windows, follow these general steps to prevent pdf2image from showing them:
  - Open pdf2image.py in your editor of choice. This file should be located at venv\Lib\site-packages\pdf2image\pdf2image.py
  - Look for any lines that use Popen to execute a command.
  - Add the following flag to the respective Popen calls: creationflags=0x08000000
  - Here's an example of what the modified line could look like:
    proc = Popen(command, env=env, stdout=PIPE, stderr=PIPE, creationflags=0x08000000)
- Run PDF Flow:
  python PDF-Flow.py
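For the command prompt note in the steps above, the effect of the creationflags value can be sketched in isolation. The value 0x08000000 is the Windows CREATE_NO_WINDOW process creation flag, which Python exposes as `subprocess.CREATE_NO_WINDOW` on Windows only. The helper name `run_hidden` is hypothetical, and this is a minimal sketch of the idea rather than the actual pdf2image code.

```python
import subprocess
import sys

# 0x08000000 == CREATE_NO_WINDOW on Windows. The constant does not exist on
# other platforms, so fall back to 0 (no special flags) there.
NO_WINDOW = getattr(subprocess, "CREATE_NO_WINDOW", 0)


def run_hidden(command):
    """Run a command and capture its output without opening a console window on Windows."""
    proc = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        creationflags=NO_WINDOW,
    )
    out, err = proc.communicate()
    return proc.returncode, out, err


if __name__ == "__main__":
    code, out, _ = run_hidden([sys.executable, "-c", "print('hello')"])
    print(code, out.decode().strip())
```

Any edit made inside site-packages will likely be lost when pdf2image is reinstalled or upgraded, so the change may need to be reapplied after updating dependencies.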
For usage information and guides, see: PDF Flow Usage
Q: Will this be available for Linux?
A: Cross-platform compatibility is on my list of things to do. Most of PDF Flow was created on Linux, so there is not much to change. It should be available soon.
For any bugs, please check for relevant issues first, and if none apply, please open a new issue.
For general comments/questions, you can reach out directly through GitHub or through e-mail at [email protected].
Being a new/solo developer without a CS degree or any industry experience, I very much welcome any contributions to PDF Flow! I know there is a ton that can be improved or added, so if you would like to contribute, please follow these steps:
- Fork the Repository: Click the "Fork" button at the top-right of this repository to create your own copy.
- Clone the Repository: Clone your forked repository to your local machine:
  git clone https://github.com/your-username/PDF-Flow.git
  cd PDF-Flow
- Create a Branch: Create a new descriptively named branch for your changes:
  git checkout -b your-descriptive-branch-name
- Make Changes: Make your desired changes to the codebase.
- Commit Changes: Commit your changes with a descriptive commit message:
  git commit -m "Add feature XYZ"
- Push to Your Fork: Push your changes to your fork on GitHub:
  git push origin your-descriptive-branch-name
- Submit a Pull Request: Open a pull request from your fork to this repository's main branch. Provide a clear and detailed description of your changes.
- Review and Collaborate: Participate in the discussion and make any necessary adjustments based on feedback.