Rearranging Pages of a PDF
Since V 1.9.0, the Document class has a new method, select(s). Its only parameter is a sequence s of pages (given as zero-based integers) that should be kept ("selected").
As mentioned in the manual, s may be a Python list, tuple, array.array, numpy.array, or any other object that implements the sequence protocol and contains integers.
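As a rough illustration of what "implements the sequence protocol and contains integers" means, the following sketch checks a candidate object in plain Python. The helper name looks_like_page_sequence and its page_count argument are hypothetical and not part of PyMuPDF; the library performs its own validation.

```python
from array import array


def looks_like_page_sequence(s, page_count):
    """Rough check (hypothetical helper): s supports len() and indexing,
    and every item is an integer page number in range(page_count)."""
    if not (hasattr(s, "__len__") and hasattr(s, "__getitem__")):
        return False
    return all(
        isinstance(s[i], int) and 0 <= s[i] < page_count
        for i in range(len(s))
    )


# Some of the sequence types named above, all describing the same pages.
as_list = [0, 2, 4]
as_tuple = (0, 2, 4)
as_array = array("i", [0, 2, 4])
```

Under these assumptions, all three objects pass the check for a document with at least 5 pages, whereas a set such as {0, 2, 4} does not, because it cannot be indexed.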
Successful execution will alter the document's representation in memory.
For example, after select([0]), only the first page will be left: everything else will be gone, pageCount will be 1, and so on. If you now save the document with save(...), you will have a new 1-page PDF reflecting what has happened.
Note that all links, bookmarks and annotations are preserved as long as they do not point to deleted pages.
How can this method be used?
If you know how to manipulate Python lists, you are only limited by your imagination. For example:
- Delete pages containing no text or a specific text
- Only include odd / even pages, e.g. to support double sided printing on some printer hardware
- Re-arrange pages, e.g. the whole document from back to front: take lst = list(range(doc.pageCount-1, -1, -1)) as the list to be selected.
- "Concatenate" a document with itself by specifying lst + lst as the list of pages to be taken.
- doc.select([1,1,1,5,5,5,9,9,9]) does what it looks like: create a 9-page document of 3 times 3 equal pages.
- Take the first / last 10 pages: lst = list(range(10)) and lst = list(range(doc.pageCount - 10, doc.pageCount)), respectively.
- etc.
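The ideas in the list above can be sketched as small functions that only build the page lists. All function names here are hypothetical helpers, not PyMuPDF API. They take the page count (or, for the text filter, a list of already extracted page texts) so that they can be tried without opening a PDF.

```python
def reversed_pages(page_count):
    """Whole document from back to front."""
    return list(range(page_count - 1, -1, -1))


def even_index_pages(page_count):
    """Zero-based indices 0, 2, 4, ... (the 1st, 3rd, 5th, ... page)."""
    return list(range(0, page_count, 2))


def odd_index_pages(page_count):
    """Zero-based indices 1, 3, 5, ... (the 2nd, 4th, 6th, ... page)."""
    return list(range(1, page_count, 2))


def first_pages(page_count, n=10):
    """The first n pages, or all pages if the document is shorter."""
    return list(range(min(n, page_count)))


def last_pages(page_count, n=10):
    """The last n pages, or all pages if the document is shorter."""
    return list(range(max(page_count - n, 0), page_count))


def doubled(page_count):
    """'Concatenate' the document with itself."""
    lst = list(range(page_count))
    return lst + lst


def pages_with_text(texts):
    """Indices of pages whose extracted text is not empty or whitespace."""
    return [i for i, text in enumerate(texts) if text.strip()]


# Typical use with an open document doc (not executed here):
#     doc.select(reversed_pages(doc.pageCount))
```

Clamping with min() and max() in first_pages and last_pages is a choice made for this sketch, so that documents shorter than n pages do not produce negative page numbers.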
You can apply several such selects in a row. After each one, the document structure will get updated (len(doc) will always reflect the current page count, etc.).
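When chaining selects, keep in mind that the page numbers of each call refer to the document as left by the previous call. The following simulation uses plain lists of labels instead of a real PDF; apply_select is a hypothetical stand-in that only mimics the effect of select() on the page order.

```python
def apply_select(pages, s):
    """Return the pages that remain after selecting the indices in s."""
    return [pages[i] for i in s]


original = ["p0", "p1", "p2", "p3", "p4", "p5"]

# First select: reverse the whole document.
step1 = apply_select(original, list(range(len(original) - 1, -1, -1)))

# Second select: indices now refer to the reversed document,
# so [0, 1, 2] keeps what used to be the last three pages.
step2 = apply_select(step1, [0, 1, 2])
```

Here step2 ends up holding the labels p5, p4 and p3, in that order.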
The original PDF content is no longer accessible. But you can discard changes: close and re-open the document.
Save your work using doc.save(...). Be sure to include the garbage=4 option if you have deleted many pages and want to regain disk space.
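A minimal sketch of the select-then-save step follows. The function name keep_and_save and the file names in the usage comment are hypothetical; the function only relies on the select(), save(..., garbage=4) and len(doc) behaviour described above.

```python
def keep_and_save(doc, pages, out_path):
    """Keep only the given pages, save with garbage collection,
    and return the resulting page count."""
    doc.select(pages)
    # garbage=4 asks save() to drop unused objects left by deleted pages.
    doc.save(out_path, garbage=4)
    return len(doc)


# Typical use (not executed here; file names are placeholders):
#     import fitz
#     doc = fitz.open("input.pdf")
#     keep_and_save(doc, [0], "first-page.pdf")
#     doc.close()
```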
When select() is overkill for a simple task, consider using the Document methods deletePage(), deletePageRange(), copyPage() or movePage(). Under the hood, these methods themselves use select(), but they offer a more direct, intuitive approach when only a few pages are concerned.
As a general rule, use select() when many pages are involved and / or when some algorithm computes the required pages. Otherwise these single-page methods may be more appropriate.
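To picture how single-page operations relate to select(), the following sketch expresses a few of them as selection lists. This is an illustration only, not the library's actual implementation; the helper names are hypothetical and their semantics (inclusive ranges, copying or moving to the end) are assumptions made for this example.

```python
def without_page(page_count, pno):
    """Selection list that drops page pno."""
    return [i for i in range(page_count) if i != pno]


def without_range(page_count, first, last):
    """Selection list that drops pages first..last (inclusive here)."""
    return [i for i in range(page_count) if not first <= i <= last]


def with_copy_at_end(page_count, pno):
    """Selection list that appends a copy of page pno at the end."""
    return list(range(page_count)) + [pno]


def with_page_moved_to_end(page_count, pno):
    """Selection list that moves page pno to the end."""
    return without_page(page_count, pno) + [pno]


# Typical use with an open document doc (not executed here):
#     doc.select(without_page(doc.pageCount, 2))
```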