Linux pdf delete one page

Contents

Remove the last page of a pdf file using PDFtk?
Is it possible to delete some pages of a pdf document?
Remove pages with redundant content from PDF document
Remove the last page of a pdf file using PDFtk?
Remove only 1st page from a LOT of pdf files
How to replace a single page in a pdf using another pdf in linux?

Remove the last page of a pdf file using PDFtk?

You can reference page numbers in reverse order by prefixing them with the letter r. For example, page r1 is the last page of the document, r2 is the next-to-last page of the document, and rend is the first page of the document. You can use this prefix in ranges, too, for example r3-r1 is the last three pages of a PDF.

If you want to remove more than one page, change the range. For example, 1-r3 keeps all but the last two pages.
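To make the reverse numbering concrete, here is a minimal Python model that resolves references such as r1, rend and 1-r3 for a document whose page count is known. It is only a sketch of the behaviour described above, not pdftk's own code. The names resolve_page and resolve_range are hypothetical, and handle prefixes like A1 are ignored. Treating a descending range (for example end-1) as reverse order is an assumption about pdftk's behaviour.

```python
from __future__ import annotations


def resolve_page(ref: str, page_count: int) -> int:
    """Turn a pdftk-style page reference (7, end, r1, rend) into a 1-based page number."""
    if ref == "end":
        return page_count
    if ref == "rend":
        return 1  # "rend" counts from the back, so it lands on the first page
    if ref.startswith("r"):
        return page_count - int(ref[1:]) + 1  # r1 = last page, r2 = next-to-last, ...
    return int(ref)


def resolve_range(spec: str, page_count: int) -> list[int]:
    """Expand a range such as "1-r3" or "r3-r1" into the pages it selects, in order."""
    start_ref, _, end_ref = spec.partition("-")
    start = resolve_page(start_ref, page_count)
    end = resolve_page(end_ref or start_ref, page_count)  # a single page has no "-"
    step = 1 if start <= end else -1  # assumption: a descending range means reverse order
    return list(range(start, end + step, step))


if __name__ == "__main__":
    print(resolve_range("1-r3", 10))   # [1, 2, 3, 4, 5, 6, 7, 8]: all but the last two pages
    print(resolve_range("r3-r1", 10))  # [8, 9, 10]: the last three pages
```

For a 10-page file, 1-r2 therefore means pages 1 to 9, which is exactly "delete the last page".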

If your version of pdftk does not support the r prefix, you need to find out the page count first and then use it with the pdftk cat operation.

A tool like ‘pdfinfo’ from Poppler (http://poppler.freedesktop.org/) can provide this.

Wrapping this in a bit of bash scripting can easily automate this process:

page_count=`pdfinfo "$INFILE" | grep 'Pages:' | awk '{print $2}'`
page_count=$(( $page_count - 1 ))
pdftk A="$INFILE" cat A1-$page_count output "$OUTFILE"

Parameters, error checking and the like can of course be added to the script:

#! /bin/sh

### Path to the PDF Toolkit executable 'pdftk'
pdftk='/usr/bin/pdftk'
pdfinfo='/usr/bin/pdfinfo'

####################################################################

script=`basename "$0"`

### Script help
if [ "$1" = "" ] || [ "$1" = "-h" ] || [ "$1" = "--help" ] || [ "$1" = "-?" ] || [ "$1" = "/?" ]; then
    echo "$script: <input file> [<output file>]"
    echo "    Removes the last page from the PDF, overwriting the source"
    echo "    if no output filename is given"
    exit 1
fi

### Check we have pdftk available
if [ ! -x "$pdftk" ] || [ ! -x "$pdfinfo" ]; then
    echo "$script: The PDF Toolkit and/or Poppler doesn't seem to be installed"
    echo "    (was looking for the [$pdftk] and [$pdfinfo] executables)"
    exit 2
fi

### Check our input is OK
INFILE="$1"
if [ ! -r "$INFILE" ]; then
    echo "$script: Failed to read [$INFILE]"
    exit 2
fi

OUTFILE="$2"
if [ "$OUTFILE" = "" ]; then
    echo "$script: Will overwrite [$INFILE] if processing is ok"
fi

timestamp=`date +"%Y%m%d-%H%M%S"`
tmpfile="/tmp/$script.$timestamp"

### Page count from the "Pages:" line of pdfinfo, minus the last page
page_count=`$pdfinfo "$INFILE" | grep 'Pages:' | awk '{print $2}'`
page_count=$(( $page_count - 1 ))

### Do the deed!
$pdftk A="$INFILE" cat A1-$page_count output "$tmpfile"

### Was it good for you?
if [ $? -eq 0 ]; then
    echo "$script: PDF Toolkit says all is good"
    if [ "$OUTFILE" = "" ]; then
        echo "$script: Overwriting [$INFILE]"
        cp -f "$tmpfile" "$INFILE"
    else
        echo "$script: Creating [$OUTFILE]"
        cp -f "$tmpfile" "$OUTFILE"
    fi
fi

### Clean Up
if [ -f "$tmpfile" ]; then
    rm -f "$tmpfile"
fi
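The same page-count approach can be sketched in Python. It reads the "Pages:" line that pdfinfo prints and builds the pdftk argument list instead of a shell string, which avoids quoting problems with odd file names. The helper names page_count_from_pdfinfo and drop_last_page_cmd are hypothetical. The script assumes pdfinfo and pdftk are on the PATH and only runs when given an input and an output file name.

```python
from __future__ import annotations

import re
import subprocess
import sys


def page_count_from_pdfinfo(info_text: str) -> int:
    """Read the page count from the "Pages:" line of pdfinfo's output."""
    match = re.search(r"^Pages:\s+(\d+)\s*$", info_text, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Pages:' line found in pdfinfo output")
    return int(match.group(1))


def drop_last_page_cmd(infile: str, outfile: str, page_count: int) -> list[str]:
    """Build the same pdftk call as the shell version: keep pages 1 .. N-1."""
    if page_count < 2:
        raise ValueError("the PDF needs at least two pages to drop the last one")
    return ["pdftk", f"A={infile}", "cat", f"A1-{page_count - 1}", "output", outfile]


if __name__ == "__main__" and len(sys.argv) == 3:
    infile, outfile = sys.argv[1], sys.argv[2]
    info = subprocess.run(["pdfinfo", infile], capture_output=True, text=True, check=True).stdout
    subprocess.run(drop_last_page_cmd(infile, outfile, page_count_from_pdfinfo(info)), check=True)
```

Unlike the sh script, a one-page input raises an error instead of producing the invalid range A1-0.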



Is it possible to delete some pages of a pdf document?

You can use the command-line tools pdftk and qpdf for this purpose. They are available on Windows and Linux.


Use scoop to install them easily on Windows:

scoop install pdftk
scoop install qpdf

Here is an example that keeps only pages 1-9 and 26 to the end of the original file input.pdf and saves them to outputfile.pdf:

pdftk example: (taken from here)

pdftk input.pdf cat 1-9 26-end output outputfile.pdf

qpdf example:

qpdf input.pdf --pages . 1-9,26-z -- outputfile.pdf
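The two commands differ mainly in range syntax. pdftk takes each range as a separate argument and writes the last page as end. qpdf joins the ranges with commas and writes the last page as z. A minimal Python sketch that builds both argument lists from the same ranges follows. The function names are hypothetical, and the sketch assumes ranges contain only plain page numbers and end (no r-prefixed references).

```python
from __future__ import annotations


def pdftk_keep_cmd(infile: str, outfile: str, ranges: list[str]) -> list[str]:
    """pdftk: each range is its own argument after 'cat'; 'end' means the last page."""
    return ["pdftk", infile, "cat", *ranges, "output", outfile]


def qpdf_keep_cmd(infile: str, outfile: str, ranges: list[str]) -> list[str]:
    """qpdf: ranges are joined with commas and the last page is written 'z'."""
    def to_qpdf(ref: str) -> str:
        return "z" if ref == "end" else ref

    joined = ",".join("-".join(to_qpdf(part) for part in r.split("-")) for r in ranges)
    # "." tells qpdf to take the pages from the primary input file itself
    return ["qpdf", infile, "--pages", ".", joined, "--", outfile]
```

Either list can be passed to subprocess.run(cmd, check=True). For ["1-9", "26-end"] they reproduce the two commands above.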

I've used PDFsam (PDF Split And Merge, http://www.pdfsam.org/) numerous times and it works well.

It's a free Java app, so you will need Java installed. It lets you split a PDF into single pages and then merge them back together.

You can use any PDF editor or, if you don't want to download or install anything big, the portable command-line pdftk (PDF Toolkit). Just extract the EXE and DLL into the same directory as your PDF, then run a command like the following from the command prompt:

pdftk in.pdf cat 1-12 14-end output out.pdf

This will delete page 13 from the PDF. See the manual and examples pages for more help and options, or just type pdftk --help.
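Deleting arbitrary pages generalizes the 1-12 14-end pattern: keep every run of pages between the deleted ones. The following Python sketch computes those runs. It assumes the page count is already known (for example from pdfinfo), and keep_ranges is a hypothetical name.

```python
from __future__ import annotations


def keep_ranges(delete: set[int], page_count: int) -> list[str]:
    """Return pdftk 'cat' ranges that keep every page not listed in `delete`."""
    kept = []  # (first, last) runs of consecutive pages to keep
    start = None
    for page in range(1, page_count + 1):
        if page in delete:
            if start is not None:
                kept.append((start, page - 1))
                start = None
        elif start is None:
            start = page
    if start is not None:
        kept.append((start, page_count))

    ranges = []
    for first, last in kept:
        if first == last:
            ranges.append(str(first))
        else:
            # write the final page as 'end', as in the command above
            ranges.append(f"{first}-{'end' if last == page_count else last}")
    return ranges
```

For a 20-page file, keep_ranges({13}, 20) gives ["1-12", "14-end"]. The full call is then ["pdftk", "in.pdf", "cat", *keep_ranges({13}, 20), "output", "out.pdf"].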

For removing PDF usage restrictions with free software, there is PDFCrack. I'm not sure it actually removes the security, though; it's just a password cracker.

Remove pages with redundant content from PDF document

Have you looked into PDFBox? You can invoke its various features from the command line. Extract each page as text, then use diff to check whether each successive page mostly adds to the previous one. Keep track of the interesting pages, then use PDFBox again to extract only those pages.

I assume from your rating that you don’t need detailed instructions for how to accomplish all this 🙂
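The diff step of that approach can be sketched in Python. The sketch assumes the text of each page has already been extracted (for example with PDFBox) into a list of strings. It uses Python's difflib in place of the diff tool and applies a strict rule: a page is dropped only if the next page's lines can be produced from it by insertions alone. A "mostly additions" threshold would need tuning. The function names are hypothetical.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def only_adds(prev_lines: list[str], next_lines: list[str]) -> bool:
    """True if next_lines can be produced from prev_lines by inserting lines only."""
    ops = SequenceMatcher(None, prev_lines, next_lines).get_opcodes()
    return all(tag in ("equal", "insert") for tag, *_ in ops)


def pages_to_keep(page_texts: list[str]) -> list[int]:
    """1-based numbers of the pages worth keeping: the last page of each build-up."""
    keep = []
    for i, text in enumerate(page_texts):
        is_last = i == len(page_texts) - 1
        if is_last or not only_adds(text.splitlines(), page_texts[i + 1].splitlines()):
            keep.append(i + 1)
    return keep
```

For a deck whose pages read "Title", "Title / point 1", "Title / point 1 / point 2" and then "Next slide", only pages 3 and 4 are kept.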

I ran into exactly the same need, so I created a Python script that automates checking for and removing pages with redundant content from a PDF. Give it a check here.

This works well for repetitive text content, but it may not keep the images if an animation on the same slide replaces a previous image, because I didn't need that. If anyone finds a way, please let me know by opening a PR; I'll be happy to improve it.

PS: Thanks for your detailed question! It helped me write a nice README for this script. 🙂

I've written a CLI utility that removes the "animation" (step-by-step reveal) slides from a PDF.

It takes a pdf file that was exported from a presentation.
It detects the image difference between two consecutive pages (assuming white background).
It omits a page if the following page only adds to it.

Because it does image differences, it shouldn’t matter if there are problems extracting the text or if the only difference between two consecutive slides is an image.

It's called pdfdeanimate-image.py and can be found in this MIT-licensed repo: https://github.com/schokotets/pdf-slides-utils. I'm open to pull requests, as the implementation isn't very clean so far.

I've copied the current Python source code here. It requires poppler-utils for rendering the PDF and pdftk for concatenating the kept pages. It creates a directory to store the .pgm files, which are grayscale renders of the slides.


Run it with ./pdfdeanimate-image.py slides.pdf
The result is a file named stripped-slides.pdf.

#!/usr/bin/env nix-shell
#!nix-shell -i python3 -p poppler_utils pdftk python38 python38Packages.numpy python38Packages.pillow
import subprocess
import sys
import os
import glob
import numpy
from PIL import Image, ImageOps

pdffile = sys.argv[1]
pdffile_name = pdffile.rsplit(".", 1)[0]
pgmdir_name = pdffile_name + "-pgm"
pgmfile_name = pgmdir_name + "/" + pdffile_name

# Render every page to a grayscale .pgm file (skipped if the directory already exists)
try:
    os.mkdir(pgmdir_name)
    subprocess.run(["pdftoppm", "-gray", pdffile, pgmfile_name])
    print("converted pdf to pgm files")
except FileExistsError:
    print("assuming pdf is already converted to pgm")

lastpix = None
haslastpix = False
containpages = []
currenthold = -1

filelist = glob.glob(os.path.join(pgmdir_name, '*.pgm'))
for filename in sorted(filelist, key=lambda s: s.lower()):
    # the page number is the last "-"-separated part of the file name
    pagenr = filename.rsplit("/", 1)[-1].rsplit(".", 1)[0].rsplit("-", 1)[-1]
    img = Image.open(filename)
    pix = numpy.array(img)
    img.close()
    if haslastpix:
        # on a white background, the next page only adds content if no pixel got lighter
        isconsecutive = numpy.all(lastpix >= pix)
        if not isconsecutive:
            containpages.append(currenthold)
    lastpix = pix
    haslastpix = True
    currenthold = pagenr
containpages.append(currenthold)  # the last page is always kept

print(f"reduced {len(filelist)}-pages pdf to {len(containpages)}-pages pdf")
subprocess.run(["pdftk", pdffile, "cat"] + containpages + ["output", "stripped-" + pdffile])


Remove the last page of a pdf file using PDFtk?

This creates outfile.pdf containing all but the last page of infile.pdf:

pdftk infile.pdf cat 1-r2 output outfile.pdf


More examples are here: https://www.pdflabs.com/docs/pdftk-cli-examples/

With cpdf, you can reference pages by their distance from the end of the document using a tilde, as well as from the beginning.


To extract pages from a PDF file with qpdf, the magic combination of options is qpdf --empty --pages infile.pdf 1-5 -- outfile.pdf

Remove only 1st page from a LOT of pdf files

You can do this with a free program called pdftk, available here.

You can use the following commands to take every PDF in the current directory and copy them to the ‘trimmed’ directory with the first page removed:

mkdir trimmed
for i in *pdf ; do pdftk "$i" cat 2-end output "trimmed/$i" ; done

This looks like a job for PDF Toolkit (pdftk), a command-line utility for manipulating PDFs.

First, install PDF Toolkit, either from the Software Centre or from the command line:

sudo apt-get install pdftk

Now the command to remove the first page from a normal (non-protected) PDF would be:

pdftk original.pdf cat 2-end output outputname.pdf

If the PDF is protected, you will need to give the password to pdftk (with its input_pw option).

To process a large number of PDFs, you will need a small script that runs pdftk on each one.
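Such a script can be sketched in Python. It builds one pdftk call per file, copying each PDF in a directory to a trimmed directory without its first page. An optional password is passed through pdftk's input_pw keyword for protected inputs. The names trim_first_page_cmd and trim_all are hypothetical. The script assumes pdftk is on the PATH and only runs when given the source directory as its argument.

```python
import subprocess
import sys
from pathlib import Path


def trim_first_page_cmd(pdf, out_dir, password=None):
    """pdftk call that copies `pdf` into `out_dir` without its first page."""
    cmd = ["pdftk", str(pdf)]
    if password is not None:
        # protected input: pdftk reads the password from input_pw
        cmd += ["input_pw", password]
    return cmd + ["cat", "2-end", "output", str(Path(out_dir) / Path(pdf).name)]


def trim_all(pdf_dir=".", out_dir="trimmed", password=None):
    """Run trim_first_page_cmd for every *.pdf directly inside pdf_dir."""
    Path(out_dir).mkdir(exist_ok=True)
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        subprocess.run(trim_first_page_cmd(pdf, out_dir, password), check=True)


if __name__ == "__main__" and len(sys.argv) == 2:
    trim_all(sys.argv[1])
```

With check=True, the batch stops at the first file pdftk rejects instead of silently skipping it.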

You can use pdf-stapler for this task.

for i in *.pdf; do pdf-stapler del "$i" 1 t.pdf && mv t.pdf "$i"; done



How to replace a single page in a pdf using another pdf in linux?

The output consists of the first 12 pages of inA.pdf, followed by page 3 of inB.pdf and then pages 14 to the end of inA.pdf.
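That layout uses pdftk's input handles: A= and B= name the two files, and cat picks pages from each. A minimal Python sketch builds the call for any page position. It assumes the page count of the first file is known, and out.pdf is just an example output name. The name replace_page_cmd is hypothetical.

```python
def replace_page_cmd(a_pdf, b_pdf, page, b_page, page_count, outfile):
    """pdftk call that swaps page `page` of a_pdf for page `b_page` of b_pdf."""
    if not 1 <= page <= page_count:
        raise ValueError("page is outside the document")
    pages = []
    if page > 1:
        pages.append("A1" if page == 2 else f"A1-{page - 1}")  # pages before the replaced one
    pages.append(f"B{b_page}")  # the replacement page from the second file
    if page < page_count:
        pages.append(f"A{page + 1}-end")  # everything after it
    # handles must be uppercase, otherwise pdftk reads "a=" as part of the file name
    return ["pdftk", f"A={a_pdf}", f"B={b_pdf}", "cat", *pages, "output", outfile]
```

Replacing page 13 of a 20-page inA.pdf with page 3 of inB.pdf yields the ranges A1-12 B3 A14-end. Note the cat keyword, which is easy to forget.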

Many Linux distributions provide a PDFtk package you can download and install using their package manager.

Thanks! Note that A= and B= must be uppercase. When I tried lowercase, pdftk thought a= was part of the filename.

Didn't work for me: `pdftk A=./inA.pdf B=./inB.pdf A1-4 B1 A6-end output out.pdf` gives `Error: Unable to find file. Error: Failed to open PDF file: A1-4 Error: Unable to find file. Error: Failed to open PDF file: B1 Error: Unable to find file. Error: Failed to open PDF file: A6-end Errors encountered. No output created. Done. Input errors, so no output created.` I used qpdf instead and it worked.

@azbarcea: the cat is missing. The complete command is: pdftk A=./inA.pdf B=./inB.pdf cat A1-4 B1 A6-end output out.pdf. While qpdf certainly has its merits, pdftk allows additional rotation parameters for the cat operation. Try pdftk --help

This will not preserve bookmarks. As a workaround, you can use dump_data (check the docs) to dump the bookmarks into a text file, and update_info to push those bookmarks into your output PDF file. The pdftk data dump only preserves page numbers, not offsets within the page. So if a bookmark links to something at the bottom of a page, after going through dump_data and update_info the link will go to the top of that page.
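The workaround is two pdftk calls: dump_data on the original file, then update_info on the edited one. A minimal Python sketch builds both. It assumes the page numbering is unchanged, as when a single page is replaced. If pages were added or removed, the BookmarkPageNumber entries in the dump would need adjusting first. The function name and the bookmarks.txt file name are hypothetical.

```python
import subprocess
import sys


def copy_bookmarks_cmds(source_pdf, edited_pdf, final_pdf, data_file="bookmarks.txt"):
    """Dump the original's metadata (including bookmarks), then apply it to the edited file."""
    return [
        ["pdftk", source_pdf, "dump_data", "output", data_file],
        ["pdftk", edited_pdf, "update_info", data_file, "output", final_pdf],
    ]


if __name__ == "__main__" and len(sys.argv) == 4:
    for cmd in copy_bookmarks_cmds(sys.argv[1], sys.argv[2], sys.argv[3]):
        subprocess.run(cmd, check=True)  # stop if either step fails
```

Running it as `script.py inA.pdf out.pdf final.pdf` leaves out.pdf untouched and writes the bookmarked copy to final.pdf.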
