Linux: cutting a page out of a PDF

How can I extract a page range / a part of a PDF?

Do you have any idea how to extract a part of a PDF document and save it as PDF? On OS X it is absolutely trivial by using Preview. I tried PDF editor and other programs but to no avail. I would like a program where I select the part that I want and then save it as a PDF file with a simple command like CMD + N on OS X. I want the extracted part to be saved as PDF and not as JPEG, etc.

22 Answers

pdftk is a useful multi-platform tool for the job (pdftk homepage).

pdftk full-pdf.pdf cat 12-15 output outfile_p12-15.pdf

You pass the filename of the main PDF, then tell pdftk to include only certain pages (12-15 in this example) and to write the result to a new file.
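The call above can be wrapped in a small shell function. This is only a sketch: the function names `range_outfile` and `extract_range`, the `PDFTK` override variable and the output naming scheme are hypothetical, chosen to mirror the `outfile_p12-15.pdf` example. Only the `pdftk ... cat ... output ...` form comes from the answer.

```bash
#!/bin/sh
# Hypothetical helper: derive an output name such as "full-pdf_p12-15.pdf".
range_outfile() {
    # $1 = input PDF, $2 = page range (for example 12-15)
    printf '%s_p%s.pdf\n' "${1%.pdf}" "$2"
}

# Hypothetical wrapper around the pdftk call shown above.
# Set PDFTK to use a different pdftk binary (defaults to "pdftk").
extract_range() {
    # $1 = input PDF, $2 = page range
    "${PDFTK:-pdftk}" "$1" cat "$2" output "$(range_outfile "$1" "$2")"
}
```

With these definitions loaded, `extract_range full-pdf.pdf 12-15` would write `full-pdf_p12-15.pdf` next to the input file.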

Installation instructions:

To install the snap version, which is an unofficial repackaging of an old version of PDFtk (repackaged by Scott Moser), run:

sudo snap install pdftk

Alternatively, you can install an open source port of PDFtk to Java by Marc Vinyals, by running:

sudo apt install pdftk-java

Another alternative is PDFtk Server, available from the website: https://www.pdflabs.com/tools/pdftk-server/ . This version is free of charge for personal use, but it is not open source.

Although pdftk is certainly a tool that can do the job, I would recommend against it. It is not free software but a clunky piece of shareware, and it needs the JVM. A more reasonable tool is qpdf, as suggested in another answer.

Very simple. Use the default PDF reader, select "Print to File" and that's it!

Note that this way the text will no longer be searchable: all text is converted to images, because this is how "Print" works.

Produces catastrophic results with beamer files, maps and any other documents that do not conform to the printer page format.

So it does not "extract" the page range. It creates a new PDF from the old one, as if you had used a high-definition printer/scanner pair.

Good for simple cases, but it gives undesired results in documents with highlighting comments: the highlighting becomes fully opaque and blocks the text.

QPDF is great. Use it this way to extract pages 1 to 10 from input.pdf and save them as output.pdf:

qpdf input.pdf --pages . 1-10 -- output.pdf

This preserves all metadata associated with that file.

If you wanted pages 1 through 5 from infile.pdf but you wanted the rest of the metadata to be dropped, you could instead run

qpdf --empty --pages infile.pdf 1-5 -- outfile.pdf
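The two variants can be put side by side in one function. This is a sketch under assumptions: the name `extract_pages`, the `keep`/`drop` switch and the `QPDF` override variable are hypothetical; the two `qpdf` invocations themselves are the ones shown above.

```bash
#!/bin/sh
# Hypothetical wrapper: choose between the two qpdf forms shown above.
extract_pages() {
    # $1 = input PDF, $2 = page range, $3 = output PDF,
    # $4 = "keep" (default) or "drop" for the document-level metadata
    if [ "${4:-keep}" = "drop" ]; then
        # Start from an empty PDF, so nothing but the pages is carried over.
        "${QPDF:-qpdf}" --empty --pages "$1" "$2" -- "$3"
    else
        # Use the input as the primary file, so its metadata is preserved.
        "${QPDF:-qpdf}" "$1" --pages . "$2" -- "$3"
    fi
}
```

For example, `extract_pages infile.pdf 1-5 outfile.pdf drop` corresponds to the `--empty` command above.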

Here’s a link to the current documentation, giving more examples of page selection.

You can install it by invoking:

sudo apt-get install qpdf

It is a great tool for PDF manipulation. It’s very fast and has very few dependencies. From QPDF’s GitHub repo:

QPDF is a command-line tool and C++ library that performs content-preserving transformations on PDF files. It supports linearization, encryption, and numerous other features. It can also be used for splitting and merging files, creating PDF files (but you have to supply all the content yourself), and inspecting files for study or analysis.

The --pages flag allows you to splice pages from multiple PDFs. Note that you can avoid duplicating the name by using . in place of the input file in the --pages options: qpdf --pages . 1-10 -- input.pdf output.pdf .
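As a sketch of the splicing case, the function below takes a page range from each of two files and writes them into one new document. The function name `splice_two` and the `QPDF` override variable are hypothetical; the command follows the `--empty --pages FILE RANGE ... --` form used earlier in this answer.

```bash
#!/bin/sh
# Hypothetical helper: join a page range from each of two PDFs.
splice_two() {
    # $1 = first PDF,  $2 = range taken from it
    # $3 = second PDF, $4 = range taken from it
    # $5 = output PDF
    "${QPDF:-qpdf}" --empty --pages "$1" "$2" "$3" "$4" -- "$5"
}
```

Calling `splice_two a.pdf 1-3 b.pdf 5-7 out.pdf` (file names are placeholders) would produce a six-page `out.pdf`.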

When I extract only a few pages, why does the resulting PDF have the same data size as the source PDF?

qpdf input.pdf --pages . 1 -- output.pdf was not working for me (version 8.0.2). This is what worked for me: qpdf input.pdf --pages input.pdf 1 -- output.pdf

Page range — Nautilus script

I created a slightly more advanced script based on the tutorial @ThiagoPonte linked to. Its key features are

that it’s GUI based,
compatible with spaces in file names,
and based on three different backends that are capable of preserving all attributes of the original file

#!/bin/bash
#
# TITLE:        PDFextract
#
# AUTHOR:       (c) 2013-2015 Glutanimate (https://github.com/Glutanimate)
#
# VERSION:      0.2
#
# LICENSE:      GNU GPL v3 (http://www.gnu.org/licenses/gpl.html)
#
# OVERVIEW:     PDFextract is a simple PDF extraction script based on Ghostscript/qpdf/cpdf.
#               It provides a simple way to extract a page range from a PDF document and is meant
#               to be used as a file manager script/addon (e.g. Nautilus script).
#
# FEATURES:     - simple GUI based on YAD, an advanced Zenity fork.
#               - preserves _all_ attributes of your original PDF file and does not compress
#                 embedded images further than they are.
#               - can choose from three different backends: ghostscript, qpdf, cpdf
#
# DEPENDENCIES: ghostscript/qpdf/cpdf poppler-utils yad libnotify-bin
#
#               You need to install at least one of the three backends supported by this script.
#
#               - ghostscript, qpdf, poppler-utils, and libnotify-bin are available via
#                 the standard Ubuntu repositories
#               - cpdf is a commercial CLI PDF toolkit that is free for personal use.
#                 It can be downloaded here: https://github.com/coherentgraphics/cpdf-binaries
#               - yad can be installed from the webupd8 PPA with the following command:
#                 sudo add-apt-repository ppa:webupd8team/y-ppa-manager && apt-get update && apt-get install yad
#
# NOTES:        Here is a quick comparison of the advantages and disadvantages of each backend:
#
#                               speed   metadata preservation   content preservation   license
#               ghostscript:    --      ++                      ++                     open-source
#               cpdf:           -       ++                      ++                     proprietary
#               qpdf:           ++      +                       ++                     open-source
#
#               Results might vary depending on the document and the version of the tool in question.
#
# INSTALLATION: https://askubuntu.com/a/236415
#
# This script was inspired by Kurt Pfeifle's PDF extraction script
# (http://www.linuxjournal.com/content/tech-tip-extract-pages-pdf)
#
# Originally posted on askubuntu
# (https://askubuntu.com/a/282453)

# Variables

DOCUMENT="$1"
BACKENDSELECTION="^qpdf!ghostscript!cpdf"

# Functions

check_input() {
  if [[ -z "$1" ]]; then
    notify "Error: No input file selected."
    exit 1
  elif [[ ! "$(file -ib "$1")" == *application/pdf* ]]; then
    notify "Error: Not a valid PDF file."
    exit 1
  fi
}

check_deps() {
  for i in "$@"; do
    type "$i" > /dev/null 2>&1
    if [[ "$?" != "0" ]]; then
      MissingDeps+="$i"
    fi
  done
}

ghostscriptextract() {
  # The distiller parameters keep images and fonts as they are.
  # Part of the original parameter list (including the image dictionaries)
  # did not survive in this copy; see the original post for the full list.
  # .setpdfwrite is left out because it is deprecated in newer Ghostscript releases.
  gs -dFirstPage="$STARTPAGE" -dLastPage="$STOPPAGE" -sOutputFile="$OUTFILE" \
     -dSAFER -dNOPAUSE -dBATCH -dPDFSETTINGS=/default -sDEVICE=pdfwrite -dCompressFonts=true -c \
  "<< /PreserveOverprintSettings false /MonoImageResolution 300 /MonoImageFilter /FlateEncode \
  /GrayImageResolution 300 /LockDistillerParams false /EncodeGrayImages true /MaxSubsetPCT 100 \
  /ColorImageFilter /FlateEncode /EmbedAllFonts true /UCRandBGInfo /Remove /AutoRotatePages /PageByPage \
  /ColorImageResolution 300 /CompatibilityLevel 1.7 /EncodeMonoImages true /GrayImageDownsampleThreshold 1.5 \
  /AutoFilterGrayImages false /GrayImageFilter /FlateEncode /DownsampleGrayImages false \
  /AutoFilterColorImages false /DownsampleColorImages false /CompressPages true \
  /ColorImageDownsampleThreshold 1.5 /PreserveHalftoneInfo false >> setdistillerparams" \
  -f "$DOCUMENT"
}

cpdfextract() {
  cpdf "$DOCUMENT" "$STARTPAGE-$STOPPAGE" -o "$OUTFILE"
}

qpdfextract() {
  qpdf --linearize "$DOCUMENT" --pages "$DOCUMENT" "$STARTPAGE-$STOPPAGE" -- "$OUTFILE"
  echo "$OUTFILE"
  return 0 # even benign qpdf warnings produce error codes, so we suppress them
}

notify() {
  echo "$1"
  notify-send -i application-pdf "PDFextract" "$1"
}

dialog_warning() {
  echo "$1"
  yad --center --image dialog-warning \
      --title "PDFExtract Warning" \
      --text "$1" \
      --button="Try again:0" \
      --button="Exit:1"

  [[ "$?" != "0" ]] && exit 0
}

dialog_settings() {
  PAGECOUNT=$(pdfinfo "$DOCUMENT" | grep Pages | sed 's/[^0-9]*//') # determine page count

  SETTINGS=($(\
      yad --form --width 300 --center \
          --window-icon application-pdf --image application-pdf \
          --separator=" " --title="PDFextract" \
          --text "Please choose the page range and backend" \
          --field="Start:NUM" "1[!1..$PAGECOUNT[!1]]" --field="End:NUM" "$PAGECOUNT[!1..$PAGECOUNT[!1]]" \
          --field="Backend":CB "$BACKENDSELECTION" \
          --button="gtk-ok:0" --button="gtk-cancel:1" \
      ))

  SETTINGSRET="$?"

  [[ "$SETTINGSRET" != "0" ]] && exit 1

  STARTPAGE=$(printf %.0f "${SETTINGS[0]}") # round numbers and store array in variables
  STOPPAGE=$(printf %.0f "${SETTINGS[1]}")
  BACKEND="${SETTINGS[2]}"
  EXTRACTOR="${BACKEND}extract"

  check_deps "$BACKEND"

  if [[ -n "$MissingDeps" ]]; then
    dialog_warning "Error, missing dependency: $MissingDeps"
    unset MissingDeps
    dialog_settings
    return
  fi

  if [[ "$STARTPAGE" -gt "$STOPPAGE" ]]; then
    dialog_warning "Start page higher than stop page."
    dialog_settings
    return
  fi

  OUTFILE="${DOCUMENT%.pdf} (p${STARTPAGE}-p${STOPPAGE}).pdf"
}

extract_pages() {
  $EXTRACTOR
  EXTRACTORRET="$?"
  if [[ "$EXTRACTORRET" = "0" ]]; then
    notify "Pages $STARTPAGE to $STOPPAGE successfully extracted."
  else
    notify "There has been an error. Please check the CLI output."
  fi
}

# Main

check_input "$1"
dialog_settings
extract_pages

Installation

Please follow the generic installation instructions for Nautilus scripts. Make sure to read the script header carefully as it will help to clarify the installation and usage of the script.
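The installation step can be sketched as a shell function. Assumptions to note: the function name `install_nautilus_script` is hypothetical, and the default target `~/.local/share/nautilus/scripts` is the scripts folder used by Nautilus 3.6 and later (older releases used a different location).

```bash
#!/bin/sh
# Hypothetical helper: copy a script into the Nautilus scripts folder
# and make it executable so that it shows up in the context menu.
install_nautilus_script() {
    # $1 = script file
    # $2 = target directory (optional; assumed default for Nautilus >= 3.6)
    target="${2:-$HOME/.local/share/nautilus/scripts}"
    mkdir -p "$target" &&
    cp "$1" "$target/" &&
    chmod +x "$target/$(basename "$1")"
}
```

After saving the script above as `PDFextract`, running `install_nautilus_script PDFextract` should make it available under the Scripts entry of the right-click menu.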

Partial pages — PDF Arranger

PDF Arranger is a small python-gtk application, which helps the user to merge or split pdf documents and rotate, crop and rearrange their pages using an interactive and intuitive graphical interface. It is a frontend for python-pyPdf.

Installation

sudo apt-get install pdfarranger

On older releases that only ship its predecessor, PDF-Shuffler, the package name is pdfshuffler.

PDF Arranger can crop and delete single PDF pages. You can use it to extract a page range from a document or even partial pages using the cropping function.

Page elements — Inkscape

Inkscape is a very powerful open-source vector graphics editor. It supports a wide range of different formats, including PDF files. You can use it to extract, modify and save page elements from a PDF file.

Installation

sudo apt-get install inkscape

1.) Open the PDF file of your choice with Inkscape. An import dialog will appear. Choose the page you want to extract elements from. Leave the other settings as they are.

2.) In Inkscape click and drag to select the element(s) you want to extract.

3.) Invert the selection with ! and delete the selected objects with DELETE.

4.) Crop the document to the remaining objects by accessing the Document Properties dialog with CTRL + SHIFT + D and selecting "fit document to image".

5.) Save the document as a PDF file from the File -> Save As dialog.

6.) If there are bitmap/raster images in your cropped document, you can set their DPI in the dialog that appears next.

7.) If you followed all steps, you will have produced a true PDF file that consists only of the objects of your choice.

Remove the last page of a pdf file using PDFtk?

You can reference page numbers in reverse order by prefixing them with the letter r. For example, page r1 is the last page of the document, r2 is the next-to-last page of the document, and rend is the first page of the document. You can use this prefix in ranges, too, for example r3-r1 is the last three pages of a PDF.

If you want to remove more than one page, you can change the range, for example 1-r3 does all but the last two pages.
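The reverse-numbered ranges can be generated for any number of trailing pages. The sketch below is an illustration only: the function names `drop_last_range` and `remove_last_pages` and the `PDFTK` override variable are hypothetical, while the `1-rN` range syntax is the one described above.

```bash
#!/bin/sh
# Hypothetical helper: build the range that keeps everything
# except the last N pages (N=1 gives 1-r2, N=2 gives 1-r3).
drop_last_range() {
    # $1 = number of trailing pages to remove (default 1)
    printf '1-r%d\n' "$(( ${1:-1} + 1 ))"
}

# Hypothetical wrapper: write a copy of the PDF without its last N pages.
remove_last_pages() {
    # $1 = input PDF, $2 = output PDF, $3 = pages to drop (default 1)
    "${PDFTK:-pdftk}" "$1" cat "$(drop_last_range "${3:-1}")" output "$2"
}
```

For example, `remove_last_pages in.pdf out.pdf` drops only the last page, and `remove_last_pages in.pdf out.pdf 2` drops the last two.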

You need to find out the page count, then use this with the pdftk cat function, since (AFAICT) pdftk does not allow one to specify an "offset from last".

A tool like ‘pdfinfo’ from Poppler (http://poppler.freedesktop.org/) can provide this.

Wrapping this in a bit of bash scripting can easily automate this process:

page_count=`pdfinfo "$INFILE" | grep 'Pages:' | awk '{print $2}'`
page_count=$(( $page_count - 1 ))
pdftk A="$INFILE" cat A1-$page_count output "$OUTFILE"
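One case the snippet above does not cover is a single-page document, where the computed last page would be 0. A minimal sketch of a guard follows; the function names `parse_page_count` and `all_but_last` are hypothetical, and the parser assumes the `Pages:` line format printed by Poppler's pdfinfo.

```bash
#!/bin/sh
# Read pdfinfo-style output on stdin and print the number after "Pages:".
parse_page_count() {
    awk '/^Pages:/ {print $2}'
}

# Print the range that keeps every page except the last one.
# Fails if the argument is not a number or the document has fewer than two pages.
all_but_last() {
    # $1 = page count
    case "$1" in
        ''|*[!0-9]*) echo "not a page count: $1" >&2; return 1 ;;
    esac
    if [ "$1" -lt 2 ]; then
        echo "nothing left after removing the last page" >&2
        return 1
    fi
    printf '1-%d\n' "$(( $1 - 1 ))"
}
```

A call could then look like `range=$(all_but_last "$(pdfinfo "$INFILE" | parse_page_count)") && pdftk "$INFILE" cat "$range" output "$OUTFILE"`.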

Parameter handling, error checking and the like can also be added to the script:

#!/bin/sh

### Paths to the PDF Toolkit executable 'pdftk' and to Poppler's 'pdfinfo'
pdftk='/usr/bin/pdftk'
pdfinfo='/usr/bin/pdfinfo'

####################################################################
script=$(basename "$0")

### Script help
if [ "$1" = "" ] || [ "$1" = "-h" ] || [ "$1" = "--help" ] || [ "$1" = "-?" ] || [ "$1" = "/?" ]; then
    echo "$script: <input.pdf> [<output.pdf>]"
    echo "    Removes the last page from the PDF, overwriting the source"
    echo "    if no output filename is given"
    exit 1
fi

### Check we have pdftk and pdfinfo available
if [ ! -x "$pdftk" ] || [ ! -x "$pdfinfo" ]; then
    echo "$script: The PDF Toolkit and/or Poppler doesn't seem to be installed"
    echo "    (was looking for the [$pdftk] and [$pdfinfo] executables)"
    exit 2
fi

### Check our input is OK
INFILE="$1"
if [ ! -r "$INFILE" ]; then
    echo "$script: Failed to read [$INFILE]"
    exit 2
fi

OUTFILE="$2"
if [ "$OUTFILE" = "" ]; then
    echo "$script: Will overwrite [$INFILE] if processing is ok"
fi

timestamp=$(date +"%Y%m%d-%H%M%S")
tmpfile="/tmp/$script.$timestamp"

### Work out the number of the last page to keep
page_count=$("$pdfinfo" "$INFILE" | grep 'Pages:' | awk '{print $2}')
page_count=$(( page_count - 1 ))

### Do the deed!
"$pdftk" A="$INFILE" cat A1-$page_count output "$tmpfile"

### Was it good for you?
if [ $? -eq 0 ]; then
    echo "$script: PDF Toolkit says all is good"
    if [ "$OUTFILE" = "" ]; then
        echo "$script: Overwriting [$INFILE]"
        cp -f "$tmpfile" "$INFILE"
    else
        echo "$script: Creating [$OUTFILE]"
        cp -f "$tmpfile" "$OUTFILE"
    fi
fi

### Clean up
if [ -f "$tmpfile" ]; then
    rm -f "$tmpfile"
fi
