Converting docx to pdf with pure python (on linux, without libreoffice)
I’m dealing with a problem trying to develop a web-app, part of which converts uploaded docx files to pdf files (after some processing). With python-docx and other methods, I do not require a windows machine with word installed, or even libreoffice on linux, for most of the processing (my web server is pythonanywhere — linux but without libreoffice and without sudo or apt install permissions). But converting to pdf seems to require one of those. From exploring questions here and elsewhere, this is what I have so far:
    import os
    import subprocess

    try:
        from comtypes import client
    except ImportError:
        client = None


    def doc2pdf(doc):
        """
        convert a doc/docx document to pdf format
        :param doc: path to document
        """
        doc = os.path.abspath(doc)  # bugfix - searching files in windows/system32
        if client is None:
            return doc2pdf_linux(doc)
        name, ext = os.path.splitext(doc)
        word = client.CreateObject('Word.Application')
        worddoc = None
        try:
            worddoc = word.Documents.Open(doc)
            worddoc.SaveAs(name + '.pdf', FileFormat=17)  # 17 = wdFormatPDF
        finally:
            # close only what was actually opened, then quit Word
            if worddoc is not None:
                worddoc.Close()
            word.Quit()


    def doc2pdf_linux(doc):
        """
        convert a doc/docx document to pdf format (linux only, requires libreoffice)
        :param doc: path to document
        """
        cmd = 'libreoffice --convert-to pdf'.split() + [doc]
        p = subprocess.Popen(cmd, stderr=subprocess.PIPE, stdout=subprocess.PIPE)
        # communicate() drains both pipes; wait() before communicate() can block
        stdout, stderr = p.communicate(timeout=10)
        if stderr:
            raise subprocess.SubprocessError(stderr)
As you can see, one method requires comtypes, the other requires libreoffice as a subprocess. Other than switching to a more sophisticated hosting server, is there any solution?
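For the libreoffice branch, a slightly more defensive sketch could look like the following. It assumes a `soffice` or `libreoffice` binary is on `PATH` (which is not the case on the host described above), and the helper names `find_office_binary`, `build_convert_command` and `doc2pdf_libreoffice` are hypothetical. `--headless`, `--convert-to` and `--outdir` are standard LibreOffice command-line options.

```python
import shutil
import subprocess
from pathlib import Path


def find_office_binary():
    """Return the first LibreOffice launcher found on PATH, or None."""
    for name in ("soffice", "libreoffice"):
        path = shutil.which(name)
        if path:
            return path
    return None


def build_convert_command(binary, doc, outdir):
    """Build the argument list for a headless conversion to PDF."""
    return [binary, "--headless", "--convert-to", "pdf",
            "--outdir", str(outdir), str(doc)]


def doc2pdf_libreoffice(doc, outdir=None, timeout=60):
    """Convert doc to PDF and return the expected output path."""
    binary = find_office_binary()
    if binary is None:
        raise RuntimeError("LibreOffice not found on PATH")
    doc = Path(doc).resolve()
    outdir = Path(outdir).resolve() if outdir else doc.parent
    # check=True raises CalledProcessError on a non-zero exit status
    subprocess.run(build_convert_command(binary, doc, outdir),
                   check=True, capture_output=True, timeout=timeout)
    return outdir / (doc.stem + ".pdf")
```

Checking the exit status rather than the contents of stderr avoids treating harmless warnings as failures.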
docx2pdf 0.1.8
Convert docx to pdf on Windows or macOS directly using Microsoft Word (must be installed).
Metadata
License: MIT License (MIT)
Requires: Python >=3.5
Project description
docx2pdf
Convert docx to pdf on Windows or macOS directly using Microsoft Word (must be installed).
On Windows, this is implemented via win32com while on macOS this is implemented via JXA (Javascript for Automation, aka AppleScript in JS).
Install
brew install aljohri/-/docx2pdf
CLI
    usage: docx2pdf [-h] [--keep-active] [--version] input [output]

    Example Usage:

    Convert single docx file in-place from myfile.docx to myfile.pdf:
        docx2pdf myfile.docx

    Batch convert docx folder in-place. Output PDFs will go in the same folder:
        docx2pdf myfolder/

    Convert single docx file with explicit output filepath:
        docx2pdf input.docx output.pdf

    Convert single docx file and output to a different explicit folder:
        docx2pdf input.docx output_dir/

    Batch convert docx folder. Output PDFs will go to a different explicit folder:
        docx2pdf input_dir/ output_dir/

    positional arguments:
      input          input file or folder. batch converts entire folder or convert single file
      output         output file or folder

    optional arguments:
      -h, --help     show this help message and exit
      --keep-active  prevent closing word after conversion
      --version      display version and exit
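The single-file path rules in the usage text can be summarized in a small sketch. This illustrates the documented behavior only and is not docx2pdf's actual implementation; the function name `resolve_pdf_path` is hypothetical, and it assumes that an output argument ending in `.pdf` is a file and anything else is a folder.

```python
from pathlib import PurePosixPath


def resolve_pdf_path(input_path, output=None):
    """Return where the PDF for a single input .docx would be written."""
    # PurePosixPath keeps the result identical on every operating system
    src = PurePosixPath(input_path)
    if output is None:
        # In-place: myfile.docx -> myfile.pdf next to the input
        return src.with_suffix(".pdf")
    out = PurePosixPath(output)
    if out.suffix.lower() == ".pdf":
        # Explicit output file path
        return out
    # Otherwise treat output as a folder and keep the input's base name
    return out / (src.stem + ".pdf")


print(resolve_pdf_path("myfile.docx"))
print(resolve_pdf_path("input.docx", "output.pdf"))
print(resolve_pdf_path("input.docx", "output_dir/"))
```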
Library
See CLI docs above (or in docx2pdf --help) for all the different invocations. It is the same for the CLI and python library.
Jupyter Notebook
If you are using this in the context of jupyter notebook, you will need ipywidgets for the tqdm progress bar to render properly.
    pip install ipywidgets
    jupyter nbextension enable --py widgetsnbextension
AlJohri/docx2pdf
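    from docx2pdf import convert

    convert("input.docx")                # writes input.pdf next to the input
    convert("input.docx", "output.pdf")  # explicit output file
    convert("my_docx_folder/")           # batch convert every docx in the folder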
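Because the library drives Microsoft Word, a caller may want to check the platform before importing it. The sketch below is one way to do that; `docx2pdf_supported` and `convert_if_supported` are hypothetical helper names, and only `convert` comes from docx2pdf itself.

```python
import platform

# Per the project description, docx2pdf needs Microsoft Word,
# so only Windows and macOS ("Darwin") are supported.
SUPPORTED_SYSTEMS = ("Windows", "Darwin")


def docx2pdf_supported(system=None):
    """Return True if the given (or current) OS can run docx2pdf."""
    return (system or platform.system()) in SUPPORTED_SYSTEMS


def convert_if_supported(input_path, output_path=None):
    """Call docx2pdf.convert when supported; return False otherwise."""
    if not docx2pdf_supported():
        return False
    # Imported lazily so this module loads where docx2pdf is not installed
    from docx2pdf import convert
    if output_path is None:
        convert(input_path)
    else:
        convert(input_path, output_path)
    return True
```

On Linux, `convert_if_supported` returns `False` without touching docx2pdf, which leaves the caller free to fall back to another converter.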
Convert Word DOCX or DOC to PDF in Python
Word to PDF is one of the most common document conversions. DOCX or DOC files are often converted to PDF before they are printed or shared. In this article, we will automate Word to PDF conversion in Python. The steps and code samples demonstrate how to convert Word DOCX or DOC to PDF with Python. You will also learn about different options to customize the conversion.
Python Library for Word to PDF Conversion: Free Download
For converting Word documents to PDF format, we will use Aspose.Words for Python. It is a feature-rich Python library for creating and manipulating Word documents. Moreover, it lets you convert DOCX and DOC files to PDF format with high fidelity. The library is hosted on PyPI and you can install it using the following pip command.
    pip install aspose-words
Convert Word DOCX to PDF in Python
The following are the steps to convert a Word document to PDF in Python.
- Load the Word document using Document class.
- Convert Word document to PDF using Document.save() method.
The following code sample shows how to convert a Word DOCX file to PDF.
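A minimal sketch of these two steps, assuming the aspose-words package is installed. The function name `convert_docx_to_pdf` and the file names are hypothetical; `Document` and `Document.save()` are the class and method named in the steps.

```python
def convert_docx_to_pdf(src, dst):
    """Load a Word document and save it as PDF with Aspose.Words."""
    # Imported lazily so the module loads even where aspose-words is absent
    import aspose.words as aw

    doc = aw.Document(src)  # step 1: load the Word document
    doc.save(dst)           # step 2: the .pdf extension selects PDF output
    return dst
```

A call such as `convert_docx_to_pdf("word.docx", "word.pdf")` would then write the PDF next to the script.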
Python Word to PDF with a Particular Standard
You can also specify the particular standard for the converted PDF document such as PDF/A. The following are the steps to specify the PDF standard in Word to PDF conversion using Python.
- Load the Word document using Document class.
- Create an object of PdfSaveOptions class and set PDF standard using PdfSaveOptions.compliance property.
- Convert Word document to PDF using Document.save() method.
The following code sample shows how to set a particular standard in Word DOCX to PDF conversion.
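The steps above can be sketched as follows, assuming aspose-words is installed. The function name is hypothetical, and the enum member `PdfCompliance.PDF17` is used here on the assumption that it is the library's name for PDF 1.7; check the library reference for the PDF/A members.

```python
def save_pdf_with_compliance(src, dst):
    """Save a Word document as PDF with an explicit compliance level."""
    # Imported lazily so the module loads even where aspose-words is absent
    import aspose.words as aw

    doc = aw.Document(src)
    options = aw.saving.PdfSaveOptions()
    # Assumed enum member name; other standards are set the same way
    options.compliance = aw.saving.PdfCompliance.PDF17
    doc.save(dst, options)
    return options
```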
Python DOCX to PDF: Convert Range of Pages
You can also specify the range of pages you want to convert to PDF format. For this, you can use PdfSaveOptions.page_set property. The following code sample shows how to convert a range of pages in Word document to PDF.
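A sketch of a page-range conversion, assuming aspose-words is installed. The function name is hypothetical, and the code assumes that `PageSet` accepts a list of zero-based page indices; verify the constructor against the library reference before relying on it.

```python
def save_page_range(src, dst, first, last):
    """Save pages first..last (zero-based, inclusive) of a document as PDF."""
    # Imported lazily so the module loads even where aspose-words is absent
    import aspose.words as aw

    doc = aw.Document(src)
    options = aw.saving.PdfSaveOptions()
    # Assumption: PageSet takes a list of zero-based page indices
    options.page_set = aw.saving.PageSet(list(range(first, last + 1)))
    doc.save(dst, options)
    return options
```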
DOC DOCX to PDF in Python: Apply Image Compression
Aspose.Words for Python also lets you apply image compression in the converted PDF document. In addition, you can specify the JPEG quality for the images. The following are the steps to set image compression while converting a Word DOCX to PDF in Python.
- Load the Word document using Document class.
- Create an object of PdfSaveOptions class.
- Set image compression using PdfSaveOptions.image_compression property.
- Set JPEG quality using PdfSaveOptions.jpeg_quality property.
- Convert Word document to PDF using Document.save() method.
The following code sample shows how to set image compression in Word to PDF conversion.
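The steps above can be sketched as follows, assuming aspose-words is installed. The function name and the default quality of 50 are illustrative choices, and `PdfImageCompression.JPEG` is assumed to be the enum member for JPEG compression.

```python
def save_pdf_with_jpeg_compression(src, dst, quality=50):
    """Save a Word document as PDF with JPEG-compressed images."""
    # Imported lazily so the module loads even where aspose-words is absent
    import aspose.words as aw

    doc = aw.Document(src)
    options = aw.saving.PdfSaveOptions()
    options.image_compression = aw.saving.PdfImageCompression.JPEG
    options.jpeg_quality = quality  # lower values give smaller files
    doc.save(dst, options)
    return options
```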
Python DOCX to PDF Library: Get a Free Library License
You can get a temporary license in order to use Aspose.Words for Python without evaluation limitations.
Conclusion
In this article, you have learned how to convert Word DOCX or DOC files to PDF in Python. Moreover, you have seen different options to customize the DOC or DOCX to PDF conversion in Python. You can learn more about Aspose.Words for Python in the documentation. If you have any questions, feel free to let us know via our forum.
How to Convert DocX to Pdf in Python
Sometimes you may need to convert docx files to pdfs. In this article, we will look at how to convert docx to pdf using Python. We will use docx2pdf library for this purpose.
How to Convert DocX to Pdf in Python
Here are the steps to convert docx files to pdf. Please note that docx2pdf works only on Windows and macOS, because it drives an installed copy of Microsoft Word; it is not supported on Linux. In such cases, it is better to use an online docx to pdf converter like SmallPDF.
1. Install docx2pdf
Open a command prompt and run the following command to install docx2pdf.
C:\> pip install docx2pdf
2. Convert Docx to Pdf using command line
Here’s the syntax of docx2pdf.
C:\> docx2pdf input [output]
In the above command, specify the path of the docx file as the first argument and, optionally, the path of the pdf file to be written as the second argument.
Here’s an example to convert docx to pdf
C:\> docx2pdf C:\Project\test.docx C:\Project\test.pdf
We have used absolute paths for both input and output files. If you use relative paths instead, docx2pdf will look for docx files and write pdf files relative to your present working directory.
3. Bulk Conversion using command line
You can also bulk convert a folder of docx to pdf files by specifying the folder path as input.
C:\> docx2pdf C:\Project\data_files
In the above command, docx2pdf will convert all docx files present in C:\Project\data_files into pdf files.
You may also specify different input and output paths in docx2pdf command.
C:\> docx2pdf C:\Project\data_files C:\Project\test_files
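To preview which files a bulk run would touch, the input-to-output mapping can be sketched in a few lines. This only illustrates the behavior described above and is not how docx2pdf is implemented; the function name `plan_bulk_conversion` is hypothetical.

```python
from pathlib import Path


def plan_bulk_conversion(input_dir, output_dir=None):
    """Map each .docx in input_dir to the .pdf path it would produce."""
    input_dir = Path(input_dir)
    # Without an explicit output folder, PDFs go next to the inputs
    output_dir = Path(output_dir) if output_dir else input_dir
    return {
        src: output_dir / (src.stem + ".pdf")
        for src in sorted(input_dir.glob("*.docx"))
    }
```

Files with other extensions in the input folder are left out of the mapping.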
4. Docx to PDF conversion from program
You may also import the docx2pdf library in a Python program and use its convert function to convert docx files to pdf.
    from docx2pdf import convert

    # convert a single docx file to a pdf file in the same directory
    convert('test.docx')

    # convert docx to pdf specifying input & output paths
    # (raw strings keep the backslashes in Windows paths literal)
    convert(r'C:\Project\test.docx', r'C:\Project\test.pdf')

    # bulk conversion of all docx files in a folder
    convert(r'C:\Project')
As you can see, it is very easy to convert docx files to pdf in Python.