Docx to pdf python linux

Converting docx to pdf with pure python (on linux, without libreoffice)

Another option is LibreOffice. However, as the first responder said, the quality will never be as good as driving Microsoft Word itself through comtypes.

Anyway, once you have installed LibreOffice, here is the code to do it.

    from subprocess import Popen

    LIBRE_OFFICE = r"C:\Program Files\LibreOffice\program\soffice.exe"

    def convert_to_pdf(input_docx, out_folder):
        p = Popen([LIBRE_OFFICE, '--headless', '--convert-to', 'pdf',
                   '--outdir', out_folder, input_docx])
        print([LIBRE_OFFICE, '--convert-to', 'pdf', input_docx])
        p.communicate()

    sample_doc = 'file.docx'
    out_folder = 'some_folder'
    convert_to_pdf(sample_doc, out_folder)

The PythonAnywhere help pages offer information on working with PDF files here: https://help.pythonanywhere.com/pages/PDF

Summary: PythonAnywhere has a number of Python packages for PDF manipulation installed, and one of them may do what you want. However, shelling out to abiword seems easiest to me. The shell command abiword --to=pdf filetoconvert.docx will convert the docx file to a PDF and produce a file named filetoconvert.pdf in the same directory as the docx. Note that this command writes an error message about XDG_RUNTIME_DIR to the standard error stream (or at least it did for me), but it still works, and the error message can be ignored.
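From Python, the abiword command above could be wrapped roughly as follows. This is only a sketch: the helper names abiword_command and convert_with_abiword are made up for illustration, and it assumes abiword is installed and on the PATH. Standard error is discarded because of the XDG_RUNTIME_DIR message, so success is judged by whether the PDF actually appears.

```python
import os
import subprocess


def abiword_command(docx_path):
    """Build the argument list for `abiword --to=pdf <file>`."""
    return ["abiword", "--to=pdf", docx_path]


def convert_with_abiword(docx_path, timeout=60):
    """Run abiword and return the path of the PDF it should have written.

    Hypothetical helper: assumes abiword is installed and on PATH.
    """
    # abiword writes the PDF next to the input file, with the same base name.
    pdf_path = os.path.splitext(docx_path)[0] + ".pdf"
    subprocess.run(
        abiword_command(docx_path),
        stderr=subprocess.DEVNULL,  # hide the XDG_RUNTIME_DIR complaint
        timeout=timeout,
    )
    if not os.path.exists(pdf_path):
        raise RuntimeError("abiword did not produce " + pdf_path)
    return pdf_path
```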

Here is docx-to-pdf code for Linux (on Windows, just install LibreOffice and use the full path to soffice in place of soffice):

    import subprocess

    def generate_pdf(doc_path, path):
        subprocess.call(['soffice',
                         # '--headless',
                         '--convert-to',
                         'pdf',
                         '--outdir',
                         path,
                         doc_path])
        return doc_path

    generate_pdf("docx_path.docx", "output_path")
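If the same script has to run on both Linux and Windows, the lookup of the soffice executable could be sketched like this. The function names find_soffice and build_command are hypothetical, the Windows fallback is simply the default install path quoted in the earlier answer, and the `which` parameter exists only so the lookup can be tried without LibreOffice installed.

```python
import shutil

# Default Windows install location from the earlier answer; adjust if yours differs.
WINDOWS_SOFFICE = r"C:\Program Files\LibreOffice\program\soffice.exe"


def find_soffice(which=shutil.which):
    """Return a LibreOffice executable found on PATH, else the Windows default."""
    for name in ("soffice", "libreoffice"):
        found = which(name)
        if found:
            return found
    # Nothing on PATH: fall back to the usual Windows location.
    return WINDOWS_SOFFICE


def build_command(soffice, doc_path, out_dir):
    """Argument list for a headless docx-to-pdf conversion."""
    return [soffice, "--headless", "--convert-to", "pdf",
            "--outdir", out_dir, doc_path]
```

It would be used as subprocess.call(build_command(find_soffice(), "docx_path.docx", "output_path")).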

docx2pdf 0.1.8

Convert docx to pdf on Windows or macOS directly using Microsoft Word (must be installed).

Metadata

License: MIT License (MIT)

Requires: Python >=3.5

On Windows, this is implemented via win32com while on macOS this is implemented via JXA (Javascript for Automation, aka AppleScript in JS).

Install

brew install aljohri/-/docx2pdf 

CLI

    usage: docx2pdf [-h] [--keep-active] [--version] input [output]

    Example Usage:

    Convert single docx file in-place from myfile.docx to myfile.pdf:
        docx2pdf myfile.docx

    Batch convert docx folder in-place. Output PDFs will go in the same folder:
        docx2pdf myfolder/

    Convert single docx file with explicit output filepath:
        docx2pdf input.docx output.pdf

    Convert single docx file and output to a different explicit folder:
        docx2pdf input.docx output_dir/

    Batch convert docx folder. Output PDFs will go to a different explicit folder:
        docx2pdf input_dir/ output_dir/

    positional arguments:
      input          input file or folder. batch converts entire folder or convert single file
      output         output file or folder

    optional arguments:
      -h, --help     show this help message and exit
      --keep-active  prevent closing word after conversion
      --version      display version and exit

Library

See the CLI docs above (or docx2pdf --help) for all the different invocations. They are the same for the CLI and the Python library.

Jupyter Notebook

If you are using this in the context of jupyter notebook, you will need ipywidgets for the tqdm progress bar to render properly.

    pip install ipywidgets
    jupyter nbextension enable --py widgetsnbextension

Converting docx to pdf with pure python (on linux, without libreoffice)

I’m dealing with a problem trying to develop a web-app, part of which converts uploaded docx files to pdf files (after some processing). With python-docx and other methods, I do not require a windows machine with word installed, or even libreoffice on linux, for most of the processing (my web server is pythonanywhere — linux but without libreoffice and without sudo or apt install permissions). But converting to pdf seems to require one of those. From exploring questions here and elsewhere, this is what I have so far:

    import os
    import subprocess

    try:
        from comtypes import client
    except ImportError:
        client = None


    def doc2pdf(doc):
        """
        convert a doc/docx document to pdf format
        :param doc: path to document
        """
        doc = os.path.abspath(doc)  # bugfix - searching files in windows/system32
        if client is None:
            return doc2pdf_linux(doc)
        name, ext = os.path.splitext(doc)
        word = client.CreateObject('Word.Application')
        worddoc = None
        try:
            worddoc = word.Documents.Open(doc)
            worddoc.SaveAs(name + '.pdf', FileFormat=17)
        finally:
            # close only what was actually opened, even if the conversion failed
            if worddoc is not None:
                worddoc.Close()
            word.Quit()


    def doc2pdf_linux(doc):
        """
        convert a doc/docx document to pdf format (linux only, requires libreoffice)
        :param doc: path to document
        """
        cmd = 'libreoffice --convert-to pdf'.split() + [doc]
        p = subprocess.Popen(cmd, stderr=subprocess.PIPE, stdout=subprocess.PIPE)
        # communicate() waits for the process and drains both pipes
        stdout, stderr = p.communicate(timeout=10)
        if stderr:
            raise subprocess.SubprocessError(stderr)

As you can see, one method requires comtypes, the other requires LibreOffice as a subprocess. Other than switching to a more sophisticated hosting server, is there any solution?

.doc to pdf using python

I’m tasked with converting tons of .doc files to .pdf, and the only way my supervisor wants me to do this is through MS Word 2010. I know I should be able to automate this with Python COM automation. The only problem is that I don’t know how or where to start. I tried searching for tutorials but was not able to find any (maybe I did, but I don’t know what I’m looking for). Right now I’m reading through this. I don’t know how useful it is going to be.

A simple example using comtypes, converting a single file, input and output filenames given as commandline arguments:

    import sys
    import os
    import comtypes.client

    wdFormatPDF = 17

    in_file = os.path.abspath(sys.argv[1])
    out_file = os.path.abspath(sys.argv[2])

    word = comtypes.client.CreateObject('Word.Application')
    doc = word.Documents.Open(in_file)
    doc.SaveAs(out_file, FileFormat=wdFormatPDF)
    doc.Close()
    word.Quit()

You could also use pywin32, which would be the same except for:

word = win32com.client.Dispatch('Word.Application') 

For many files, consider setting word.Visible = False to save time and processing (MS Word will not be displayed, so the code essentially runs in the background).
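For a whole folder, one hidden Word instance can be reused across files instead of starting Word once per document. The following is a sketch under assumptions: plan_conversions and convert_folder are illustrative names, comtypes is imported lazily because it only exists on Windows, and the COM calls are the same ones used in the single-file example above.

```python
import os

WD_FORMAT_PDF = 17  # same value as wdFormatPDF in the example above


def plan_conversions(folder, names):
    """Pair each .doc/.docx name with absolute input and output paths."""
    folder = os.path.abspath(folder)
    pairs = []
    for name in sorted(names):
        base, ext = os.path.splitext(name)
        if ext.lower() in (".doc", ".docx"):
            pairs.append((os.path.join(folder, name),
                          os.path.join(folder, base + ".pdf")))
    return pairs


def convert_folder(folder):
    """Convert every Word file in `folder`, reusing one hidden Word instance."""
    import comtypes.client  # Windows only; imported lazily on purpose

    word = comtypes.client.CreateObject('Word.Application')
    word.Visible = False
    try:
        for in_file, out_file in plan_conversions(folder, os.listdir(folder)):
            doc = word.Documents.Open(in_file)
            try:
                doc.SaveAs(out_file, FileFormat=WD_FORMAT_PDF)
            finally:
                doc.Close()
    finally:
        word.Quit()  # always release Word, even after a failure
```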

I’ve managed to get this working for PowerPoint documents. Use Powerpoint.Application, Presentations.Open and FileFormat=32.
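Put together, the PowerPoint variant might look like the sketch below. The three names from the comment (Powerpoint.Application, Presentations.Open, the value 32) are taken as given. The helper names pdf_path_for and pptx_to_pdf are invented here, and passing the format as the second positional argument of SaveAs is an assumption about how the call is best made through comtypes.

```python
import os

PP_SAVE_AS_PDF = 32  # the FileFormat value quoted above


def pdf_path_for(pptx_path):
    """Absolute path of the PDF that sits next to the presentation."""
    return os.path.splitext(os.path.abspath(pptx_path))[0] + ".pdf"


def pptx_to_pdf(pptx_path):
    """Export one presentation to PDF through PowerPoint (Windows only)."""
    import comtypes.client  # imported lazily: not available on Linux

    out_path = pdf_path_for(pptx_path)
    powerpoint = comtypes.client.CreateObject('Powerpoint.Application')
    try:
        deck = powerpoint.Presentations.Open(os.path.abspath(pptx_path))
        try:
            deck.SaveAs(out_path, PP_SAVE_AS_PDF)  # second argument is FileFormat
        finally:
            deck.Close()
    finally:
        powerpoint.Quit()
    return out_path
```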

I am using a Linux server and these libraries don’t work on Linux. Is there any other way to make it work on Linux?

You can use the docx2pdf python package to bulk convert docx to pdf. It can be used as both a CLI and a python library. It requires Microsoft Office to be installed and uses COM on Windows and AppleScript (JXA) on macOS.

    from docx2pdf import convert

    convert("input.docx")
    convert("input.docx", "output.pdf")
    convert("my_docx_folder/")

    pip install docx2pdf
    docx2pdf input.docx output.pdf

Disclaimer: I wrote the docx2pdf package. https://github.com/AlJohri/docx2pdf

Unfortunately, it requires Microsoft Office to be installed and thus only works on Windows and macOS.

@AlJohri take a look here: michalzalecki.com/converting-docx-to-pdf-using-python. This solution works on both Windows and Linux. Running on Linux is a must because most deployment servers use Linux.

I have tested many solutions, but none of them works efficiently on Linux distributions.

I recommend this solution:

    import sys
    import subprocess
    import re


    def convert_to(folder, source, timeout=None):
        args = [libreoffice_exec(), '--headless', '--convert-to', 'pdf',
                '--outdir', folder, source]

        process = subprocess.run(args, stdout=subprocess.PIPE,
                                 stderr=subprocess.PIPE, timeout=timeout)
        filename = re.search('-> (.*?) using filter', process.stdout.decode())

        return filename.group(1)


    def libreoffice_exec():
        # TODO: Provide support for more platforms
        if sys.platform == 'darwin':
            return '/Applications/LibreOffice.app/Contents/MacOS/soffice'
        return 'libreoffice'

and you call your function:

result = convert_to('TEMP Directory', 'Your File', timeout=15) 
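One caveat with convert_to: if LibreOffice fails, its output has no "-> ... using filter" line, so re.search returns None and the call ends in an AttributeError that hides the real problem. A more defensive variant could look like this sketch. The names parse_output_path and convert_checked are hypothetical, and the executable is passed in explicitly to keep the example self-contained.

```python
import re
import subprocess


def parse_output_path(stdout_text):
    """Return the file named after '->' in LibreOffice's output, or None."""
    match = re.search(r'-> (.*?) using filter', stdout_text)
    return match.group(1) if match else None


def convert_checked(soffice, folder, source, timeout=None):
    """Like convert_to above, but raise a readable error when nothing was written."""
    args = [soffice, '--headless', '--convert-to', 'pdf',
            '--outdir', folder, source]
    process = subprocess.run(args, stdout=subprocess.PIPE,
                             stderr=subprocess.PIPE, timeout=timeout)
    filename = parse_output_path(process.stdout.decode())
    if filename is None:
        # Surface whatever LibreOffice printed instead of an AttributeError.
        raise RuntimeError('conversion failed: ' + process.stderr.decode())
    return filename
```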

Thank you, sir, for this solution. It is actually even runnable through Google Colab so you can do this on the fly.

I have worked on this problem for half a day, so I think I should share some of my experience. Steven’s answer is right, but it fails on my computer. There are two key points to fixing it:

(1) When I first created the 'Word.Application' object, I had to make it (the Word app) visible before opening any documents. (Actually, I cannot explain why this works. If I do not do this on my computer, the program crashes when I try to open a document in invisible mode, and then the 'Word.Application' object is deleted by the OS.)

(2) After doing (1), the program sometimes works but still fails often. The crash error "COMError: (-2147418111, 'Call was rejected by callee.', (None, None, None, 0, None))" means that the COM server may not be able to respond quickly enough. So I added a delay before trying to open a document.

After these two steps, the program works with no more failures. The demo code is below. If you have encountered the same problems, try these two steps. Hope it helps.

    import os
    import comtypes.client
    import time

    wdFormatPDF = 17

    # absolute path is needed
    # be careful about the backslash '\': use '\\' or '/' or a raw string r"..."
    in_file = r'absolute path of input docx file 1'
    out_file = r'absolute path of output pdf file 1'

    in_file2 = r'absolute path of input docx file 2'
    out_file2 = r'absolute path of output pdf file 2'

    # print out filenames
    print(in_file)
    print(out_file)
    print(in_file2)
    print(out_file2)

    # create COM object
    word = comtypes.client.CreateObject('Word.Application')
    # key point 1: make word visible before opening a new document
    word.Visible = True
    # key point 2: wait for the COM server to be ready
    time.sleep(3)

    # convert docx file 1 to pdf file 1
    doc = word.Documents.Open(in_file)  # open docx file 1
    doc.SaveAs(out_file, FileFormat=wdFormatPDF)  # conversion
    doc.Close()  # close docx file 1
    word.Visible = False

    # convert docx file 2 to pdf file 2
    doc = word.Documents.Open(in_file2)  # open docx file 2
    doc.SaveAs(out_file2, FileFormat=wdFormatPDF)  # conversion
    doc.Close()  # close docx file 2
    word.Quit()  # close Word Application
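A fixed sleep is a guess at how long the COM server needs. An alternative is to retry the call when it is rejected. The helper below is a generic sketch, not part of the answer above: call_with_retry is a made-up name, and which exception types to retry on is left to the caller.

```python
import time


def call_with_retry(func, attempts=5, delay=1.0, exceptions=(Exception,)):
    """Call func(); on one of `exceptions`, wait `delay` seconds and try again."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise  # out of attempts: let the last error propagate
            time.sleep(delay)
```

With comtypes it might be used as doc = call_with_retry(lambda: word.Documents.Open(in_file), exceptions=(comtypes.COMError,)), assuming the "Call was rejected by callee" error is raised as comtypes.COMError, as the message above suggests.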

How to Convert DocX to Pdf in Python

Sometimes you may need to convert docx files to pdfs. In this article, we will look at how to convert docx to pdf using Python. We will use docx2pdf library for this purpose.

Here are the steps to convert docx to pdf files. Please note that docx2pdf works only on Windows and macOS, because it relies on an installed copy of Microsoft Word. It is not supported on Linux. In such cases, it is better to use an online docx-to-pdf converter like SmallPDF.

1. Install docx2pdf

Open a command prompt and run the following command to install docx2pdf:

C:\> pip install docx2pdf

2. Convert Docx to Pdf using command line

Here’s the syntax of docx2pdf:

C:\> docx2pdf input [output]

In the above command, you need to specify the file path of the docx file as the first argument and the file path of the pdf file to be written as the second argument.

Here’s an example to convert docx to pdf

C:\> docx2pdf C:\Project\test.docx C:\Project\test.pdf

We have used absolute paths for both input and output files. If you don’t use absolute paths, docx2pdf will look for docx files and write pdf files in your present working directory.

3. Bulk Conversion using command line

You can also bulk convert a folder of docx files to pdf by specifying the folder path as input.

C:\> docx2pdf C:\Project\data_files

In the above command, docx2pdf will convert all docx files present in C:\Project\data_files into pdf files.

You may also specify different input and output paths in docx2pdf command.

C:\> docx2pdf C:\Project\data_files C:\Project\test_files

4. Docx to PDF conversion from program

You may also import the docx2pdf library within a Python program and use its convert function to convert docx files to pdf.

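    from docx2pdf import convert

    # convert a single docx file to a pdf file in the same directory
    convert('test.docx')

    # convert docx to pdf specifying input & output paths
    # (raw strings keep the backslashes in Windows paths literal)
    convert(r'C:\Project\test.docx', r'C:\Project\test.pdf')

    # bulk conversion of files
    # (a raw string cannot end in a backslash, so escape them here)
    convert('C:\\Project\\')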

As you can see it is very easy to convert docx to pdf files in python.
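Since docx2pdf needs Microsoft Word, a script that must also run on Linux could choose its converter by platform and fall back to LibreOffice there, as the answers earlier on this page do. The sketch below assumes soffice is on the PATH on Linux and that out_dir already exists. pick_backend and convert_any are illustrative names, not part of docx2pdf.

```python
import subprocess
import sys


def pick_backend(platform=None):
    """Return 'docx2pdf' where Word can be driven, otherwise 'libreoffice'."""
    platform = sys.platform if platform is None else platform
    if platform in ("win32", "darwin"):
        return "docx2pdf"
    return "libreoffice"


def convert_any(docx_path, out_dir):
    """Convert one docx file into out_dir with whichever backend fits."""
    if pick_backend() == "docx2pdf":
        from docx2pdf import convert  # needs Microsoft Word installed
        convert(docx_path, out_dir)
    else:
        # Assumption: a LibreOffice 'soffice' executable is on PATH.
        subprocess.run(["soffice", "--headless", "--convert-to", "pdf",
                        "--outdir", out_dir, docx_path], check=True)
```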
