Doc to HTML in Linux

doc -> html

OpenOffice (OO) won't even start on my old laptop. But I regularly get documents (MS Word, OO) with a simple structure.

Are there any lightweight converters that turn doc into HTML, or at least extract the text?

Re: doc -> html

I don't remember exactly, but I think there was a program, catdoc, that did just that: "pulled out the text".

Re: doc -> html

$ apt-cache show antiword
Package: antiword
Priority: optional
Section: text
Installed-Size: 500
Maintainer: Bdale Garbee
Architecture: i386
Version: 0.32-2
Depends: libc6 (>= 2.2.4-4)
Filename: pool/main/a/antiword/antiword_0.32-2_i386.deb
Size: 88490
MD5sum: 7c19befb191b9a5a88e77a7e87310d3e
Description: Converts MS Word files to text and ps
 Antiword is a free MS Word reader.
 .
 It converts the binary files from MS Word 6, 7, 97 and 2000 to text and Postscript.

Re: doc -> html

$ apt-cache show catdoc
Package: catdoc
Priority: optional
Section: text
Installed-Size: 636
Maintainer: Pawel Wiecek
Architecture: i386
Version: 0.91.5-1.woody3
Depends: libc6 (>= 2.2.4-4)
Suggests: wish
Filename: pool/main/c/catdoc/catdoc_0.91.5-1.woody3_i386.deb
Size: 66898
MD5sum: 94f0f2f0bccb8abbed2f70fd70d8d9f1
Description: MS-Word to TeX or plain text converter
 This program extracts text from MS-Word files, trying to preserve as many special printable characters as possible. catdoc supports everything up to Word-97.
 .
 It doesn't even try to preserve fancy Word formatting, because Word users usually don't care about document structure, and it is this very thing which is important to LaTeX users.
 .
 Also provided is xls2csv, which extracts data from Excel spreadsheets and outputs it in comma-separated-value format.
 .
 This package suggests tk because it also includes wordview, an optional Tk-based GUI for catdoc. The MIME config provided in this package will use wordview if X is running, or catdoc directly if it is not.

Re: doc -> html

wvHtml(1)

NAME
  wvHtml - convert msword documents to HTML4.0
SYNOPSIS
  wvHtml in_word_doc out_html_doc
DESCRIPTION
  wvHtml converts word documents into W3C certified HTML4.0 format. You can use Netscape or some other browser to then view your docs.
MORE INFORMATION
  http://wvware.sourceforge.net
SEE ALSO
  wvAbw(1), wvWare(1), wvLatex(1), wvCleanLatex(1), wvPS(1), wvDVI(1), wvPDF(1), wvText(1), wvWml(1), wvMime(1), catdoc(1), word2x(1)
AUTHOR
  Dom Lachowicz (current author and maintainer)
  WEB: http://wvware.sourceforge.net
  MAIL: cinamod@hotmail.com
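The three tools suggested in this thread can be tried in turn from a script. The sketch below is not from the thread; it assumes the command-line forms shown above (`wvHtml in_word_doc out_html_doc` writes the output file itself, while `antiword FILE` and `catdoc FILE` print plain text to stdout), and the preference order and the helper names `build_command`/`convert` are illustrative choices.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional, Sequence

# Converters mentioned in this thread, tried in this order (an assumption:
# HTML output first, plain-text extractors as a fallback).
DEFAULT_TOOLS = ("wvHtml", "antiword", "catdoc")


def build_command(tool: str, src: str, dst: str) -> List[str]:
    """Return the command line for converting src with the given tool."""
    if tool == "wvHtml":
        # Synopsis from the man page: wvHtml in_word_doc out_html_doc
        return [tool, src, dst]
    if tool in ("antiword", "catdoc"):
        # Both tools write the extracted text to stdout.
        return [tool, src]
    raise ValueError(f"unsupported tool: {tool}")


def convert(src: str, dst: str, tools: Sequence[str] = DEFAULT_TOOLS) -> Optional[str]:
    """Convert src with the first installed tool; return its name, or None."""
    for tool in tools:
        if shutil.which(tool) is None:
            continue  # not installed, try the next one
        cmd = build_command(tool, src, dst)
        if tool == "wvHtml":
            subprocess.run(cmd, check=True)
        else:
            # Save the text printed on stdout to dst.
            result = subprocess.run(cmd, check=True, capture_output=True)
            Path(dst).write_bytes(result.stdout)
        return tool
    return None
```

For example, `convert("letter.doc", "letter.html")` (hypothetical file names) returns `"wvHtml"` if wv is installed; if only antiword or catdoc is available, the output file contains plain text rather than HTML.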


How to convert to HTML code?

I'm not sure what "html entities version" means. Can you elaborate on how this differs from regular HTML conversion? If you just want a text-to-HTML converter, a quick search turns up txt2html.sourceforge.net.
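If all you need is to wrap plain text (for example, the output of antiword or catdoc) in a minimal HTML page, a few lines of Python are enough. This is a rough sketch, not how txt2html works: it assumes paragraphs are separated by blank lines, escapes markup characters with the standard `html.escape`, and turns single line breaks into `<br>`. The function name `text_to_html` is hypothetical.

```python
import html
import re


def text_to_html(text: str, title: str = "Document") -> str:
    """Wrap blank-line-separated paragraphs of plain text in <p> elements."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    # Escape &, <, > and quotes first, then keep single line breaks as <br>.
    body = "\n".join(
        "<p>" + html.escape(p).replace("\n", "<br>\n") + "</p>" for p in paragraphs
    )
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        f'<head><meta charset="utf-8"><title>{html.escape(title)}</title></head>\n'
        f"<body>\n{body}\n</body>\n"
        "</html>\n"
    )
```

Headings, lists and tables are not detected; anything beyond paragraphs needs a real converter.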


The Perl CGI module has an escapeHTML function that makes it pretty easy:

perl -e 'use CGI qw(escapeHTML); print escapeHTML("<b>Tom & Jerry</b>\n");'
perl -p -e 'BEGIN { use CGI qw(escapeHTML); } $_ = escapeHTML($_);' FILENAME
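For comparison, Python's standard library offers a similar function, `html.escape`. The sketch below mirrors the line-by-line `perl -p` filter; it is not part of the original answer, the helper names are hypothetical, and the entity chosen for a single quote (`&#x27;`) may differ from what CGI.pm emits.

```python
import html
from typing import Iterable, Iterator


def escape_lines(lines: Iterable[str]) -> Iterator[str]:
    """Escape &, <, > and both quote characters in each line."""
    for line in lines:
        # quote=True also turns " into &quot; and ' into &#x27;
        yield html.escape(line, quote=True)


def escape_file(path: str, encoding: str = "utf-8") -> str:
    """Return the escaped contents of a text file, like the -p one-liner."""
    with open(path, encoding=encoding) as f:
        return "".join(escape_lines(f))
```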

The recode utility supports HTML as one of the encodings. (You can even specify an HTML version.) In the text-to-entities direction, it will also recode non-ASCII characters into entities; you need to specify the correct input encoding (e.g. ASCII, latin1, utf-8, …).

recode utf8..html < input-file.txt > output-file.txt
recode l1..html file-to-recode.txt
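The same text-to-entities direction can be sketched in Python. In Python the input encoding is chosen when the file is read (e.g. `open(path, encoding="latin-1")`), so the function works on already-decoded text. This is an illustration, not a reimplementation of recode: it uses the HTML 4 entity names from `html.entities.codepoint2name` where one exists and falls back to numeric references otherwise, so its output may differ from recode's. The name `to_html_entities` is hypothetical.

```python
import html
from html.entities import codepoint2name


def to_html_entities(text: str) -> str:
    """Escape markup characters, then replace non-ASCII characters with entities."""
    out = []
    for ch in html.escape(text, quote=False):
        cp = ord(ch)
        if cp < 128:
            out.append(ch)
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")  # named entity, e.g. &eacute;
        else:
            out.append(f"&#{cp};")  # numeric reference, e.g. for Cyrillic
    return "".join(out)
```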

xmlstarlet can do it both ways:

echo 'Ampersands & angle brackets need to be encoded.' | xmlstarlet esc | xmlstarlet unesc 
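The same round trip in Python, as an illustrative sketch using the standard `xml.sax.saxutils` helpers rather than xmlstarlet. By default `escape` and `unescape` handle only `&`, `<` and `>`; both accept an optional dictionary of extra replacements.

```python
from xml.sax.saxutils import escape, unescape

text = "Ampersands & angle brackets <like these> need to be encoded."

encoded = escape(text)       # & < > become &amp; &lt; &gt;
decoded = unescape(encoded)  # turns those three entities back into characters

print(encoded)
print(decoded == text)
```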

I'm not sure of your original goal, but if you want to show PHP source to someone:

You can give the file a .phps extension; in many (but not all) Apache/PHP configurations, the server then shows the source directly instead of executing the PHP.

I don't think this is what he wants; he wants either a command-line tool or a PHP script that takes text input and escapes HTML entities.

For anyone interested: I wrote a well-documented, easily readable open-source bash script that converts accented letters into HTML entities. You can find it at http://www.lugato.co.uk/silvio_dwl.html; it is easy to understand, so it can be modified to convert additional characters into HTML entities 😉 Enjoy it! Silvio

Welcome to Unix & Linux! Whilst this may theoretically answer the question, it would be preferable to include the essential parts of the answer here, and provide the link for reference.

Site design / logo © 2023 Stack Exchange Inc; user contributions licensed under CC BY-SA.



Documents to HTML converter


dmryutov/document2html


Extension    Text  Styles extraction  Images extraction
HTML/XHTML   Yes   Yes                Yes
XML          Yes   Not applicable     Not applicable
DOCX         Yes   Yes                Yes
DOC          Yes   No                 No
RTF          Yes   Yes                Yes
ODT          Yes   Yes                Yes
XLSX         Yes   Yes                Yes
XLS          Yes   Yes                No
CSV          Yes   Not applicable     Not applicable
TXT/MD       Yes   Yes                Yes
JSON         Yes   Not applicable     Not applicable
EPUB         Yes   Yes                Yes
PDF          Yes   No                 Yes
PPT          Yes   No                 No

cURL, for downloading images:

apt-get install libcurl4-openssl-dev (or: brew install curl)

iconv, for encoding conversion:

sudo apt-get install libc6 (or: brew install libiconv)

Tidy, for cleaning and repairing HTML:

sudo apt-get install libtidy-dev (or: brew install tidy-html5)

file, for determining the file type.
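The `file` utility identifies a document by its content rather than its name. As a rough illustration of that idea (not how document2html or libmagic actually work), the Python sketch below checks a few well-known leading "magic" bytes for formats in the table above and, for ZIP-based formats, looks at the archive members. The function names and the returned labels are hypothetical, and only a handful of formats are covered.

```python
import io
import zipfile
from pathlib import Path

# Leading "magic" bytes for some of the formats in the table above.
SIGNATURES = [
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2"),  # DOC, XLS, PPT
    (b"%PDF-", "pdf"),
    (b"{\\rtf", "rtf"),
    (b"PK\x03\x04", "zip"),  # DOCX, XLSX, ODT, EPUB are ZIP containers
]

# Value of the "mimetype" member in ODF and EPUB archives.
ZIP_MIMETYPES = {
    "application/vnd.oasis.opendocument.text": "odt",
    "application/epub+zip": "epub",
}


def detect_format(data: bytes) -> str:
    """Guess a document format from its content instead of its file name."""
    kind = next((k for magic, k in SIGNATURES if data.startswith(magic)), "unknown")
    if kind != "zip":
        return kind
    # ZIP-based formats are told apart by the members they contain.
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = set(zf.namelist())
        if "word/document.xml" in names:
            return "docx"
        if "xl/workbook.xml" in names:
            return "xlsx"
        if "mimetype" in names:
            mime = zf.read("mimetype").decode("ascii", "replace").strip()
            return ZIP_MIMETYPES.get(mime, "zip")
    return "zip"


def detect_file(path: str) -> str:
    """Read the whole file (fine for a sketch) and detect its format."""
    return detect_format(Path(path).read_bytes())
```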

Third-party libraries used:

  • getoptpp: command-line options parser
  • lodepng: PNG encoder and decoder
  • miniz: data compression library
  • json: JSON parser
  • pugixml: XML parser

Make sure the Qt (>= 5.6) development libraries are installed:

  • In Ubuntu/Debian: apt-get install qt5-default qttools5-dev-tools zlib1g-dev
  • In Fedora: sudo dnf builddep tiled
  • In Arch Linux: pacman -S qt
  • In Mac OS X with Homebrew:
    • brew install qt5
    • brew link qt5 --force

Now you can compile by running:

qmake (or qmake-qt5 on some systems)
make

To do a shadow build, you can run qmake from a different directory and point it at document2html.pro, for example:

mkdir build
cd build
qmake ../src/document2html.pro
make

If you have ideas on how to build the project with CMake instead of qmake, please contact me.

Usage:

document2html -f|-d -o [-si]
document2html -h
document2html -v

Short flag  Long flag   Description
-f          --file      Input file
-d          --dir       Input directory
-o          --out       Output directory
-s          --style     Extract styles
-i          --image     Extract images
-h          --help      Display help message
-v          --version   Display package version
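To drive the converter from a script, a small wrapper can assemble the command line from the flags listed above. This is a hypothetical sketch, not part of document2html: it assumes the binary is on PATH and that `-f`, `-d` and `-o` each take the following argument as their value (the argument placeholders seem to be missing from the synopsis), and the helper names are made up for illustration.

```python
import subprocess
from pathlib import Path
from typing import List


def document2html_args(source: str, out_dir: str,
                       styles: bool = False, images: bool = False) -> List[str]:
    """Build a document2html command line (hypothetical helper)."""
    src = Path(source)
    # -d converts a directory of documents, -f a single file.
    args = ["document2html", "-d" if src.is_dir() else "-f", str(src),
            "-o", str(out_dir)]
    if styles:
        args.append("-s")  # --style: extract styles
    if images:
        args.append("-i")  # --image: extract images
    return args


def run_document2html(source: str, out_dir: str, **flags: bool) -> None:
    """Run the converter; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(document2html_args(source, out_dir, **flags), check=True)
```

For example, `run_document2html("report.docx", "out", styles=True, images=True)` would run `document2html -f report.docx -o out -s -i` under those assumptions.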

  • rembish: DOC, PPT and PDF converter (PHP)
  • PolicyStat: DOCX converter (Python)
  • python-excel: XLSX and XLS converter (Python)
  • lvu: RTF converter (C++)
  • adhocore: TXT/MD converter (PHP)
  • ahupp: libmagic wrapper (Python)

If you have questions regarding the libraries, I would like to invite you to open an issue on GitHub. Please describe your request, problem, or question in as much detail as possible, and mention the version of the libraries you are using as well as the versions of your compiler and operating system. Opening an issue on GitHub allows other users and contributors to these libraries to collaborate.


Convert Google Docs to HTML

Google Docs is a web-based editor for creating and modifying documents. Blogs and websites often need content that has already been written in a document, and Google Docs covers this with a built-in option to download a document as HTML. This guide shows how to convert a Google Docs document into an HTML file.

How to Convert Google Docs to HTML

A Google Docs document is stored online in Google's own format rather than as a ".doc" file, but it can be exported to several formats. To convert it to HTML, follow these steps:

Step 1: Open Google Docs

Open the document you want to convert (an existing one or a new, blank one). This example uses an existing document, as shown in the figure below:

Step 2: Choose the Web Page (.html, zipped) Option

To export the document as HTML, open the "File" menu, hover over "Download", and choose "Web Page (.html, zipped)":

Step 3: Verify the Downloaded File

Verify that "Docs.zip" has been downloaded successfully, as shown in the screenshot below:

Step 4: Open the Downloaded Archive

Navigate to the directory the file was downloaded to and open the zip archive; the HTML file is inside, as shown below:

Step 5: Verify Docs.html

Open "Docs.html" in a browser (Google Chrome in this example) and check that the content of the document is displayed:

Great work! You have successfully converted a Google Docs document to HTML.
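If you convert documents this way regularly, unpacking the downloaded archive can be scripted. The Python sketch below is an illustration under stated assumptions: the archive contains at least one `.html` member (possibly alongside an images folder), the HTML is UTF-8 encoded, and the function names are hypothetical. "Docs.zip" is simply the file name used in the steps above.

```python
import io
import zipfile
from pathlib import Path
from typing import Tuple


def read_exported_html(zip_bytes: bytes) -> Tuple[str, str]:
    """Return (member name, HTML text) for the .html file in an export zip."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        html_names = [n for n in zf.namelist() if n.lower().endswith(".html")]
        if not html_names:
            raise ValueError("no .html file found in the archive")
        name = html_names[0]
        # Assumption: the exported HTML is UTF-8 encoded.
        return name, zf.read(name).decode("utf-8")


def extract_export(zip_path: str, dest_dir: str) -> Path:
    """Unpack the whole export (HTML plus any images); return the HTML path."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)
        names = [n for n in zf.namelist() if n.lower().endswith(".html")]
    if not names:
        raise ValueError("no .html file found in the archive")
    return Path(dest_dir) / names[0]
```

For example, `extract_export("Docs.zip", "docs_html")` unpacks the archive into a `docs_html` directory (a hypothetical name) so the HTML keeps working with its images.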

Conclusion

A Google Docs document can be converted to HTML with the "Web Page (.html, zipped)" option, found under "Download" in the "File" menu. After conversion, the content can be viewed in any browser. This post has provided a step-by-step guide to converting a Google Docs file into HTML.

