Converting files to text on Linux

Converting binary to text in Linux

I have a big binary file that I produced by writing an array of floats in binary format. Now, how can I simply convert that binary file to text?

I just opened a binary file with ofstream out(blah, ios::out | ios::binary) and wrote to it with out.write((char *) blah, size);. What format is this?

Make your goal clearer, please. Do you want to read the values back in and produce formatted output a la printf? In that case, a Perl or PHP script using unpack() would probably be much easier than C++.
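For what it's worth, a minimal sketch of that unpack() route in Perl (assuming native single-precision floats, matching the writer above, and a hypothetical input file blah.bin):

perl -0777 -ne 'print "$_\n" for unpack "f*", $_' blah.bin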

2 Answers

Use the UNIX od command with the -t f4 option to read the file as 4-byte floating-point values. The -A n option is also useful to avoid printing the file offsets. Here is the output for an example file that I created.

/tmp> od -A n -t f4 b.dump
 -999.876 -998.876 -997.876 -996.876
 -995.876 -994.876 -993.876 -992.876
 -991.876 -990.876 -989.876 -988.876
 -987.876 -986.876 -985.876 -984.876
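To capture that as a text file rather than printing it to the terminal, simply redirect the output (same file name as above):

od -A n -t f4 b.dump > b.txt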

You will need to reverse the process.

  1. Read the file back into an array of floats.
  2. Print the array using printf() or your favorite I/O function.

Any other approach will be ugly and painful; not that this isn't ugly to start with.
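A shell-only sketch of those two steps, letting od do the reading and printf the formatting (assuming the same 4-byte native floats and the example file b.dump from above; %.3f is an arbitrary choice of precision):

od -An -v -t f4 b.dump | xargs printf '%.3f\n' > b.txt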


Convert binary mode to text mode and the reverse option

What leads you to believe it’s still a text file? xxd -r -p is the exact reverse of the od conversion you did; the output of cuonglm’s command should be strictly identical to the original tarball.


3 Answers

od -An -vtx1 Check.tar > Check.txt 

You need -v or od will condense sequences of identical bytes.
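Here is a quick illustration of the condensing (a sketch assuming GNU od; 32 identical bytes collapse into one line plus an asterisk unless -v is given):

head -c 32 /dev/zero | od -An -tx1
 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
head -c 32 /dev/zero | od -An -vtx1
 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00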

LC_ALL=C tr -cd 0-9a-fA-F < Check.txt | xxd -r -p >Check.tar 
perl -ape '$_=pack "(H2)*", @F' Check.txt > Check.tar 
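As a sanity check for either route (hypothetical file names; cmp prints nothing when the two files are byte-identical):

od -An -vtx1 Check.tar > Check.txt
LC_ALL=C tr -cd 0-9a-fA-F < Check.txt | xxd -r -p > Check2.tar
cmp Check.tar Check2.tar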

If your purpose is to transfer files over a channel that only supports ASCII text, then there are dedicated tools for that, like uuencode:

tar cf - myfiles.* | xz | uuencode myfiles.tar.xz | that-channel 

And to recover those files on the other end, feed the captured text to uudecode (it takes the output file name from the header that uuencode wrote; myfiles.uue below stands for the received text, however it was saved):

uudecode myfiles.uue

would recreate myfiles.tar.xz.

@maihabunash If you created file.txt without -v and with removing the address, then you can't reliably recover file.tar if there were condensed sequences (do a grep '[*]' file.txt to check), as you've lost the information of how long those condensed sequences were by removing the address.

Hi, my goal is to compress more than 30 Perl scripts with tar or zip or whatever, then convert the archive to text, and then convert it back to a compressed file. Is that possible? (I see tar is a problem, but can we do it with other options?)

@maihabunash, you're looking for uuencode or base64 encoding. Note that my answer covers your question: I give the code to convert back to binary from od output, provided you don't forget the -v option. If you're transferring files over FTP, don't forget to set the mode to "binary" (the TYPE I FTP command, something like binary in your client).

@pmor, it looks like you want the output of xxd -p, which can be decoded with xxd -r -p. The tr | xxd -r -p approach already removes all whitespace (anything but xdigits). Use perl -pe 'chomp;$_=pack "H*",$_' to decode the output of xxd -p.
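Spelled out, that xxd -p roundtrip looks like this (hypothetical file names; -p emits a plain hex dump, 60 hex digits per line):

xxd -p Check.tar > Check.txt
xxd -r -p Check.txt > Check2.tar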

Answering the X part of this XY problem, I would recommend you investigate why your binary files don't transfer properly in the first place.

If it turns out the reason is that you don't have an 8-bit-clean data path, you could then use existing tools that were created to handle this situation, such as base64 or even uuencode. Old, but still very effective.

tar czvf - /etc/h* | base64 >/tmp/tar.tgz.b64
ls -l /tmp/tar.tgz.b64
-rw-r--r-- 1 root root 7364 May 26 11:52 /tmp/tar.tgz.b64
base64 -d /tmp/tar.tgz.b64 | tar tzvf -

tar czvf - /etc/h* | uuencode - >/tmp/tar.tgz.uue
ls -l /tmp/tar.tgz.uue
-rw-r--r-- 1 root root 7530 May 26 11:51 /tmp/tar.tgz.uue
uudecode /tmp/tar.tgz.uue | tar xzvf -


Convert doc to txt via commandline

We're looking for a program that allows us to convert a doc or docx document to a txt file. We're working with Linux and we want to start a website that converts user-uploaded doc files. We don't want to use OpenOffice/LibreOffice because we have had bad experiences with them. Pandoc can't handle doc files :/ Anyone have an idea?

4 Answers

You will have to use two different command-line tools, depending on whether you are working with the .doc or the .docx format:

catdoc foo.doc > foo.txt
docx2txt foo.docx

The latter will produce a file called foo.txt in the same directory as the original.

I'm not sure which Linux distribution you are using, but both catdoc and docx2txt are available from the Ubuntu repositories, for example:

sudo apt-get install catdoc docx2txt

Thanks for the info. Unfortunately for me, brew install docx2txt didn't work, the 'catdoc' command is not available, and I need to use 'docx2txt.sh' instead of 'docx2txt'.

It turns out catdoc got relegated to the boneyard, but one can build it from source; details here: apple.stackexchange.com/a/294259/36790

Here is a Perl project which claims to do it. I have also done a lot of this by hand, using XSLT on document.xml. The .docx file itself is just a zip file; you can unzip it and inspect the elements. I will say that this is not hard to do for specific files, but it is very hard in the general case, because of the lack of documentation on how Word stores things internally, and the variance of the internal representation.
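For a quick look inside, something like this should work (a sketch assuming a hypothetical foo.docx; the main text lives in word/document.xml, and xmllint ships with libxml2):

unzip -o foo.docx -d foo_extracted
xmllint --format foo_extracted/word/document.xml | less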


Bash command to convert an HTML page to a text file

I am a beginner with Linux. Could you please help me convert an HTML page to a text file? The text file should drop any images and links from the webpage. I want to use only bash commands, not HTML-to-text converting tools. As an example, I want to convert the first page of Google search results for "computers". Thank you.


You are likely not going to be able to do it with only "bash commands"; you are probably going to need at least sed or awk. Not saying it's impossible with just plain bash builtins, but it certainly is not feasible.
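For what it's worth, a crude curl-plus-sed sketch (hypothetical query URL; this breaks on tags that span lines, keeps script and style contents, and Google may refuse non-browser clients):

curl -s 'https://www.google.com/search?q=computers' | sed -e 's/<[^>]*>//g' > results.txt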

If you're using curl to get that page, you can pipe it to lynx -stdin -dump. See unix.stackexchange.com/a/608964/43233

12 Answers

The easiest way is to use something like the following; the dump is, in short, the text version of the viewable HTML.

lynx --dump www.google.com > file.txt
links -dump www.google.com

Or, for a local file:

lynx --dump ./1.html > file.txt
links -dump ./1.htm

With charset conversion to UTF-8:

lynx -dump -display_charset UTF-8 ./1.htm
links -dump -codepage UTF-8 ./1.htm

Reading the help page, "-dump: dumps the formatted output of the default document". I take it "formatted" means with all the HTML tags.

See lynx.browser.org/lynx_help/body.html and look for the "assumed document charset" fields via DISPLAY_CHARSET_CHOICE and ASSUMED_DOC_CHARSET_CHOICE. As I say, it may be related to your version; try the alternative, links. Here are some more examples: microhowto.info/howto/…

If you want to use w3m to read from stdin in a pipe, you need to add -T text/html to set the MIME type.
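For example (a sketch; -T declares the MIME type of the piped input, which w3m cannot guess on its own):

curl -s https://www.google.com | w3m -dump -T text/html > file.txt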

You have html2text.py on the command line.

Usage: html2text.py [(filename|url) [encoding]]

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  --ignore-links        don't include any formatting for links
  --ignore-images       don't include any formatting for images
  -g, --google-doc      convert an html-exported Google Document
  -d, --dash-unordered-list
                        use a dash rather than a star for unordered list items
  -b BODY_WIDTH, --body-width=BODY_WIDTH
                        number of characters per output line, 0 for no wrap
  -i LIST_INDENT, --google-list-indent=LIST_INDENT
                        number of pixels Google indents nested lists
  -s, --hide-strikethrough
                        hide strike-through text, only relevant when -g is specified as well
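For example, a typical invocation might look like this (assuming the script is executable and on your PATH):

html2text.py --ignore-links --ignore-images ./1.html > file.txt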

