lambdalisue/txt2xls
Convert raw text data files into a single excel file
README.rst
txt2xls converts raw text data files into a single Excel file. It uses maidenhair for reading raw text files, so any kind of raw text file can be used as long as there is a maidenhair plugin for it.
Assume there are several raw text data files like:
# Sample1.txt
0 10
1 20
2 30
3 40
4 50
5 60

# Sample2.txt
0 15
1 25
2 35
3 45
4 55
5 65

# Sample3.txt
0 12
1 22
2 32
3 42
4 52
5 62
It will produce an output.xls file. The Excel file has Sample1, Sample2, and Sample3 sheets.
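For example, a minimal invocation on the three sample files looks like this (output.xls is the default output filename, as the writer configuration further below shows):

txt2xls Sample1.txt Sample2.txt Sample3.txt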
usage: txt2xls [-h] [-v] [-p PARSER] [-l LOADER] [-u USING] [--unite]
               [--unite-basecolumn UNITE_BASECOLUMN]
               [--unite-function UNITE_FUNCTION] [--classify]
               [--classify-function CLASSIFY_FUNCTION] [--relative]
               [--relative-origin RELATIVE_ORIGIN]
               [--relative-basecolumn RELATIVE_BASECOLUMN] [--baseline]
               [--baseline-basecolumn BASELINE_BASECOLUMN]
               [--baseline-function BASELINE_FUNCTION]
               [--peakset-method PEAKSET_METHOD]
               [--peakset-basecolumn PEAKSET_BASECOLUMN]
               [--peakset-where-function PEAKSET_WHERE_FUNCTION]
               [--raise-exception] [-o OUTFILE]
               infiles [infiles ...]

positional arguments:
  infiles               Path list of data files or directories which have
                        data files.

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
  --raise-exception     If it is specified, raise exceptions.
  -o OUTFILE, --outfile OUTFILE
                        An output filename without extension. The required
                        filename extension will be automatically determined
                        from the output format.

Reading options:
  -p PARSER, --parser PARSER
                        A maidenhair parser name which will be used to parse
                        the raw text data.
  -l LOADER, --loader LOADER
                        A maidenhair loader name which will be used to load
                        the raw text data.
  -u USING, --using USING
                        Colon (:) separated column indexes, used for limiting
                        the columns read.

Unite options:
  --unite               Join the columns of the classified dataset with
                        respect to --unite-basecolumn. The dataset is
                        classified with --unite-function.
  --unite-basecolumn UNITE_BASECOLUMN
                        An index of the column which will be used as a base
                        column for regulating the data point region.
  --unite-function UNITE_FUNCTION
                        A python script file path or a python lambda
                        expression which will be used for classifying the
                        dataset. If it is not specified, the part of the
                        filename before the first period (.) is used for
                        classification.

Classify options:
  --classify            Classify the dataset with --classify-function. It
                        influences the results of --relative and --baseline.
  --classify-function CLASSIFY_FUNCTION
                        A python script file path or a python lambda
                        expression which will be used for classifying the
                        dataset. If it is not specified, the part of the
                        filename before the last underscore (_) is used for
                        classification.

Relative options:
  --relative            If specified, the raw data will be converted to data
                        relative to the specified origin, based on the
                        specified column. See also --relative-origin and
                        --relative-basecolumn.
  --relative-origin RELATIVE_ORIGIN
                        A dataset number which will be used as an origin of
                        the relative data. Used with --relative.
  --relative-basecolumn RELATIVE_BASECOLUMN
                        A column number which will be used as a base column
                        to make the data relative. Used with --relative.

Baseline options:
  --baseline            If specified, the specified data file is used as a
                        baseline of the dataset. See also
                        --baseline-basecolumn and --baseline-function.
  --baseline-basecolumn BASELINE_BASECOLUMN
                        A column index which will be processed for baseline
                        regulation. Used with --baseline.
  --baseline-function BASELINE_FUNCTION
                        A python script file path or a python lambda
                        expression which will be used to determine the
                        baseline value from the data. The `columns` and
                        `column` variables are available in the lambda
                        expression.

Peakset options:
  --peakset-method PEAKSET_METHOD
                        A method to find the peak data point.
  --peakset-basecolumn PEAKSET_BASECOLUMN
                        A column index which will be used for finding the
                        peak data point.
  --peakset-where-function PEAKSET_WHERE_FUNCTION
                        A python script file path or a python lambda
                        expression which will be used to limit the range of
                        data points for finding the peak data point. The
                        `data` variable is available in the lambda
                        expression.
You can create a configuration file at ~/.config/txt2xls/txt2xls.cfg (Linux), ~/.txt2xls.cfg (Mac), or %APPDATA%\txt2xls\txt2xls.cfg (Windows).
The default preferences are equivalent to the configuration file below:
[default]
raise_exception = False

[reader]
parser = 'parsers.PlainParser'
loader = 'loaders.PlainLoader'
using = None

[[classify]]
enabled = False
function = 'builtin:classify_function'

[[unite]]
enabled = False
function = 'builtin:unite_function'
basecolumn = 0

[[relative]]
enabled = False
origin = 0
basecolumn = 1

[[baseline]]
enabled = False
function = 'builtin:baseline_function'
basecolumn = 1

[writer]
default_filename = 'output.xls'

[[peakset]]
method = 'argmax'
basecolumn = -1
where_function = 'builtin:where_function'
I don't use Microsoft Windows, so the location of the configuration file on Windows might be wrong. Let me know if there are any mistakes.
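As an illustration, a user configuration that merely overrides the default output filename might look like this (a sketch, assuming user settings follow the same syntax as the defaults above; result.xls is a hypothetical name):

[writer]
default_filename = 'result.xls'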
How can I edit txt to xls in Unix?
It's printing the datetime in the first column but only tid in the next column. Can anyone please help me print the whole remaining text in the second column? Records in the text file:
2019-11-26T11:51:32.087-08:00 tid: JCA-work-instance:AQ Adapter-8 userId: 0 , APP: Service Bus Logging FlowId: 0000MtDbHiu8pmk5Szd9ic1TlVox0015xl RouteNode2, null, null, REQUEST Queried data from header table
Nov 28, 2019 8:19:03 AM PST HTTP BEA-101019 [ServletContext[text] Servlet failed with an IOException. Nov 28, 2019 8:22:40 AM PST [null, null, null, ERROR] error in service-callouterror service to get information
1 Answer
The reason for the behaviour you see is that by default, awk treats whitespace (i.e. space, tab) as the input field separator. Thus, every item in your input file that is surrounded by spaces is treated as a single «field» and is assigned its own $n internal variable. Your awk command, however, instructs awk to print only the first two such fields ($1 and $2), which are in your case the date/time string and the literal tid:.
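A quick way to see that splitting in action, using a shortened version of your first record:

$ echo '2019-11-26T11:51:32.087-08:00 tid: JCA-work-instance:AQ Adapter-8' | awk '{ print $1, $2 }'
2019-11-26T11:51:32.087-08:00 tid: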
In your particular case, the easiest way might be to use sed to replace the first whitespace with a tab, which should give the desired result.
Since you also want to include a header line, the following should work (assuming GNU sed is being used):
sed -e '1 i\DateTime\tError' -e 's/ /\t/' TMP.txt > Output.txt
The first expression inserts one line of text before the first line; the second performs the «actual formatting» intended.
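For instance, applied to a shortened version of the first record (GNU sed assumed, as above):

$ printf '%s\n' '2019-11-26T11:51:32.087-08:00 tid: Queried data from header table' | sed -e '1 i\DateTime\tError' -e 's/ /\t/'
DateTime	Error
2019-11-26T11:51:32.087-08:00	tid: Queried data from header table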
For the additional string format you provided, I would resort to awk instead of sed (note that I use GNU awk):
awk 'BEGIN { printf("DateTime\tError\n") }
     { match($0, "^([[:alpha:]]+[[:space:]]+[0123]?[[:digit:]],[[:space:]]+20[[:digit:]]{2}[[:space:]]+[01]?[[:digit:]]:[012345][[:digit:]]:[012345][[:digit:]][[:space:]]+[AP]M[[:space:]]+[[:alpha:]]+)[[:space:]]+([[:print:]]*)$", fields)
       printf("%s\t%s\n", fields[1], fields[2]) }' TMP.txt > Output.txt
This regular expression matches a time format as specified by you, followed by one or more spaces, followed by arbitrary printable characters up to the end of the line, and prints the first (...) sub-group, the timestamp, then a \t, and then the second (...) sub-group, which is «the rest of the line». Additionally, the BEGIN rule is used to insert the header line on top.
Since both cases can occur in the same file, we have to combine them into a single awk program:
BEGIN { printf("DateTime\tError\n") }
{
    if (match($0, "^([[:alpha:]]+[[:space:]]+[0123]?[[:digit:]],[[:space:]]+20[[:digit:]]{2}[[:space:]]+[012]?[[:digit:]](:[012345][[:digit:]]){2}[[:space:]]+[AP]M[[:space:]]+[[:upper:]]+)[[:space:]]+([[:print:]]*)$", fields) == 0)
        match($0, "^(20[[:digit:]]{2}-[01][[:digit:]]-[0123][[:digit:]][[:alpha:]][012][[:digit:]](:[012345][[:digit:]]){2}.[[:digit:]]{3}[+-][012][[:digit:]]:[012345][[:digit:]])[[:space:]]+([[:print:]]*)$", fields)
    printf("%s\t%s\n", fields[1], fields[3])
}
You can save the above script as xlsconvert.awk and then call it as
user@host$ awk -f xlsconvert.awk TMP.txt > Output.txt
Note that this will, of course, keep the different timestamp formats in the output. If you want to convert that to a unified format, you may have to resort to a shell script.
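As a rough sketch of such a shell script, assuming GNU date is able to parse both of your timestamp formats (the loop reads the Output.txt produced above):

while IFS=$'\t' read -r ts rest; do
    # Try to normalize the timestamp; keep the original string
    # (and the header line) unchanged on parse failure.
    norm=$(date -d "$ts" +%Y-%m-%dT%H:%M:%S%z 2>/dev/null) || norm=$ts
    printf '%s\t%s\n' "$norm" "$rest"
done < Output.txt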
text to xls file creation
I have 4 text files on my Unix system and want to put their data into an xls file and send it via mail. text1.txt:
and so on. I want to have the text1.txt file data in the first column, the text2.txt file data in the second column, and finally, at the end, the summation of the data in each row.
You don’t mention what you want done with the other two files. Also, given that most spreadsheets can import CSV files, is CSV sufficient or do you really need a .xls file? CSV is easy, .xls is much harder and requires using a language like perl or python which have libraries for reading and writing .xls files.
1 Answer
Step 1: join the files in a multicolumn file:
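With tab-separated output, paste is the natural fit here (a sketch, assuming each text*.txt file holds one column of numbers):

paste text*.txt > _1.tsv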
Step 2: add a formula to calculate the sum:
gawk '1; END{ print "=SUM(A1:A" NR ")\t=SUM(B1:B" NR ")\t=SUM(C1:C" NR ")" }' _1.tsv > _2.tsv
Step 3: inspect the result
$ cat _2.tsv
100          100          100
...
99           99           99
=SUM(A1:A5)  =SUM(B1:B5)  =SUM(C1:C5)
$ gnumeric _2.tsv    # excel, whatever your favorite spreadsheet
Step 4: (probably not necessary) if you really want a .xls file:
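Presumably a single ssconvert call, by analogy with the conversions below:

ssconvert _2.tsv _2.xls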
ssconvert comes with the fantastic gnumeric spreadsheet (apt-get install gnumeric if necessary); several conversion formats are available.
ssconvert _2.tsv _2.pdf
ssconvert _2.tsv _2.txt
As expected, these txt and pdf files have formulas replaced by their results.
Update:
To have a column with the line sum, Step 2 could be:
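A sketch of that variant, appending a per-row SUM formula over columns A to C in addition to the per-column sums:

gawk '{ print $0 "\t=SUM(A" NR ":C" NR ")" }
      END{ print "=SUM(A1:A" NR ")\t=SUM(B1:B" NR ")\t=SUM(C1:C" NR ")" }' _1.tsv > _2.tsv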
Convert many txt files to xls files with bash script
With this, I lost the header row and I also get a column with many Chinese characters, but the rest looks okay:
攀挀琀 | Max Muster | Bla102 | user
氀愀猀 | Heidi Held | Held100 | admin
The odd seq command looks like you are reinventing nl, but on the whole, I guess awk -F ';' 'BEGIN { OFS="," } { print FNR, $0 }' "$file" would solve the problem more elegantly and efficiently (this also replaces the tr).
It is unclear on what grounds exactly you want the Chinese to be discarded. Should we just discard the first column wholesale? Or replace non-ASCII text with nothing? Please clarify (you should edit your question to outline in more detail what you want).
Actually, I do not totally understand your awk command. It reads the input file and replaces the ; with ,. But how can I produce an output? Like this: for file in *.txt; do awk -F ';' 'BEGIN { OFS="," } { print FNR, $0 }' "$file" > "$
1 Answer
The question unfortunately does not provide enough details to be sure what exactly the issues are, but we have identified at least the following in the comments.
- Apparently, the input file contains DOS carriage returns.
- Apparently, soffice attempted to read the file as UTF-16, which is what produced the essentially random Chinese characters. (The characters could be anything; it’s just more probable that a random Unicode BMP character will be in a Chinese/Japanese block.)
With those observations and a refactoring of the existing script, try
for file in *.txt; do
    awk -F ';' 'BEGIN { OFS="," }
    FNR==1 {
        # Add UTF-8 BOM
        printf "\357\273\277"
        # Generate header line for soffice to discard
        for (i=1; i<=NF; i++) printf "bogus%s", (i==NF ? "\n" : OFS)
    }
    { sub(/\015/, ""); print FNR, $0 }' "$file" > "${file%.txt}.csv"
    soffice --headless --convert-to xls:"MS Excel 95" "${file%.txt}.csv"
done
In so many words, the awk script splits each input line on semicolons (-F ';') and sets the output field separator OFS to a comma. On the first input line, we add a BOM and a synthetic header line for soffice to discard before the real output, so that the real header line comes through like a regular data line in the output. The sub takes care of removing any DOS carriage return character, and the variable FNR is the current input line's line number.
I’m not sure if the BOM or the bogus header line are strictly necessary, or if perhaps you need to pass in some additional options to make soffice treat the input as proper UTF-8. Perhaps you also need to include LC_ALL=C somewhere in the pipeline.