Linux big text file

How can I split a large text file into smaller files with an equal number of lines?

I’ve got a large (by number of lines) plain text file that I’d like to split into smaller files, also by number of lines. So if my file has around 2M lines, I’d like to split it up into 10 files that contain 200k lines, or 100 files that contain 20k lines (plus one file with the remainder; being evenly divisible doesn’t matter). I could do this fairly easily in Python, but I’m wondering if there’s any kind of ninja way to do this using Bash and Unix utilities (as opposed to manually looping and counting / partitioning lines).

Out of curiosity, after they’re «split», how does one «combine» them? Something like «cat part2 >> part1»? Or is there another ninja utility? Mind updating your question?

Yes, cat is short for concatenate. In general, apropos is useful for finding appropriate commands; e.g. see the output of: apropos split
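For example, assuming the default output names that split produces (xaa, xab, xac, and so on), the pieces can be glued back together in order with a single cat command (the name rejoined.txt is just a placeholder):

 cat xaa xab xac > rejoined.txt   # join the pieces explicitly, in order
 cat x?? > rejoined.txt           # or let the shell expand x?? in sorted order

The second form relies on the shell expanding x?? in alphabetical order, which matches the order in which split wrote the pieces, so the result should be identical to the original file.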

As an aside, OS X users should make sure their file contains Linux/Unix-style line breaks (LF) rather than Mac-style end-of-line indicators (CR); the split and csplit commands will not work if your line breaks are carriage returns instead of line feeds. TextWrangler from Bare Bones Software can help you with this if you’re on Mac OS: when you save (or Save As…) your text files, you can choose which line break characters to use.
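If you would rather do the conversion from the command line than in a GUI editor, one possible approach (the file names here are just placeholders) is the tr utility:

 tr '\r' '\n' < mac_file.txt > unix_file.txt   # rewrite every CR as an LF

After this, split and csplit can see the individual lines.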

12 Answers

Have a look at the split command:

$ split --help
Usage: split [OPTION] [INPUT [PREFIX]]
Output fixed-size pieces of INPUT to PREFIXaa, PREFIXab, ...;
default size is 1000 lines, and default PREFIX is `x'.
With no INPUT, or when INPUT is -, read standard input.

Mandatory arguments to long options are mandatory for short options too.
  -a, --suffix-length=N   use suffixes of length N (default 2)
  -b, --bytes=SIZE        put SIZE bytes per output file
  -C, --line-bytes=SIZE   put at most SIZE bytes of lines per output file
  -d, --numeric-suffixes  use numeric suffixes instead of alphabetic
  -l, --lines=NUMBER      put NUMBER lines per output file
      --verbose           print a diagnostic to standard error
                          just before each output file is opened
      --help              display this help and exit
      --version           output version information and exit

You could do something like this:

 split -l 200000 filename

which will create files named xaa, xab, xac, and so on, each containing 200000 lines.

Another option, split by size of output file (still splits on line breaks):

 split -C 20m --numeric-suffixes input_filename output_prefix 

This creates files like output_prefix01, output_prefix02, output_prefix03, and so on, each with a maximum size of 20 megabytes.

You can also split a file by size: split -b 200m filename (m for megabytes, k for kilobytes, or no suffix for bytes).

split produces garbled output with Unicode (UTF-16) input. At least on Windows with the version I have.
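If the input really is UTF-16, one possible workaround (assuming iconv is available; the file names are placeholders) is to convert it to UTF-8 before splitting:

 iconv -f UTF-16 -t UTF-8 big_utf16.txt > big_utf8.txt   # re-encode as UTF-8
 split -l 200000 big_utf8.txt                            # then split as usual

split then sees ordinary single-byte line feeds and each piece remains valid UTF-8.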

@geotheory, be sure to follow LeberMac’s advice earlier in the thread about first converting CR (Mac) line endings to LF (Linux) line endings using TextWrangler or BBEdit. I had the exact same problem as you until I found that piece of advice.

split -l 200000 mybigfile.txt 

And can we set a maximum number of output files? For example, split that big file into no more than 50 outputs, even if there are lines left over in the big file.
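With GNU split this can be done with the -n option, which divides the input into a fixed number of chunks rather than chunks of a fixed size; the l/N form keeps whole lines together. A sketch, assuming a reasonably recent GNU coreutils and placeholder names:

 split -n l/50 -d mybigfile.txt part_   # at most 50 pieces, no line split across files

This produces files part_00 through part_49, each roughly the same size in bytes.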

Yes, there is a split command. It will split a file by lines or bytes.

$ split --help
Usage: split [OPTION]... [INPUT [PREFIX]]
Output fixed-size pieces of INPUT to PREFIXaa, PREFIXab, ...;
default size is 1000 lines, and default PREFIX is `x'.
With no INPUT, or when INPUT is -, read standard input.

Mandatory arguments to long options are mandatory for short options too.
  -a, --suffix-length=N   use suffixes of length N (default 2)
  -b, --bytes=SIZE        put SIZE bytes per output file
  -C, --line-bytes=SIZE   put at most SIZE bytes of lines per output file
  -d, --numeric-suffixes  use numeric suffixes instead of alphabetic
  -l, --lines=NUMBER      put NUMBER lines per output file
      --verbose           print a diagnostic just before each
                          output file is opened
      --help              display this help and exit
      --version           output version information and exit

SIZE may have a multiplier suffix: b 512, kB 1000, K 1024, MB 1000*1000,
M 1024*1024, GB 1000*1000*1000, G 1024*1024*1024, and so on for T, P, E, Z, Y.
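Putting a few of those options together, a typical invocation might look like this (file and prefix names are placeholders):

 split -d -a 3 -l 20000 mybigfile.txt chunk_

This writes 20000-line pieces named chunk_000, chunk_001, chunk_002, and so on; -d requests numeric suffixes and -a 3 makes them three digits wide.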


How to view huge txt files in Linux?

There are several ways to view large text files in Linux. Here are three common methods:

Method 1: Using the «less» command

  • Step 1 — Open the terminal.
  • Step 2 — Navigate to the directory where the large text file is located

For example, if the file is located in the Documents folder, you would type «cd Documents» (without the quotes) and press Enter.

  • Step 3 — Type «less» followed by the name of the file and press Enter.

For example, if the file is called «largefile.txt», you would type «less largefile.txt» (without the quotes) and press Enter.

  • Step 4 — Use the arrow keys to scroll through the file; press «q» to exit when you are done.

You can also use the «Page Up» and «Page Down» keys to move up and down the file. To search for a specific word or phrase, press the «/» key and enter the word or phrase you’re looking for. Press «n» to go to the next occurrence of the word or phrase.

The «less» command is a very powerful tool for viewing large text files. It allows you to scroll through the file, search for specific words or phrases, and navigate to specific lines.
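A couple of concrete examples (the file name is a placeholder):

 less -N largefile.txt   # show line numbers in the left margin
 less +G largefile.txt   # open the file already positioned at the last line

Inside less, typing a line number followed by «g» (for example «250000g») jumps to that line, and pressing «q» quits.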

Method 2: Using the «head» and «tail» commands

  • Step 1 — Open the terminal.
  • Step 2 — Navigate to the directory where the large text file is located

For example, if the file is located in the Documents folder, you would type «cd Documents» (without the quotes) and press Enter.

  • Step 3 — To view the beginning of the file, type «head -n» followed by the number of lines and the file name, then press Enter.

For example, if you want to view the first 10 lines of the file «largefile.txt», you would type «head -n 10 largefile.txt» (without the quotes) and press Enter.

  • Step 4 — To view the end of the file, type «tail -n» followed by the number of lines and the file name, then press Enter.

For example, if you want to view the last 10 lines of the file «largefile.txt», you would type «tail -n 10 largefile.txt» (without the quotes) and press Enter.

The «head» and «tail» commands allow you to view the first and last few lines of a large text file, respectively. This can be useful when you’re only interested in certain parts of the file and don’t want to scroll through the entire file.
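The two commands can also be combined to look at a range in the middle of the file. For example, to see lines 9991 through 10000 of «largefile.txt» (the file name is a placeholder):

 head -n 10000 largefile.txt | tail -n 10   # last 10 of the first 10000 lines

head passes only the first 10000 lines down the pipe, and tail keeps the last 10 of those.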

Method 3: Using «cat» command

  • Step 1 — Open the terminal.
  • Step 2 — Navigate to the directory where the large text file is located

For example, if the file is located in the Documents folder, you would type «cd Documents» (without the quotes) and press Enter.

  • Step 3 — Type «cat» followed by the file name and press Enter.

For example, if the file is called «largefile.txt», you would type «cat largefile.txt» (without the quotes) and press Enter.

  • Step 4 — Scroll through the output using your terminal’s scrollback (for example, Shift+Page Up and Shift+Page Down).
  • Step 5 — To exit, close the terminal

The «cat» command is used to concatenate and display the contents of files in the terminal. However, it may not be suitable for large text files, since it prints the entire content of the file to the terminal at once, most of which will scroll straight past the screen.
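A common compromise (the file name is again a placeholder) is to let cat number the lines and hand the output to less, so that nothing scrolls off the screen:

 cat -n largefile.txt | less   # numbered lines, paged one screen at a time

cat -n prefixes each line with its line number, and less provides the paging and searching described in Method 1.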


These are the most common methods for viewing large text files in Linux. Each method has its own advantages and disadvantages, and the best method for you will depend on your specific needs. The «less» command is a powerful tool that allows you to scroll through the file, search for specific words or phrases, and navigate to specific lines. The «head» and «tail» commands allow you to view the first and last few lines of a large text file, respectively, which can be useful when you’re only interested in certain parts of the file. The «cat» command will display the entire content of the file in the terminal, but it may not be suitable for large text files as the output may not fit on the screen.

It is important to note that, when working with large text files, it is always a good practice to make a backup of the original file before attempting to view or edit it. This will ensure that you can always revert back to the original file in case something goes wrong during the viewing or editing process.
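For example, a simple copy made with cp before you start is usually enough (the file name is a placeholder):

 cp largefile.txt largefile.txt.bak   # keep an untouched copy of the original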

In addition to the above methods, there are other command line tools like ‘split’ that can be used to split large files into smaller chunks, ‘grep’ to search for specific patterns, ‘sed’ for editing and ‘awk’ for processing large files.
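A few example one-liners along those lines (the file name and the search pattern are placeholders):

 grep "ERROR" largefile.txt            # print only the lines containing ERROR
 sed -n '5000,5010p' largefile.txt     # print lines 5000 through 5010
 awk 'END { print NR }' largefile.txt  # count the lines, like wc -l

Each of these streams through the file once, so they work even when the file is far too large to open in an editor.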

Conclusion

In conclusion, there are several methods for viewing large text files in Linux, including using the «less», «head», «tail» and «cat» commands. Each method has its own advantages and disadvantages, and the best method for you will depend on your specific needs. Whether you need to scroll through the file, search for specific words or phrases, or view the first or last few lines of the file, there is a command that can help you do that. It is important to note that when working with large text files, it is always a good practice to make a backup of the original file before attempting to view or edit it, and there are other command line tools like ‘split’, ‘grep’, ‘sed’ and ‘awk’ that can be used to split, search and process large files.

