Sum all file sizes
I’m new to bash and I need to write a little script that sums all file sizes, excluding subdirectories. My first idea was to use the columns you get from ls -l. I cannot use grep, du or other advanced commands I’ve seen suggested elsewhere. $9 corresponds to the 9th column, where the name is shown; $5 is the size of the file.
Why can’t you use du? Is this homework? (Homework is OK here, but it’s best to mention that it is, because most people here prefer to give hints and point you in the right direction rather than give a complete answer if it’s for homework.)
Column 9 is not the filename if the filename contains whitespace, which is allowed and fairly common. In addition to the general badness of parsing ls, tests like -f name work in the standard-and-traditional test (aka [) utility, the similar ksh/bash-etc-only [[ builtin, and perl, but not in awk.
5 Answers
find . -maxdepth 1 -type f -printf "%s\n" | awk '{ print; sum += $1 } END { print sum + 0 }'
Output is file size in bytes, one line per file, followed by the total. The final statement is print sum+0 rather than just print sum to handle the case where there are no files (i.e., to correctly print 0 in that case). This is an alternative to initializing with BEGIN { sum = 0 }.
If all that’s needed is the total, drop the per-line print:
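find . -maxdepth 1 -type f -printf "%s\n" | awk '{ sum += $1 } END { print sum + 0 }'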
If you’re looking for a bash-centric, shell-script way to do it, here’s a shell loop that gathers all of the files (dot-files included) and uses the GNU coreutils stat utility to add each one’s size to a running sum.
shopt -s dotglob
sum=0
for f in *
do
    [[ -f "$f" && ! -h "$f" ]] || continue
    sum=$(( sum + $(stat -c "%s" "$f") ))
done
echo $sum
Bash’s -f test follows symlinks, so it treats a symlink to a regular file as a «regular file»; we must therefore skip symlinks explicitly with the -h test.
We can use the command below too:
find . -maxdepth 1 -type f -exec ls -ltr {} \; | awk 'BEGIN { sum = 0 } { sum += $5 } END { print sum }'
What benefit is there to doing a -tr in the ls when all you’re doing is counting bytes one file at a time? It is indeed dangerous to parse ls, although you should be safe(r) counting only the file size.
As it was a very basic exercise, the teacher required us to use basic commands that need a little bit more development and that can be replaced by more powerful commands like find or stat later on. But I got the answer, and it was this:
dir=$1
if [ ! -d "$dir" ]
then
    exit 1
else
    sum=0
    cd "$dir"
    (ls -l "$dir") > fitxers.txt
    C=($(awk '{ print $5 }' fitxers.txt))
    len=${#C[@]}
    i=0
    while [ $i -lt $len ]
    do
        for element in $(ls "$dir")
        do
            if [ -f "$element" ]
            then
                let "sum = $sum + ${C[$i]}"
            fi
            (( i++ ))
        done
    done
    echo $sum
    rm -r fitxers.txt
    exit 0
fi
Hope it’s a little bit helpful for other beginners.
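For other beginners, here is a tighter sketch of the same basic-commands idea (it still parses ls, which the comments above warn is fragile with unusual file names):

#!/bin/bash
# Sum the sizes of the regular files in the directory given as $1,
# using only ls and awk ("ls -l" lines starting with "-" are regular files).
dir=$1
[ -d "$dir" ] || exit 1
sum=0
for size in $(ls -l "$dir" | awk '/^-/ { print $5 }'); do
    sum=$((sum + size))
done
echo "$sum"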
Sum total bytes of files
If I have files a, b and c in a directory on a Linux machine, how can I get the total number of bytes of these 3 files in a way that does not depend on how e.g. ls shows the information? I mean, I am interested in a way that is not error prone.
Update:
1) I am interested in binary files, not ASCII files
2) It would be ideal to have a portable solution, e.g. working on both GNU/Linux and Mac
What are the errors that you’re trying to avoid? Are you OK with double-counting hard links? How about symlinks? And, since it’s unclear from your post, are you looking for the size of the files’ contents, or the amount of disk space they consume (i.e., «test» is 4 bytes but might consume 4 KB or more depending on disk format)?
You changed the question, adding a restriction about «binary files». Is this a relevant restriction really since you are picking explicit file names? If so, what’s your definition of a «binary file»?
@Kusalananda: My bad, I didn’t post it properly, I am sorry. A binary file has binary data. Not sure if it is relevant, since e.g. running cat on all the files won’t work
@Jim cat works on binary data, no problem. Utilities that interpret the data as text won’t work though.
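(Indeed, cat -- a b c | wc -c prints the total number of bytes in the three files, binary or not; the answers below get the sizes without reading the data at all.)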
7 Answers
Use du with the -c (print total) and -b (bytes) options:
$ ls -l
total 12
-rw-r--r-- 1 terdon terdon  6 Sep 29 17:36 a.txt
-rw-r--r-- 1 terdon terdon 12 Sep 29 17:38 b.txt
-rw-r--r-- 1 terdon terdon 17 Sep 29 17:38 c.txt
$ du -bc a.txt b.txt c.txt
6	a.txt
12	b.txt
17	c.txt
35	total
And if you just want the total size in a variable:
$ var=$(du -bc a.txt b.txt c.txt | tail -n1 | cut -f1)
$ echo $var
35
@Jim it’s the space the file(s) use on the disk, which depends on the filesystem block size. For example, consider printf '1234' > file. That creates a file with 4 bytes (wc -c file). On a system with a 4 KiB block size (which is probably what you have), that will use one 4 KiB block on the file system. Now look at printf '123' > file: wc -c file reports 3, du -b file also shows 3, but du file shows 4, since that is the size of the file on disk, the smallest unit of allocation for the file system being 4 KiB. But this really should be another question.
I did the test; indeed I see 4 printed, but 4 what? Bytes? Also, how do I see the one 4 KiB block used? ls also shows 3
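(With GNU tools, du prints in KiB units by default, so the 4 means one 4096-byte block is allocated; du -B1 file, or stat -c '%b blocks of %B bytes' file, shows the allocation explicitly.)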
$ stat --printf '%s\n' a b c | awk '{ s += $1 } END { print s }'
stat with the given --printf format (on Linux) will output the file sizes of the given files. The awk code then sums these up and reports the grand total.
On macOS and the BSDs, the equivalent is:
$ stat -f '%z' a b c | awk '{ s += $1 } END { print s }'
The stat utility is non-portable, but you may wrap it in a portability shell script (or shell function):
#!/bin/sh

case $(uname) in
    Linux)
        stat --printf '%s\n' "$@" ;;
    Darwin|*BSD)
        stat -f '%z' "$@" ;;
    *)
        echo 'Unknown system. I do not know how stat works here' >&2
        exit 1 ;;
esac | awk '{ s += $1 } END { print s }'
Save it as e.g. sumsizes.sh and run sh sumsizes.sh a b c, where a, b and c are the files whose sizes in bytes you’d like to add up.
Another solution would be to install GNU coreutils on the macOS system to get access to the same stat implementation as on Linux.
On Linux, you’d also be able to do
$ du -bcl a b c | awk 'END { print $1 }'
but there’s no equivalent to this on macOS or the BSD systems (the -b flag is not implemented) unless GNU coreutils is installed.
Also note that if any of those files is of type directory, the sizes of all files in the directory tree beneath it will be added.
@Jim I mean that it’s implemented by the stat utility on Linux. The stat utility on macOS (or BSD) does not have this flag. It’s a Linux-specific command line flag. But you said you ran on Linux, so I did not give a macOS solution.
Most systems where uname returns Linux will have the busybox implementation of stat (or the Android equivalent), not the GNU one. stat -c %s works with both busybox and GNU stat . It may be better to identify the stat implementation rather than the OS.
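A sketch of that kind of feature test (assumption: any stat that accepts -c is GNU or busybox, and one that rejects it is BSD-style):

#!/bin/sh
# Probe which stat dialect is installed instead of trusting uname.
if stat -c '%s' . >/dev/null 2>&1; then
    sizes() { stat -c '%s' "$@"; }    # GNU coreutils or busybox
else
    sizes() { stat -f '%z' "$@"; }    # BSD/macOS
fi
sizes "$@" | awk '{ s += $1 } END { print s }'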
With GNU find , you can do:
find a.txt b.txt c.txt -prune -printf '%s\n' | paste -sd + - | bc
That gives the size as reported by ls -l or the stat() system call. For non-regular file types (like fifo, device, symlink), depending on the system, that may not necessarily give you the number of bytes that would be read from them if they were read. See there for more options for those.
You could count the bytes actually read, for instance with cat -- "$file" | wc -c, but that’s not something you’d want to do for fifos or some device files like /dev/zero or /dev/random.
You can add the -L option to the find command to resolve symlinks and get the size of the target instead.
POSIXly, the only command that can get you the file size as returned by the lstat() system call is, unfortunately, ls.
ls -l doesn’t return the size for block devices. Its output is very difficult to parse reliably, and that can only be done in a foolproof way (for compliant implementations and for non-device files) one file at a time, along the lines of the helper sketched below.
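A sketch of such a getsize helper (assumptions: after field splitting, the size is the fifth field of ls -dn output, and a comma in that field marks a device file):

getsize() {
    # Let the shell split the one-line "ls -dn" output into fields;
    # for non-device files, field 5 is the size in bytes.
    set -- $(LC_ALL=C ls -dn -- "$1")
    case $5 in
        *,*) echo 0 ;;    # device file: field 5 looks like "8," so report 0
        *)   echo "$5" ;;
    esac
}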
(Here we assume a size of 0 for device files, which is always true on Linux but not on all systems.)
sum=0
for file in a b c; do
    sum=$((sum + $(getsize "$file")))
done
echo "$sum"
Get total size of a list of files in UNIX
I want to run a find command that will find a certain list of files and then iterate through that list of files to run some operations. I also want to find the total size of all the files in that list. I’d like to make the list of files FIRST, then do the other operations. Is there an easy way I can report just the total size of all the files in the list? In essence I am trying to find a one-liner for the ‘total_size’ variable in the code snippet below:
#!/bin/bash

loc_to_look='/foo/bar/location'

file_list=$(find $loc_to_look -type f -name "*.dat" -size +100M)

total_size=.

echo 'total size of all files is: '$total_size

for file in $file_list; do
    : # do a bunch of operations
done
@fedorqui If her version of find supports -printf. Some form of -exec stat -f '%z' {} \; (depending on your system's implementation of stat) would work as well.
Storing a list of file names in a flat string is not recommended anyway, since you can’t cope with file names containing whitespace easily.
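A sketch of the array-based alternative (bash 4.4+ for mapfile -d ''; names follow the question's snippet):

#!/bin/bash
loc_to_look='/foo/bar/location'

# Read the NUL-delimited find output into an array; any file name is then safe.
mapfile -d '' file_list < <(find "$loc_to_look" -type f -name '*.dat' -size +100M -print0)

# GNU du: -c adds a grand-total line, -b counts apparent size in bytes.
total_size=$(du -cb "${file_list[@]}" | tail -n 1 | cut -f 1)
echo "total size of all files is: $total_size"

for file in "${file_list[@]}"; do
    : # do a bunch of operations on "$file"
done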
6 Answers
You should simply be able to pass $file_list to du :
du -ch $file_list | tail -1 | cut -f 1
du will print an entry for each file, followed by the total (with -c ), so we use tail -1 to trim to only the last line and cut -f 1 to trim that line to only the first column.
This is a nice answer, but please remember that du prints actual disk usage rounded up to a multiple of (usually) 4 KB instead of logical file size. for i in {0..9}; do echo -n $i > $i.txt; done; du -ch *.txt => 40K total instead of 10 total.
You’re right, although in this case the ±4KB difference becomes negligible as we’re only dealing with files over 100MB. 😉
Also, check your version of du. If you are using GNU tools, you might be able to add --apparent-size to the du options to get the same size listing that ls displays.
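For example (GNU du only):

du -ch --apparent-size $file_list | tail -1 | cut -f 1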
What if the file names in $file_list contain spaces? I’ve tried escaping the spaces with \ but no luck.
The methods explained here have a hidden bug: when the file list is long, it exceeds the shell’s limit on command size. Better to use this one, based on du:
find -print0 | du --files0-from=- --total -s | tail -1
find produces a null-terminated file list, and du reads it from stdin and counts; this is independent of the shell’s command-size limit. Of course, you can add some switches to du to get the logical file size, because by default du tells you how much physical space the files will take.
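For instance, with GNU du, -b (apparent size in bytes) gives the logical total:

find -print0 | du -b --files0-from=- --total | tail -n 1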
But I think this is not a question for programmers but for Unix admins 🙂, so it is off topic for Stack Overflow.
We can remove -print0 and --files0-from=- and replace them with a call to xargs, e.g. find . | xargs du .
With xargs, the du command is called many times, so it will be much slower and will use more resources. And it still won’t run properly when a file name contains a newline character; find with xargs should therefore use the ‘null-terminated’ option, which for xargs is the --null or -0 (numeric zero) switch.
Is there any way to do this without -print0? I have files.txt lists all over my file system that I generate with newline termination instead of null termination. Usually I use xargs -d'\n', but I can’t get du to sum the sizes, which is frustrating.
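(One workaround, as long as no file name itself contains a newline: tr '\n' '\0' < files.txt | du -ch --files0-from=- | tail -n 1.)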
This code adds up the bytes reported by trusty ls for all files (it excludes all directories; apparently they’re 8 KB per folder/directory):
cd /; find . -type f -exec ls -s {} \; | awk '{ sum += $1 } END { print sum / 1024 }'
Note: Execute as root. Result in megabytes.
The problem with du is that it adds up the sizes of the directory nodes as well. This is an issue when you want to sum up only the file sizes. (By the way, I find it strange that du has no option for ignoring directories.)
In order to add the size of files under the current directory (recursively), I use the following command:
ls -laUR | grep -e "^\-" | tr -s " " | cut -d " " -f5 | awk '{ sum += $1 } END { print sum }'
How it works: it lists all the files recursively ( «R» ), including the hidden files ( «a» ), showing their file size ( «l» ) and without ordering them ( «U» ). (This can matter when you have many files in the directories.) Then we keep only the lines that start with «-» (these are the regular files, so we ignore directories and other stuff). Then we squeeze the runs of spaces into single spaces, so that the tabular, aligned output of ls becomes a single-space-separated list of fields on each line. Then we cut out the 5th field of each line, which holds the file size. The awk script sums these values up into the sum variable and prints the result.