generate a random file using shell script
How can I generate a file filled with random numbers or characters in a shell script? I also want to be able to specify the size of the file.
6 Answers
Use the dd command to read data from /dev/random.
dd if=/dev/random of=random.dat bs=1000000 count=5000
That would read 5000 blocks of 1 MB of random data, that is, a whole 5 gigabytes of random data!
Experiment with the block size (bs) argument to get the best performance.
After a second read of the question, I think he also wants to save only characters (alphabetic ones, I’m guessing) and numbers to the file.
That dd command is unlikely to complete as there will not be 5 gigabytes of entropy available. Use /dev/urandom if you need this much «randomness».
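For example, a sketch of the same command against the non-blocking device (bs=1M is the GNU dd shorthand for 1 MiB blocks; the output name is just an example):
dd if=/dev/urandom of=random.dat bs=1M count=5000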
head -c 10 /dev/random > rand.txt
Change 10 to whatever size you need. Read «man random» for the differences between /dev/random and /dev/urandom.
Or, for only base64 characters:
head -c 10 /dev/random | base64 | head -c 10 > rand.txt
The base64 output might include some characters you’re not interested in, but I didn’t have time to come up with a better one-liner character converter. (Also, we’re taking too many bytes from /dev/random. Sorry, entropy pool!)
Oops, I missed the characters-and-numbers part; I’m guessing you mean alphanumeric characters. Need to revise.
#!/bin/bash
# Created by Ben Okopnik on Wed Jul 16 18:04:33 EDT 2008

######## User settings ############
MAXDIRS=5
MAXDEPTH=2
MAXFILES=10
MAXSIZE=1000
######## End of user settings ############

# How deep in the file system are we now?
TOP=`pwd|tr -cd '/'|wc -c`

populate() {
    cd $1
    curdir=$PWD

    # Create a random number of files of random size in this directory.
    files=$(($RANDOM*$MAXFILES/32767))
    for n in `seq $files`
    do
        f=`mktemp XXXXXX`
        size=$(($RANDOM*$MAXSIZE/32767))
        head -c $size /dev/urandom > $f
    done

    # Stop recursing once we are MAXDEPTH levels below the starting directory.
    depth=`pwd|tr -cd '/'|wc -c`
    if [ $(($depth-$TOP)) -ge $MAXDEPTH ]
    then
        return
    fi

    # Create a random number of subdirectories and recurse into each one.
    unset dirlist
    dirs=$(($RANDOM*$MAXDIRS/32767))
    for n in `seq $dirs`
    do
        d=`mktemp -d XXXXXX`
        dirlist="$dirlist $PWD/$d"
    done

    for dir in $dirlist
    do
        populate "$dir"
    done
}

populate $PWD
Create 100 randomly named files of 50MB in size each:
for i in `seq 1 100`; do echo $i; dd if=/dev/urandom bs=1024 count=50000 > `echo $RANDOM`; done
It’s better to use mktemp to create randomly named files:
for i in `seq 1 100`; do myfile=`mktemp --tmpdir=.`; dd if=/dev/urandom bs=1024 count=50000 > $myfile; done
The RANDOM variable will give you a different number each time:
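For example, each expansion yields a new pseudo-random integer between 0 and 32767:
echo $RANDOM
echo $RANDOM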
Save as «script.sh», run as ./script.sh SIZE. The printf code was lifted from http://mywiki.wooledge.org/BashFAQ/071. Of course, you could initialize the mychars array with brute force, mychars=("0" "1" ... "A" ... "Z" "a" ... "z"), but that wouldn’t be any fun, would it?
#!/bin/bash
# Build an array of the 62 alphanumeric characters using the printf trick
# from BashFAQ/071 (turn an ASCII code into its character).
declare -a mychars
for (( I=0; I<10; I++ )); do mychars[I]=$(printf \\$(printf '%03o' $((I+48)))); done    # 0-9
for (( I=10; I<36; I++ )); do mychars[I]=$(printf \\$(printf '%03o' $((I+55)))); done   # A-Z
for (( I=36; I<62; I++ )); do mychars[I]=$(printf \\$(printf '%03o' $((I+61)))); done   # a-z

# Emit $1 random characters from the array.
for (( I=$1; I>0; I-- )); do
    echo -n ${mychars[$((RANDOM%62))]}
done
echo
The /dev/random & base64 approach is also good. Instead of piping through base64, pipe through «tr -d -c [:alnum:]»; then you just need to count the good characters that come out until you’re done.
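A minimal sketch of that pipeline, using /dev/urandom to avoid draining the entropy pool (the sizes are just examples; oversample the input, since tr discards most of the bytes):
head -c 1000 /dev/urandom | tr -dc '[:alnum:]' | head -c 10 > rand.txt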
Generating a random binary file
Why did it take 5 minutes to generate a 1 KiB file on my (low-end laptop) system with little load? And how could I generate a random binary file faster?
$ time dd if=/dev/random of=random-file bs=1 count=1024
1024+0 records in
1024+0 records out
1024 bytes (1.0 kB) copied, 303.266 s, 0.0 kB/s

real    5m3.282s
user    0m0.000s
sys     0m0.004s
$
Notice that dd if=/dev/random of=random-file bs=1024 count=1 doesn’t work: it generates a random binary file of random length, on most runs under 50 B. Does anyone have an explanation for this too?
5 Answers
That’s because on most systems /dev/random uses random data from the environment, such as static from peripheral devices. The pool of truly random data (entropy) it draws from is very limited, and reads block until more data is available.
Retry your test with /dev/urandom (notice the u), and you’ll see a significant speedup.
See Wikipedia for more info. /dev/random does not always output truly random data, but clearly on your system it does.
$ time dd if=/dev/urandom of=/dev/null bs=1 count=1024
1024+0 records in
1024+0 records out
1024 bytes (1.0 kB) copied, 0.00675739 s, 152 kB/s

real    0m0.011s
user    0m0.000s
sys     0m0.012s
$ time dd if=/dev/urandom of=random-file bs=1 count=1024
The main difference between random and urandom is how they pull random data from the kernel. random always takes data from the entropy pool; if the pool is empty, random blocks until the pool has been refilled enough. urandom generates data with a hashing algorithm such as SHA (or sometimes MD5) when the kernel entropy pool is empty, so urandom never blocks.
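On Linux you can peek at the pool that random draws from; this file is specific to the Linux kernel and reports the number of bits of entropy currently available:
cat /proc/sys/kernel/random/entropy_avail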
I wrote a script to test the speed of various hashing functions. For this I wanted files of «random» data, and I didn’t want to use the same file twice so that none of the functions had a kernel-cache advantage over the others. I found that both /dev/random and /dev/urandom were painfully slow. I chose to use dd to copy data off my hard disk starting at random offsets. I would NEVER suggest using this if you are doing anything security-related, but if all you need is noise it doesn’t matter where you get it. On a Mac use something like /dev/disk0; on Linux use /dev/sda.
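As a minimal sketch of that trick on its own (the device name, output file, and sizes here are assumptions to adjust for your system, and reading the raw disk needs root):
sudo dd if=/dev/sda skip=$(($RANDOM*4096)) of=noise.dat bs=1024 count=1024
Since skip is counted in bs-sized blocks, the copy starts at a pseudo-random offset up to roughly 128 GiB into the disk.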
Here is the complete test script:
tests=3
kilobytes=102400
commands=(md5 shasum)
count=0
test_num=0
time_file=/tmp/time.out
file_base=/tmp/rand

while [[ test_num -lt tests ]]; do
    ((test_num++))
    for cmd in "${commands[@]}"; do
        ((count++))
        file=$file_base$count
        touch $file

        # slowest
        #/usr/bin/time dd if=/dev/random of=$file bs=1024 count=$kilobytes >/dev/null 2>$time_file

        # slow
        #/usr/bin/time dd if=/dev/urandom of=$file bs=1024 count=$kilobytes >/dev/null 2>$time_file

        # less slow
        /usr/bin/time sudo dd if=/dev/disk0 skip=$(($RANDOM*4096)) of=$file bs=1024 count=$kilobytes >/dev/null 2>$time_file

        echo "dd took $(tail -n1 $time_file | awk '{print $1}') seconds"
        echo -n "$(printf "%7s" $cmd)ing $file: "
        /usr/bin/time $cmd $file >/dev/null
        rm $file
    done
done
Here is the «less slow» /dev/disk0 results:
dd took 6.49 seconds
    md5ing /tmp/rand1: 0.45 real 0.29 user 0.15 sys
dd took 7.42 seconds
 shasuming /tmp/rand2: 0.93 real 0.48 user 0.10 sys
dd took 6.82 seconds
    md5ing /tmp/rand3: 0.45 real 0.29 user 0.15 sys
dd took 7.05 seconds
 shasuming /tmp/rand4: 0.93 real 0.48 user 0.10 sys
dd took 6.53 seconds
    md5ing /tmp/rand5: 0.45 real 0.29 user 0.15 sys
dd took 7.70 seconds
 shasuming /tmp/rand6: 0.92 real 0.49 user 0.10 sys
Here are the «slow» /dev/urandom results:
dd took 12.80 seconds
    md5ing /tmp/rand1: 0.45 real 0.29 user 0.15 sys
dd took 13.00 seconds
 shasuming /tmp/rand2: 0.58 real 0.48 user 0.09 sys
dd took 12.86 seconds
    md5ing /tmp/rand3: 0.45 real 0.29 user 0.15 sys
dd took 13.18 seconds
 shasuming /tmp/rand4: 0.59 real 0.48 user 0.10 sys
dd took 12.87 seconds
    md5ing /tmp/rand5: 0.45 real 0.29 user 0.15 sys
dd took 13.47 seconds
 shasuming /tmp/rand6: 0.58 real 0.48 user 0.09 sys
Here are the «slowest» /dev/random results:
dd took 13.07 seconds
    md5ing /tmp/rand1: 0.47 real 0.29 user 0.15 sys
dd took 13.03 seconds
 shasuming /tmp/rand2: 0.70 real 0.49 user 0.10 sys
dd took 13.12 seconds
    md5ing /tmp/rand3: 0.47 real 0.29 user 0.15 sys
dd took 13.19 seconds
 shasuming /tmp/rand4: 0.59 real 0.48 user 0.10 sys
dd took 12.96 seconds
    md5ing /tmp/rand5: 0.45 real 0.29 user 0.15 sys
dd took 12.84 seconds
 shasuming /tmp/rand6: 0.59 real 0.48 user 0.09 sys
You’ll notice that /dev/random and /dev/urandom were not much different in speed. However, /dev/disk0 took about half the time.
P.S. I lessened the number of tests and removed all but two commands for the sake of «brevity» (not that I succeeded in being brief).
Is there a command to write random garbage bytes into a file?
The /dev/urandom pseudo-device, along with dd, can do this for you:
dd if=/dev/urandom of=newfile bs=1M count=10
This will create a file newfile of size 10M.
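To get a different size, adjust bs and count; the total written is bs × count. For example, a sketch for a 1 MB file instead of 10:
dd if=/dev/urandom of=newfile bs=1M count=1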
The /dev/random device will often block if there is not sufficient randomness built up; urandom will not block. If you’re using the randomness for crypto-grade stuff, you may want to steer clear of urandom. For anything else, it should be sufficient and most likely faster.
If you want to corrupt just bits of your file (not the whole file), you can simply use C-style random functions: use rnd() to figure out an offset and a length n, then use it n times to grab random bytes to overwrite your file with.
The following Perl script shows how this can be done (without having to worry about compiling C code):
use strict;
use warnings;

sub corrupt ($$$$) {
    # Get parameters, names should be self-explanatory.
    my $filespec = shift;
    my $mincount = shift;
    my $maxcount = shift;
    my $charset  = shift;

    # Work out position and size of corruption.
    my @fstat = stat ($filespec);
    my $size = $fstat[7];
    my $count = $mincount + int (rand ($maxcount + 1 - $mincount));
    my $pos = 0;
    if ($count >= $size) {
        $count = $size;
    } else {
        $pos = int (rand ($size - $count));
    }

    # Output for debugging purposes.
    my $last = $pos + $count - 1;
    print "'$filespec', $size bytes, corrupting $pos through $last\n";

    # Open file, seek to position, corrupt and close.
    open (my $fh, "+<$filespec") || die "Can't open $filespec: $!";
    seek ($fh, $pos, 0);
    while ($count-- > 0) {
        my $newval = substr ($charset, int (rand (length ($charset))), 1);
        print $fh $newval;
    }
    close ($fh);
}

# Test harness.
system ("echo ==========");             #DEBUG
system ("cp base-testfile testfile");   #DEBUG
system ("cat testfile");                #DEBUG
system ("echo ==========");             #DEBUG

corrupt ("testfile", 8, 16, "ABCDEFGHIJKLMNOPQRSTUVWXYZ ");

system ("echo ==========");             #DEBUG
system ("cat testfile");                #DEBUG
system ("echo ==========");             #DEBUG