How to recursively find the latest modified file in a directory?
For a huge tree, it might be hard for sort to keep everything in memory.
%T@ gives you the modification time as a unix timestamp, sort -n sorts numerically, tail -1 takes the last line (highest timestamp), and cut -f2- -d" " cuts away the first field (the timestamp) from the output.
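Assembled, the pipeline those pieces describe looks like this (GNU find assumed, since -printf is a GNU extension):

```shell
# Print "<epoch mtime> <path>" for every file, sort numerically by the
# timestamp, keep the highest one, then strip the timestamp field.
find . -type f -printf '%T@ %p\n' | sort -n | tail -1 | cut -f2- -d" "
```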
Edit: Just as -printf is probably GNU-only, ajreals' usage of stat -c is too. Although it is possible to do the same on BSD, the options for formatting are different (-f "%m %N", it would seem).
And I missed the plural in the question; if you want more than just the latest file, bump up the tail argument.
If order matters, you can use sort -rn | head -3 instead of sort -n | tail -3. One version lists the files from oldest to newest, the other from newest to oldest.
I had a huge directory (some tens of thousands of small files) and I was worried about the performance, but this command ran in less than one second! Great, many thanks. 🙂
"For a huge tree, it might be hard for sort to keep everything in memory." sort will create temporary files (in /tmp) as needed, so I don't think that's a concern.
I find the following shorter and with more interpretable output: find . -type f -printf '%TF %TT %p\n' | sort | tail -1
If you know the files were last changed within, say, the past week, adding -mtime -7 to find can greatly speed up the process.
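A sketch of that combination, assuming GNU find and a one-week window:

```shell
# -mtime -7 restricts the search to files modified within the last
# 7 days, so sort sees far fewer lines on a large tree.
find . -type f -mtime -7 -printf '%T@ %p\n' | sort -n | tail -1 | cut -f2- -d" "
```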
Following up on @plundra’s answer, here’s the BSD and OS X version:
find . -type f -print0 \
  | xargs -0 stat -f "%m %N" \
  | sort -rn | head -1 | cut -f2- -d" "
Does the BSD / OS X find support + instead of \; ? Because that does the same thing (passing multiple files as parameters), without the -print0 | xargs -0 pipe.
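For illustration, a sketch of that -exec ... + form. GNU stat syntax is shown here; BSD/macOS stat would need -f "%m %N" instead of -c "%Y %n":

```shell
# find batches many file names onto each stat invocation ('+'), so no
# -print0 | xargs -0 pipe is needed; reverse numeric sort puts the
# newest file first.
find . -type f -exec stat -c '%Y %n' {} + | sort -rn | head -1 | cut -f2- -d' '
```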
Using -print0 because you care about newlines in filenames doesn't help here: the pipeline still breaks those filenames when sort processes its input line by line.
Instead of sorting the results and keeping only the last modified ones, you could use awk to print only the one with greatest modification time (in unix time):
find . -type f -printf "%T@\0%p\0" | awk '{ if ($0 > max) { max = $0; getline mostrecent } else getline } END { print mostrecent }' RS='\0'
This should be a faster way to solve your problem if the number of files is big enough.
I have used the NUL character (i.e. '\0') because, theoretically, a filename may contain any character (including space and newline) except that one.
If you don’t have such pathological filenames in your system you can use the newline character as well:
find . -type f -printf "%T@\n%p\n" | awk '{ if ($0 > max) { max = $0; getline mostrecent } else getline } END { print mostrecent }' RS='\n'
In addition, this works in mawk too.
I don't know if this is because I'm on OSX, but this has a bunch of issues. 1. $0 is the entire line, not the first field (should be $1). 2. You shouldn't use getline because that will skip lines. 3. You need the -0 flag for find in your first command to use the '\0' delimiter.
@HarrisonMc I think it's supposed to skip lines: each result outputs two records, and awk compares the first (the timestamp) and saves the second (the path). The less pathological case could be simplified to just output a single line and have awk spit out the "greatest" one: find . -type f -printf "%TF %TT\t%p\n" | awk '…'
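That single-line simplification might look like the following sketch, which tracks the maximum timestamp in one awk pass instead of sorting (GNU find assumed):

```shell
# One "<epoch> <path>" line per file; awk keeps the line with the
# largest first field, then strips the timestamp before printing.
find . -type f -printf '%T@ %p\n' \
  | awk '$1 > max { max = $1; line = $0 } END { sub(/^[^ ]+ /, "", line); print line }'
```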
Shows the latest file with human readable timestamp:
find . -type f -printf '%TY-%Tm-%Td %TH:%TM: %Tz %p\n'| sort -n | tail -n1
2015-10-06 11:30: +0200 ./foo/bar.txt
To show more files, replace -n1 with a higher number
This seems to work fine, even with subdirectories:
find . -type f | xargs ls -ltr | tail -n 1
In case of too many files, refine the find.
In case there are spaces in the file paths better do: find . -type f -print0 | xargs -0 ls -ltr | tail -n 1
I had trouble finding the last modified file under Solaris 10. There, find does not have the printf option and stat is not available. I discovered the following solution, which works well for me:
find . -type f | sed 's/.*/"&"/' | xargs ls -E | awk '{ print $6," ",$7 }' | sort | tail -1
To show the filename as well use
find . -type f | sed 's/.*/"&"/' | xargs ls -E | awk '{ print $6," ",$7," ",$9 }' | sort | tail -1
Explanation
- find . -type f finds and lists all files
- sed 's/.*/"&"/' wraps each pathname in quotes to handle whitespace
- xargs ls -E sends the quoted paths to ls; the -E option makes sure that a full timestamp (format year-month-day hour-minute-second-nanoseconds) is returned
- awk '{ print $6," ",$7 }' extracts only date and time
- awk '{ print $6," ",$7," ",$9 }' extracts date, time and filename
- sort returns the files sorted by date
- tail -1 returns only the last modified file
I use something similar all the time, as well as the top-k list of most recently modified files. For large directory trees, it can be much faster to avoid sorting. In the case of just top-1 most recently modified file:
find . -type f -printf '%T@ %p\n' | perl -ne '@a=split(/\s+/, $_, 2); ($t,$f)=@a if $a[0]>$t; print $f if eof()'
On a directory containing 1.7 million files, I get the most recent one in 3.4s, a speed-up of 7.5x against the 25.5s solution using sort.
Very cool: I just exchanged the last print with system("ls -l $f") if eof() to see the date in a nice way, too.
@MartinT. : great, and you’re welcome. It’s strange to me that people have this instinct to sort things ( O(n log n) ) when an O(n) method is available. This seems to be the only answer avoiding sort. BTW, the goal of the command I suggested is just to find the path of the latest file. You could alias the command in your shell (e.g. as lastfile ) and then you can do whatever you like with the result, such as ls -l $(lastfile .) , or open $(lastfile .) (on a Mac), etc.
find . -type f -ls 2>/dev/null | sort -M -k8,10 | head -n5
Reverse the order by adding '-r' to the sort command. If you only want the filenames, filter the output through awk before '| head'.
I find the following shorter and with more interpretable output:
find . -type f -printf '%TF %TT %p\n' | sort | tail -1
Given the fixed length of the standardised ISO format datetimes, lexicographical sorting is fine and we don’t need the -n option on the sort.
If you want to remove the timestamps again, you can use:
find . -type f -printf '%TFT%TT %p\n' | sort | tail -1 | cut -f2- -d' '
Using find — with nice & fast time stamp
Here is how to find and list the latest modified files in a directory with subdirectories. Hidden files are ignored on purpose. The time format can be customised.
$ find . -type f -not -path '*/\.*' -printf '%TY-%Tm-%Td %TH:%TM %Ta %p\n' |sort -nr |head -n 10
Result
Handles spaces in file names perfectly well — not that these should be used!
2017-01-25 18:23 Wed ./indenting/Shifting blocks visually.mht
2016-12-11 12:33 Sun ./tabs/Converting tabs to spaces.mht
2016-12-02 01:46 Fri ./advocacy/2016.Vim or Emacs - Which text editor do you prefer?.mht
2016-11-09 17:05 Wed ./Word count - Vim Tips Wiki.mht
More
More find galore following the link.
@Michael On GNU/Linux and other Unix* systems, it is generally recommended not to use spaces in file names or directory names.
On Ubuntu 13, the following does it, maybe a tad faster, as it reverses the sort and uses 'head' instead of 'tail', reducing the work. To show the 11 newest files in a tree:

find . -type f -printf '%T@ %p\n' | sort -n -r | head -11 | cut -f2- -d" " | sed -e 's,^\./,,' | xargs ls -U -l

This gives a complete ls listing without re-sorting and omits the annoying './' that 'find' puts on every file name.
treecent () {
    local numl
    if [[ 0 -eq $# ]] ; then
        numl=11   # Or whatever default you want.
    else
        numl=$1
    fi
    find . -type f -printf '%T@ %p\n' | sort -n -r | head -${numl} | cut -f2- -d" " | sed -e 's,^\./,,' | xargs ls -U -l
}
Still, most of the work was done by plundra’s original solution. Thanks plundra.
I faced the same issue. I needed to find the most recent file recursively, and find took around 50 minutes.
Here is a little script to do it faster:
#!/bin/sh

CURRENT_DIR='.'

zob () {
    FILE=$(ls -Art1 ${CURRENT_DIR} | tail -n 1)
    if [ ! -f ${FILE} ]; then
        CURRENT_DIR="${CURRENT_DIR}/${FILE}"
        zob
    fi
    echo $FILE
    exit
}
zob
It's a recursive function that gets the most recently modified item in a directory. If that item is a directory, the function is called recursively to search inside it, and so on.
To search for files in /target_directory and all its sub-directories, that have been modified in the last 60 minutes:
$ find /target_directory -type f -mmin -60
To find the most recently modified files, sorted in the reverse order of update time (i.e., the most recently updated files first):
$ find /etc -type f -printf '%TY-%Tm-%Td %TT %p\n' | sort -r
If running stat on each file individually is too slow, you can use xargs to speed things up a bit:
find . -type f -print0 | xargs -0 stat -f "%m %N" | sort -n | tail -1 | cut -f2- -d" "
This recursively changes the modification time of all directories in the current directory to the newest file in each directory:
for dir in */; do find $dir -type f -printf '%T@ "%p"\n' | sort -n | tail -1 | cut -f2- -d" " | xargs -I {} touch -r {} $dir; done
It breaks badly if any dirs contain spaces; you need to set IFS and use quotes: IFS=$'\n'; for dir in $(find ./ -type d); do echo "$dir"; find "$dir" -type f -printf '%T@ "%p"\n' | sort -n | tail -1 | cut -f2- -d" " | xargs -I {} touch -r {} "$dir"; done;
This simple cli will also work:
You may change the -1 to the number of files you want to list
The following command worked on Solaris:
find . -name "*zip" -type f | xargs ls -ltr | tail -1
After using a find -based solution for years, I found myself wanting the ability to exclude directories like .git .
I switched to this rsync -based solution. Put this in ~/bin/findlatest :
#!/bin/sh # Finds most recently modified files. rsync -rL --list-only "$@" | grep -v '^d' | sort -k3,4r | head -5
Now findlatest . will list the 5 most recently modified files, and findlatest --exclude .git . will list the 5 excluding ones in .git.
This works by taking advantage of some little-used rsync functionality: "if a single source arg is specified [to rsync] without a destination, the files are listed in an output format similar to ls -l" (rsync man page).
The ability to take rsync args is useful in conjunction with rsync-based backup tools. For instance I use rsnapshot , and I back up an application directory with rsnapshot.conf line:
backup /var/atlassian/application-data/jira/current/ home +rsync_long_args=--archive --filter="merge /opt/atlassian/jira/current/backups/rsync-excludes"
where rsync-excludes lists directories I don’t want to backup:
- log/
- logs/
- analytics-logs/
- tmp/
- monitor/*.rrd4j
I can see now the latest files that will be backed up with:
findlatest /var/atlassian/application-data/jira/current/ --filter="merge /opt/atlassian/jira/current/backups/rsync-excludes"