Linux file matching pattern

Bash function to find newest file matching pattern

In Bash, I would like to create a function that returns the filename of the newest file that matches a certain pattern. For example, I have a directory of files like:

Directory/
    a1.1_5_1
    a1.2_1_4
    b2.1_0
    b2.2_3_4
    b2.3_2_0

I want the newest file that starts with ‘b2’. How do I do this in bash? I need to have this in my ~/.bash_profile script.

see superuser.com/questions/294161/… for more answer hints. The sorting is the key step to get your newest file


The ls command has a parameter -t to sort by time. You can then grab the first (newest) with head -1.
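A minimal sketch of that idea, assuming the file names contain no newlines. The function name newest_by_ls is hypothetical and only serves as an illustration:

```bash
# Hypothetical helper: print the most recently modified file matching a glob.
# Assumes file names contain no newlines, since the output of ls is parsed.
newest_by_ls() {
    local pattern=$1
    # $pattern is deliberately unquoted so that the shell expands the glob.
    # -t sorts by modification time (newest first); -- ends option parsing.
    # If nothing matches, ls complains on stderr and nothing is printed.
    ls -t -- $pattern 2>/dev/null | head -n 1
}
```

Called as newest_by_ls 'b2*' from inside the directory, it prints the newest b2 file. The quotes keep the shell from expanding the pattern before the function sees it.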

My personal opinion: parsing ls is dangerous when the filenames can contain funny characters like spaces or newlines.

If you can guarantee that the filenames will not contain funny characters (maybe because you are in control of how the files are generated) then parsing ls is quite safe.

If you are developing a script which is meant to be run by many people on many systems in many different situations then do not parse ls.

unset -v latest
for file in "$dir"/*; do
  [[ $file -nt $latest ]] && latest=$file
done
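The loop relies on a detail of -nt: the test is also true when the left-hand file exists and the right-hand one does not, which is why an empty $latest works on the first iteration. A sketch that narrows the loop to the question's prefix, with newest_by_nt as a hypothetical name and the directory and prefix passed as arguments:

```bash
# Hypothetical helper: newest file in DIR whose name starts with PREFIX.
# Uses only Bash builtins, so unusual characters in file names are safe.
newest_by_nt() {
    local dir=$1 prefix=$2 file latest=
    for file in "$dir"/"$prefix"*; do
        # True if $file is newer than $latest, or if $file exists and
        # $latest does not (first iteration). An unmatched glob stays a
        # literal, non-existent name, so the test is false for it.
        [[ $file -nt $latest ]] && latest=$file
    done
    [[ -n $latest ]] || return 1
    printf '%s\n' "$latest"
}
```

For the example directory, newest_by_nt Directory b2 would print the path of the most recently modified b2 file and return a non-zero status when nothing matches.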

Note to others: if you are doing this for a directory, you would add the -d option to ls, like this 'ls -td | head -1'

The parsing LS link says not to do this and recommends the methods in BashFAQ 99. I’m looking for a 1-liner rather than something bullet-proof to include in a script, so I’ll continue to parse ls unsafely like @lesmana.

@Eponymous: If you’re looking for a one liner without using the fragile ls, printf "%s\n" b2* | head -1 will do it for you.

@DavidOngaro The question does not say that the filenames are version numbers. This is about modification times. Even with the filename assumption b2.10_5_2 kills this solution.

Your one liner is giving me the right answer, but the "right" way is actually giving me the oldest file. Any idea why?

The combination of find and ls works well for

  • filenames without newlines
  • not a very large number of files
  • not very long filenames

The solution:

find . -name "my-pattern" -print0 | xargs -r -0 ls -1 -t | head -1 

Let’s break it down:

With find we can match all interesting files like this:

find . -name "my-pattern"

then using -print0 we can pass all filenames safely to ls like this:

find . -name "my-pattern" -print0 | xargs -r -0 ls -1 -t 

Additional find search parameters and patterns can be added where the ... is:

find . -name "my-pattern" ... -print0 | xargs -r -0 ls -1 -t

ls -t sorts files by modification time (newest first), and -1 prints them one per line. Adding -c sorts by ctime instead, which is the time of the last status change of the file, not its creation time. Note: this will break with filenames containing newlines.


Finally head -1 gets us the first file in the sorted list.

Note: xargs is subject to system limits on the size of the argument list. If the file list exceeds that limit, xargs will call ls multiple times. This will break the sorting and probably also the final output. Run

xargs --show-limits

to check the limits on your system (this option is specific to GNU xargs).
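If the argument-list limit is a concern, one alternative is to let find print each modification time itself and sort on that. This is a different technique from the find | xargs ls pipeline in this answer, and it assumes GNU find (for -printf) and file names without newlines. The name newest_by_find is hypothetical:

```bash
# Hypothetical helper: newest regular file in DIR matching PATTERN.
# Assumes GNU find and file names without newlines.
newest_by_find() {
    local dir=$1 pattern=$2
    # %T@ is the modification time in seconds since the epoch, %p the path.
    # Sorting numerically in reverse puts the newest file on the first line;
    # cut then drops the timestamp column (fields are tab-separated).
    find "$dir" -maxdepth 1 -type f -name "$pattern" -printf '%T@\t%p\n' |
        LC_ALL=C sort -rn | head -n 1 | cut -f 2-
}
```

Because ls is never invoked, there is no argument list to overflow. The printed path keeps the directory prefix that was passed in, for example ./b2.3_2_0 for newest_by_find . 'b2*'.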

Note 2: use find . -maxdepth 1 -name "my-pattern" -print0 if you don’t want to search files through subfolders.

Note 3: As pointed out by @starfry, the -r argument for xargs prevents the call of ls -1 -t if no files were matched by find. Thank you for the suggestion.

This is better than the ls based solutions, as it works for directories with extremely many files, where ls chokes.

I found that this can return a file that does not match the pattern if there are no files that do match the pattern. It happens because find passes nothing to xargs which then invokes ls with no file lists, causing it to work on all files. The solution is to add -r to the xargs command-line which tells xargs not to run its command-line if it receives nothing on its standard input.

This is a possible implementation of the required Bash function:

# Print the newest file, if any, matching the given pattern
# Example usage:
#   newest_matching_file 'b2*'
# WARNING: Files whose names begin with a dot will not be checked
function newest_matching_file
{
    # Use ${1-} instead of $1 in case 'nounset' is set
    local -r glob_pattern=${1-}

    if (( $# != 1 )) ; then
        echo 'usage: newest_matching_file GLOB_PATTERN' >&2
        return 1
    fi

    # To avoid printing garbage if no files match the pattern, set
    # 'nullglob' if necessary
    local -i need_to_unset_nullglob=0
    if [[ ":$BASHOPTS:" != *:nullglob:* ]] ; then
        shopt -s nullglob
        need_to_unset_nullglob=1
    fi

    newest_file=
    for file in $glob_pattern ; do
        [[ -z $newest_file || $file -nt $newest_file ]] \
            && newest_file=$file
    done

    # To avoid unexpected behaviour elsewhere, unset nullglob if it was
    # set by this function
    (( need_to_unset_nullglob )) && shopt -u nullglob

    # Use printf instead of echo in case the file name begins with '-'
    [[ -n $newest_file ]] && printf '%s\n' "$newest_file"

    return 0
}

It uses only Bash builtins, and should handle files whose names contain newlines or other unusual characters.
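The nullglob handling is the part that is easiest to overlook. Without it, a pattern that matches nothing is passed through as a literal string, so a loop over it runs once with a bogus name. A small sketch of the save-and-restore idea, using a hypothetical helper count_matches that is not part of the answer:

```bash
# Hypothetical helper: count the files matching a glob in the current directory.
count_matches() {
    local pattern=$1 restore f
    local -i n=0
    # 'shopt -p nullglob' prints the command that recreates the current setting.
    restore=$(shopt -p nullglob)
    shopt -s nullglob            # an unmatched glob now expands to nothing
    for f in $pattern; do
        n+=1
    done
    eval "$restore"              # put nullglob back the way the caller had it
    printf '%d\n' "$n"
}
```

With nullglob off, the same loop would count 1 for a pattern that matches nothing, because the unexpanded pattern itself becomes the only loop item.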


Pattern Matching In Bash

bash

Wildcards have been around forever. Some even claim they appear in the hieroglyphics of the ancient Egyptians. Wildcards allow you to specify succinctly a pattern that matches a set of filenames (for example, *.pdf to get a list of all the PDF files). Wildcards are also often referred to as glob patterns (or when using them, as "globbing"). But glob patterns have uses beyond just generating a list of useful filenames. The bash man page refers to glob patterns simply as "Pattern Matching".


First, let’s do a quick review of bash’s glob patterns. In addition to the simple wildcard characters that are fairly well known, bash also has extended globbing, which adds additional features. These extended features are enabled via the extglob option.

Pattern        Description
*              Match zero or more characters
?              Match any single character
[...]          Match any of the characters in a set
?(patterns)    Match zero or one occurrences of the patterns (extglob)
*(patterns)    Match zero or more occurrences of the patterns (extglob)
+(patterns)    Match one or more occurrences of the patterns (extglob)
@(patterns)    Match one occurrence of the patterns (extglob)
!(patterns)    Match anything that doesn’t match one of the patterns (extglob)

$ ls
a.jpg  b.gif  c.png  d.pdf  ee.pdf
$ ls *.jpg
a.jpg
$ ls ?.pdf
d.pdf
$ ls [ab]*
a.jpg  b.gif
$ shopt -s extglob   # turn on extended globbing
$ ls ?(*.jpg|*.gif)
a.jpg  b.gif
$ ls !(*.jpg|*.gif)  # not a jpg or a gif
c.png  d.pdf  ee.pdf

When I first used extended globbing, many of the patterns didn’t seem to do what I thought they ought to do. For example, it appeared to me that, given a.jpg, the pattern ?(*.jpg|a.jpg) should not match, because a.jpg matched both patterns, and the ? is "zero or one", right? Wrong. My confusion was due to a misreading of the description: it’s not the filename that can match only once, it’s the pattern that can match only once. Think of it in terms of regular expressions:

Glob           Regular Expression Equivalent    Description
?(patterns)    (regex)?                         Match an optional regex
*(patterns)    (regex)*                         Match zero or more occurrences of a regex
+(patterns)    (regex)+                         Match one or more occurrences of a regex
@(patterns)    (regex)                          Match the regex (one occurrence)

$ ls *.pdf
ee.pdf  e.pdf  .pdf
$ ls ?(e).pdf   # zero or one "e" allowed
e.pdf  .pdf
$ ls *(e).pdf   # zero or more "e"s allowed
ee.pdf  e.pdf  .pdf
$ ls +(e).pdf   # one or more "e"s allowed
ee.pdf  e.pdf
$ ls @(e).pdf   # only one e allowed
e.pdf
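The same quantifiers can be checked against plain strings, with no files involved. This is a small sketch rather than anything from the bash documentation; glob_matches is a hypothetical helper name:

```bash
shopt -s extglob

# Hypothetical helper: report whether NAME matches the glob PATTERN.
glob_matches() {
    local name=$1 pattern=$2
    # An unquoted right-hand side of == is treated as a pattern, not a string.
    if [[ $name == $pattern ]]; then echo yes; else echo no; fi
}

glob_matches ee.pdf '?(e).pdf'           # no:  two "e"s, at most one allowed
glob_matches ee.pdf '*(e).pdf'           # yes: zero or more "e"s
glob_matches .pdf   '+(e).pdf'           # no:  at least one "e" required
glob_matches e.pdf  '@(e).pdf'           # yes: exactly one "e"
glob_matches a.jpg  '?(*.jpg|a.jpg)'     # yes: one alternative matching once is enough
```

The last call is the case that confused me: the name may match several alternatives, but the group as a whole only has to match once.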

And while I’m comparing glob patterns to regular expressions, there’s an important point to be made that may not be immediately obvious: glob patterns are just another syntax for doing pattern matching in general in bash. And you can use them in a number of different places:

  • After the == in a bash [[ expr ]] expression.
  • In the patterns to a case command.
  • In parameter expansions (%, %%, #, ##, /, //).

The following example uses pattern matching in the expression of an if statement to test whether a variable has a value of "something" or "anything":

$ shopt -s extglob
$ a=something
$ if [[ $a == +(some|any)thing ]]; then echo yes; else echo no; fi
yes
$ a=anything
$ if [[ $a == +(some|any)thing ]]; then echo yes; else echo no; fi
yes
$ a=nothing
$ if [[ $a == +(some|any)thing ]]; then echo yes; else echo no; fi
no

The following example uses pattern matching in a case statement to determine whether a file is an image file:

shopt -s extglob
for f in "$@"
do
    case $f in
    !(*.gif|*.jpg|*.png))   # ! == does not match
        echo "Not an image: $f"
        ;;
    *)
        echo "Image: $f"
        ;;
    esac
done

$ bash script.sh a.jpg b.gif c.png d.pdf e.pdf
Image: a.jpg
Image: b.gif
Image: c.png
Not an image: d.pdf
Not an image: e.pdf

In the example above, the pattern !(*.gif|*.jpg|*.png) will match a filename if it’s not a gif, jpg or png.
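The same test can be phrased positively with @(...), which may read more naturally when wrapped in a function. A sketch, with is_image as a hypothetical name. Note that extglob has to be enabled before the function is defined, because the shell needs it to parse the pattern:

```bash
shopt -s extglob   # must come before the function definition is parsed

# Hypothetical helper: succeed if the name ends in one of the image extensions.
is_image() {
    case $1 in
        @(*.gif|*.jpg|*.png)) return 0 ;;   # exactly one alternative must match
        *)                    return 1 ;;
    esac
}

for f in a.jpg notes.txt; do
    if is_image "$f"; then echo "Image: $f"; else echo "Not an image: $f"; fi
done
```

A plain case pattern list (*.gif|*.jpg|*.png) would work here without extglob; the @(...) form is used only to mirror the extended patterns discussed above.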

The following example uses pattern matching in a %% parameter expansion to remove the extension from all image files:

shopt -s extglob
for f in "$@"
do
    echo "${f%%*(.gif|.jpg|.png)}"
done

$ bash script.sh a.jpg b.gif c.png d.pdf e.pdf
a
b
c
d.pdf
e.pdf

A feature that I just recently became aware of is that you can do the above action in one fell swoop: if you use "*" or "@" as the variable name, the transformation is done on all the command-line arguments at once. [Note to self: always read the last half of the paragraph from now on]:

shopt -s extglob
echo ${*%%*(.gif|.jpg|.png)}

$ bash script.sh a.jpg b.gif c.png d.pdf e.pdf
a b c d.pdf e.pdf

And that works on arrays too:

shopt -s extglob
array=("$@")
echo "${array[*]%%*(.gif|.jpg|.png)}"

$ bash script.sh a.jpg b.gif c.png d.pdf e.pdf
a b c d.pdf e.pdf
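When the names may contain spaces, quoting the expansion and using [@] instead of [*] keeps each element separate after the suffix is removed. A sketch under that assumption, with strip_image_ext as a hypothetical function name:

```bash
shopt -s extglob

# Hypothetical helper: print each argument with image extensions removed.
strip_image_ext() {
    local -a names=("$@")
    # Quoted [@] applies %% to every element and yields one word per element,
    # so a name such as "my photo.jpg" is not split at the space.
    printf '%s\n' "${names[@]%%*(.gif|.jpg|.png)}"
}

strip_image_ext 'my photo.jpg' b.gif d.pdf
```

Because *(...) matches zero or more occurrences and %% removes the longest match, a name like a.jpg.png would lose both extensions.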

The biggest takeaway here is to stop thinking of wildcards as a mechanism just to get a list of filenames and start thinking of them as glob patterns that can be used to do general pattern matching in your bash scripts. Think of glob patterns as regular expressions in a different language.

Any code found in my articles should be considered licensed as follows:

# Copyright 2019 Mitch Frazier
#
# This software may be used and distributed according to the terms of the
# MIT License or the GNU General Public License version 2 (or any later version).

Mitch Frazier is an embedded systems programmer at Emerson Electric Co. Mitch has been a contributor to and a friend of Linux Journal since the early 2000s.
