Why does the pattern matching expression [A-Z]* match files beginning with every letter except `z'?

Why does the pattern matching expression [A-Z]* match files beginning with every letter except `z'?


Bash-2.05 and later versions have reverted to the bash-2.03 behavior of
honoring the current locale setting when processing ranges within pattern
matching bracket expressions ([A-Z]). This is what POSIX.2 and SUSv3/XPG6
specify.

The behavior of the matcher in bash-2.05 and later versions depends on the
current LC_COLLATE setting. Setting this variable to `C' or `POSIX' will
result in the traditional behavior ([A-Z] matches all uppercase ASCII
characters). Many other locales, including the en_US locale (the default
on many US versions of Linux) collate the upper and lower case letters like
this:

AaBb...Zz

which means that [A-Z] matches every letter except `z'. Others collate like

aAbBcC...zZ

which means that [A-Z] matches every letter except `a'.

The portable way to specify upper case letters is [:upper:] instead of
A-Z; lower case may be specified as [:lower:] instead of a-z.

Look at the manual pages for setlocale(3), strcoll(3), and, if it is
present, locale(1). If you have locale(1), you can use it to find
your current locale information even if you do not have any of the
LC_ variables set.

My advice is to put

export LC_COLLATE=C

into /etc/profile and inspect any shell scripts run from cron for
constructs like [A-Z]. This will prevent things like

rm [A-Z]*

from removing every file in the current directory except those beginning
with `z' and still allow individual users to change the collation order.
Users may put the above command into their own profiles as well, of course.



Home FAQ