Wednesday, March 29, 2006

Advanced use of -exec in the Unix find utility

I was recently tasked with writing a utility to parse through a large set of log files and delete files based on some very specific, and relatively complex, rules. Basically, I needed to delete any file that contained one of a couple of specific strings anywhere in the file, and also delete any file that contained certain other strings in its last few lines.

The file needed to be deleted if it was named *.log, and any of the following were true:
  • contains the string "Termination signal '15'" anywhere in the file
  • contains the string "Begin Stack Backtrace" anywhere in the file
  • contains the string "fatal error terminated partition" in the last few lines
  • contains the string "shutting down partition as requested" in the last few lines
I didn't want to write something that would parse through every file once for each string, and I wanted something that would be easy to extend if I was asked to search for other things. I decided to use the Unix find command with a small shell script wrapper to improve readability.

# Delete all logs with the following anywhere in the log
fullregex="Termination signal '15'|Begin Stack Backtrace"

# Delete all logs with the following anywhere in the last few lines of the log
tailregex="fatal error terminated partition|Shutting down partition as requested"

find "$FORTE_ROOT/log/" -name \*.log -type f \
\( \
-exec sh -c "tail \"\$0\" | grep -qEi \"$tailregex\"" {} \; \
-o \
-exec grep -qiE "$fullregex" {} \; \) \
-exec rm {} \;
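Before letting a command like this delete anything, it can help to preview what it would match. The sketch below is only an illustration: it points FORTE_ROOT at a throwaway directory, invents three sample logs (clean.log, crash.log and stopped.log are made-up names), and swaps the final -exec rm for -print so that nothing is removed.

```shell
# Hypothetical sandbox: FORTE_ROOT is pointed at a temp dir for this demo only.
FORTE_ROOT=$(mktemp -d)
mkdir "$FORTE_ROOT/log"
printf '%s\n' "all good" > "$FORTE_ROOT/log/clean.log"
printf '%s\n' "Begin Stack Backtrace" "more text" > "$FORTE_ROOT/log/crash.log"
printf '%s\n' "work" "shutting down partition as requested" > "$FORTE_ROOT/log/stopped.log"

fullregex="Termination signal '15'|Begin Stack Backtrace"
tailregex="fatal error terminated partition|shutting down partition as requested"

# Same tests as the real command, but -print replaces -exec rm,
# so matching files are listed instead of deleted.
preview=$(find "$FORTE_ROOT/log" -name '*.log' -type f \
    \( \
    -exec sh -c "tail \"\$0\" | grep -qEi \"$tailregex\"" {} \; \
    -o \
    -exec grep -qiE "$fullregex" {} \; \) \
    -print | sort)

echo "$preview"
rm -r "$FORTE_ROOT"
```

In this sandbox only crash.log and stopped.log are listed; clean.log matches neither pattern and is left out.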

There are two main components at work here. The first is grep and regular expressions, and the second is the extended logic and language of find itself.

Many people overlook the -E (extended regular expression) option of grep. Regular expressions can be cryptic and daunting, but their power is indisputable. The regular expressions here (fullregex and tailregex) are pretty simple. Each matches either of two strings separated by the regex OR operator '|'. The grep -q option is very useful in scripting because it prints nothing and simply exits with status 0 if the pattern matched, or 1 if it did not.
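As a small illustration of that exit status, here is a sketch that runs grep -q against a made-up sample log (the file name and contents are assumptions for the demo) and records the status of a hit and a miss.

```shell
# Hypothetical sample log, created only for this demo.
tmpdir=$(mktemp -d)
printf '%s\n' "startup ok" "Begin Stack Backtrace" > "$tmpdir/sample.log"

fullregex="Termination signal '15'|Begin Stack Backtrace"

# grep -q prints nothing; the exit status is the only result.
grep -qiE "$fullregex" "$tmpdir/sample.log" && match_status=0 || match_status=$?
grep -qiE "no such text" "$tmpdir/sample.log" && miss_status=0 || miss_status=$?

echo "match: $match_status, miss: $miss_status"   # prints "match: 0, miss: 1"
rm -r "$tmpdir"
```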

The find command is another powerful tool that is often under-utilized. The real power of find is that its options are themselves a mini programming language. In fact, the options of find form an implicit if statement, as you might find in any other language. For each file in the directory it's searching, find evaluates each option in order until one returns false or it runs out of options.

By using the -exec option, find becomes a very powerful tool for manipulating large sets of files based on arbitrary rules. You can execute any program you like with -exec, and find will interpret its exit code as a true or false value (0 is true, anything else is false), and continue or end processing based on it.
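A quick way to see this for yourself is to put the standard true and false utilities in front of -print. This is just a sketch against a throwaway directory; a.log is a made-up file name.

```shell
tmpdir=$(mktemp -d)
touch "$tmpdir/a.log"

# true exits 0, so find treats the -exec as true and moves on to -print.
kept=$(find "$tmpdir" -name '*.log' -type f -exec true \; -print)

# false exits nonzero, so find treats the -exec as false and skips -print.
dropped=$(find "$tmpdir" -name '*.log' -type f -exec false \; -print)

echo "after true:  ${kept:-<nothing>}"
echo "after false: ${dropped:-<nothing>}"
rm -r "$tmpdir"
```

The first line of output shows the path to a.log; the second shows the placeholder, because find never reached -print.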

In my example, checking the last few lines of a file is much more efficient than scanning the entire file. The reason I point this out is that I want to keep my processing to a minimum. Because find stops processing as soon as it knows a result, I can use that to ensure that I do the minimum amount of work possible. Notice the use of parentheses and the -o. This is a logical grouping, and the -o represents an OR. All of the options of find are implicitly AND'ed unless you use the -o between them. In this case I want to remove the file if either grep succeeds. And because an OR always returns true if the first part (the first grep) is true, find does not need to process the second grep.
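The short-circuit behavior of -o can be checked with a toy version of the same grouping. In this sketch the first test is simply true, and the marker files second.ran and action.ran are made-up names that record which parts find actually executed.

```shell
tmpdir=$(mktemp -d)
touch "$tmpdir/a.log"

# The first test succeeds, so the test after -o is never executed,
# but the action following the parentheses still runs.
find "$tmpdir" -name '*.log' -type f \
    \( -exec true \; -o -exec touch "$tmpdir/second.ran" \; \) \
    -exec touch "$tmpdir/action.ran" \;

[ -e "$tmpdir/second.ran" ] && second=ran || second=skipped
[ -e "$tmpdir/action.ran" ] && action=ran || action=skipped

echo "second test: $second, final action: $action"
rm -r "$tmpdir"
```

This prints "second test: skipped, final action: ran", which is the same pattern the log cleanup relies on: the cheap tail check goes first, and the full-file grep only runs when it fails.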

For clarity, here is what the above find command would do for each file if written as a standard shell script:


# find "$FORTE_ROOT/log/" -name \*.log -type f
# Assume $filename is replaced by the current file
if [[ "$filename" == *.log ]]; then    # -name \*.log (bash pattern match)
    if [ -f "$filename" ]; then        # -type f
        # \( \
        # -exec sh -c "tail \"\$0\" | grep -qEi \"$tailregex\"" {} \; \
        # -o \
        # -exec grep -qiE "$fullregex" {} \; \)
        # For the sake of this example, I've broken the or'd
        # -exec's into an if-elif statement.
        if tail "$filename" | grep -qEi "$tailregex"; then
            # -exec rm {} \;
            rm "$filename"
        # Note how if the first grep succeeds, the second
        # never occurs
        elif grep -qiE "$fullregex" "$filename"; then
            # -exec rm {} \;
            rm "$filename"
        fi
    fi
fi

The last thing to talk about is the use of -exec. One thing I've found frustrating about -exec in the past was that I couldn't get command piping (|) to work. I realized that the reason for this is that find runs the -exec command directly, without a shell to interpret the pipe. Once I realized that, the problem could be resolved by introducing a shell:
-exec sh -c "tail \"\$0\" | grep -qEi \"$tailregex\"" {} \;
Note that you have to pass the filename ( {} ) as a parameter to the shell, rather than using it within the shell's command string. Because it is the first argument after the command string, you can then reference the filename with \$0.
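A common variation, shown here only as a sketch and not as what my script uses, puts a placeholder word in the \$0 slot so that the filename arrives as "$1" and the pattern as "$2". With single quotes around the command string, nothing needs escaping, and a file name containing spaces (the made-up "my app.log" below) is handled safely.

```shell
tmpdir=$(mktemp -d)
printf '%s\n' "line one" "fatal error terminated partition" > "$tmpdir/my app.log"
tailregex="fatal error terminated partition|shutting down partition as requested"

# "sh" fills $0, {} becomes $1 and the regex becomes $2 inside the inner shell.
matches=$(find "$tmpdir" -name '*.log' -type f \
    -exec sh -c 'tail "$1" | grep -qEi "$2"' sh {} "$tailregex" \; \
    -print)

echo "$matches"
rm -r "$tmpdir"
```

The exit status of the inner shell is that of the last command in the pipeline, grep, so find still sees true or false exactly as before.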

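The last bit of info is how to terminate -exec. find needs a way to distinguish its own parameters from those of the -exec command. Because of that, you need to terminate the -exec option with a semicolon, escaped as \; so that the shell passes it through to find instead of treating it as a command separator.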
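All find needs is a literal semicolon as its own argument, so quoting it works as well as escaping it. The sketch below runs the same command both ways over three made-up files and counts the lines of output.

```shell
tmpdir=$(mktemp -d)
touch "$tmpdir/a.log" "$tmpdir/b.log" "$tmpdir/c.log"

# An escaped and a quoted semicolon both reach find as a plain ";".
escaped=$(find "$tmpdir" -name '*.log' -type f -exec echo found {} \; | wc -l | tr -d ' ')
quoted=$(find "$tmpdir" -name '*.log' -type f -exec echo found {} ';' | wc -l | tr -d ' ')

echo "escaped: $escaped, quoted: $quoted"   # prints "escaped: 3, quoted: 3"
rm -r "$tmpdir"
```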

In the end, find is a powerful tool in the arsenal of a Unix systems admin/engineer. It's a tool that, like grep, sed, awk, etc., can solve a complicated task in relatively simple terms.



Tuesday, March 21, 2006

Community Service Clubs

As a full-time telecommuter, I sometimes find it difficult to get a social outlet. I have a family with two small children, and most of my non-working time is spent with them.

Wednesday, March 8, 2006

Computer Systems Engineering

When asked what I do, my general response is that I am a Computer Systems Engineer. Unless I'm speaking to a fellow IT infrastructure type, this inevitably leads to blank stares or "Oh, I see" type responses. The conversation often goes back and forth a little bit before ending with a comment something like "so you work with computers". I have always found describing what I do a bit uncomfortable. Lately, I've been trying to connect with more people from my community. In the process of doing that I've had several of these conversations, and I've decided to try to solidify my answer.

Searching the web for the term Systems Engineer I found a few different definitions.
  • INCOSE description of Systems Engineer
    • I find this definition about as uncomfortable as my recent conversations. It's full of jargon. It describes a concept more than a real-world role.
  • http://www.calmis.ca.gov/file/occguide/ENGCOMP.HTM
    • The first paragraph of this description (under "The Job") is actually fairly concise with regard to one part of the job. But it completely misses the mark going forward.
  • Wikipedia
    • Ok, this is taken primarily from the INCOSE definition, but it reads a bit better. It's still too long-winded and generic to really use to describe what you do.
In addition, these definitions are not really specific to the IT industry. In fact, they're really focused on an engineering methodology as applied to complex systems involving multiple technologies. I don't believe that Computer Systems Engineer was born from this.

The term Computer Systems Engineer seems more likely to have evolved from the term Systems Administrator. In my experience, the terms are used interchangeably. The term System Administrator has acquired many bad connotations over the years. Changing the word to engineer is an attempt to overcome those connotations, and bring to light how the role has evolved in the modern IT world.

What's interesting is that the most successful Computer Systems Engineers that I've met are in practice Systems Engineers by the Wikipedia/INCOSE definition. They have learned, either by training or experience, to apply various levels of engineering methodologies to their work. Their success has put them in charge of increasingly complex systems and forced them to learn technologies beyond those expected of the Systems Admin. And their value in that role has brought others to them to help understand the interdisciplinary nuances of complex systems.

All of that is still very little help in telling my neighbor what I do for a living. So let's instead look at how other highly technical roles describe themselves. Most other types of engineers work in a much more physical world. Rather than describing what they actually do, they give examples of projects they've worked on that their audience can relate to. "I helped design the Tacoma Narrows Bridge." "I ensured that the new downtown apartment complex won't fall down in the next earthquake." Here is something that can be useful, provided that you work on a project that your audience can relate to. I've explained to people in the past that I do engineering on the system that helps you track packages with DHL. That's probably the most appropriate answer in a social setting. But what if you're trying to establish a business relationship with someone?

In the end I think the goal is not necessarily to explain exactly what you do. Instead, it may be to provide a small insight, while opening an avenue for conversation. In that spirit, I plan to start offering tidbits of info that most people can relate to, and use to ask more detailed questions. Offering that I work on the computer systems that track packages and schedule pickups for DHL is a good example. Nearly everyone has shipped a package. If they know nothing of computers, they can discuss the shipping aspects. If they do know computer systems, they can ask for details about what they understand from it.