BASH 01: Trim Your Args!

Ideally I wanted to put this article in a category called “Stuff we all need to know but never get around to mastering” because, for me, that is squarely where the subject of shell scripting (BASH, SH, et al.) lived for way too long. As software developers, we are experts in our respective platforms. Whether that be mobile apps, cloud APIs, or MRP/ERP back-end behemoths, we know the interplay of our code and interfaces to our cores. But every so often we all end up in a situation that requires us to do some dreaded shell scripting to solve a specific problem. When this happens we can be seen floundering about, confounded by this thing that seems to be a simple coding language but never seems to work the way we think it should. So we search the internet for some piece of script that appears to do something close to what we need. Then we tweak this strange bunch of back-ticks, curly braces, and escape characters until we get the desired outcome (maybe). But we never really understand what it’s doing. I never liked this, so when I needed to develop more sophisticated HLL-type utilities in BASH that were to become a fixed part of a major application, I made it a point to really dig in and understand SH and BASH scripting.

In this series of articles, I thought it might be useful to discuss shell scripting in terms of small code snippets, imparting some shell scripting knowledge and demystification through the dissection of actual working, useful, and reusable code. In this first article we will discuss a simple trim() function. It does the same thing as similarly named built-in functions (BIFs) of high-level languages (HLLs): it strips any leading or trailing whitespace characters from a string variable passed to it. We all know our applications should not trust data that originates from external, untrusted sources. Any data passed in to our shell scripts as a string argument should be devoid of leading or trailing whitespace (CR/LF, TAB, etc.). Such whitespace can cause havoc in our code if that string is blindly used in downstream operations, so we must trim() it.

Below is our sample script containing the trim() function. It might be helpful to open the image in a separate browser window so you can follow along with the discussion. Alternatively, you can download the source file and open it in your favorite text editor to follow along.

Some Basics

The first thing you will notice is that I like to structure my scripts the same way I do any other application or program. We start with the “shebang” line, which points to where our BASH interpreter resides. Following that is a basic documentation box with the typical attributions and revision information.

The next section (a) is something I put at the top of my BASH scripts. It retrieves variables that I find I often need in my scripts for one reason or another.

LINE 22: Determines the absolute path in the system’s file system from which the running instance of the script was loaded and saves it in a variable named SCRIPTPATH. This variable is declared readonly since there is no reason for it to change. Why might you need this information? Within your script you may need to access other resources that are expected to be in the same location as the script, or you may need to write log messages that identify the location of the running script. You may also need to pass this information to other commands or utilities that don’t work with relative paths. How this odd string of characters accomplishes this is a discussion perhaps for another article. Suffice it to say that you may see some shorter code that purports to accomplish the same thing, but those versions have issues when the script is executed using a relative reference (e.g. “../myscript”), and the above is thus far the only reliable way I have found to always get the full, absolute path.
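Since the listing itself is an image, here is a minimal sketch of one common way to capture the script's directory. It is an assumption on my part and not necessarily the exact line used in the sample script; only the variable name SCRIPTPATH comes from the discussion above. Note that this form does not follow a symbolic link that points at the script file itself.

```bash
#!/usr/bin/env bash
# Sketch only: one common idiom, not necessarily the article's LINE 22.
# BASH_SOURCE[0] is the path used to load this script; dirname strips the
# file name, cd moves there in a subshell, and pwd -P prints the physical
# absolute path of that directory.
readonly SCRIPTPATH="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd -P)"

echo "Script directory: ${SCRIPTPATH}"
```

Because the cd happens inside the command substitution, the working directory of the script itself is left unchanged.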

LINE 25: Retrieves the name of the running script. You might think this odd since your script should have a fixed name, no? No. Script files are often renamed when used, or cloned and modified for similar application implementations. If the name of the script is needed, for instance to load a configuration file with the same file name prefix as the script, then retrieving it dynamically avoids pitfalls when the script is renamed and avoids search-and-replace operations on the code whenever that happens.
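A hedged sketch of the idea, assuming a configuration file that shares the script's name prefix. The names SCRIPTNAME and CONFIGFILE and the .conf extension are illustrative and do not come from the sample script.

```bash
#!/usr/bin/env bash
# Sketch only: SCRIPTNAME, CONFIGFILE and ".conf" are hypothetical names.
# basename drops the directory part; "--" guards against names starting with "-".
readonly SCRIPTNAME="$(basename -- "${BASH_SOURCE[0]:-$0}")"

# ${SCRIPTNAME%.*} removes the last extension (for example ".sh"), so a
# renamed script automatically looks for a config file matching its new name.
readonly CONFIGFILE="${SCRIPTNAME%.*}.conf"

echo "${SCRIPTNAME} would load ${CONFIGFILE}"
```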

LINE 28: Retrieves the PID (process ID) of this running instance of the script. This is useful for logging when there might be multiple instances of this script running concurrently on the system, or when spawning an error recovery operation to clean up (kill) this running process if an unrecoverable error occurs.
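As a small illustration, the shell exposes the PID of the running script as $$. The variable name SCRIPTPID and the log_msg helper below are hypothetical, shown only to suggest how the PID might tag log lines.

```bash
#!/usr/bin/env bash
# Sketch only: SCRIPTPID and log_msg are hypothetical names.
# $$ expands to the process ID of the shell running this script.
readonly SCRIPTPID="$$"

# Prefix each message with the PID so concurrent instances can be told apart.
log_msg() {
    printf '[%s] %s\n' "${SCRIPTPID}" "$1"
}

log_msg "starting up"
```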

The Trim() Function

LINE 37: (b) is where the meat of our discussion takes place. This is where we define our reusable trim() function. In shell scripting, a function is simply defined by its name followed by an open and close parenthesis, with its body delimited by an opening and a closing curly brace. Script functions do not declare their parameters. Just like arguments passed to the script itself, arguments passed to the function are referenced within the function simply as the variables $1, $2, etc. You can see the argument being passed into the function being referenced as $1 on LINE 39.
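To make the syntax concrete, here is a tiny sketch using a hypothetical greet function that is not part of the sample script. It shows that no parameter list is declared and that the arguments simply arrive as $1 and $2.

```bash
#!/usr/bin/env bash
# Sketch only: greet is a hypothetical function used for illustration.
greet() {
    # $1 and $2 are the first and second arguments; $# is the argument count.
    echo "Hello, $1 and $2 ($# arguments)"
}

greet "Alice" "Bob"    # prints: Hello, Alice and Bob (2 arguments)
```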

Our goal is that our function should behave as much like a function in an HLL as possible. Ideally we want to simply pass a variable to the function and have it trimmed. But a script function cannot hand a string back to the caller the way an HLL function does; its return statement carries only a numeric exit status. For example:

myTrimmedString=trim ${myString} # not valid in a shell script

is not valid because trim() cannot return a value. Setting aside capturing the function's stdout with command substitution (which runs the function in a subshell), our only option is to store the result in a global variable that can be accessed after the function call. Yet if that result variable is set inside the function, then its name must be hard-coded and will be the same for each call to the function, resulting in code like so:

trim ${myString}

myString=${trimResultVar}

In the above example, the defined trim function would take the input argument as a string, trim the leading and trailing whitespace, and then assign that trimmed value to a hard-coded global variable named trimResultVar. Not only is this messy code, but it requires that the code outside the function have knowledge of the result variable name used inside the function. There is also no obvious connection between the two lines of code above which makes the code hard to follow.
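For comparison, here is a runnable sketch of that hard-coded approach. The function name trim_to_global is hypothetical; trimResultVar is the name used in the example above, and the sed call inside is the same one dissected later in this article.

```bash
#!/usr/bin/env bash
# Sketch only: trim_to_global is a hypothetical name for the "messy" variant.
trim_to_global() {
    # The result lands in a global whose name is fixed inside the function.
    trimResultVar="$(echo "$1" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')"
    return 0    # return can only carry a numeric exit status (0-255)
}

myString="   some text   "
trim_to_global "${myString}"
myString="${trimResultVar}"    # the caller must know the hard-coded name
echo "|${myString}|"           # prints |some text|
```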

But we have a better solution.

You will note in our sample program on LINE 55 that when we invoke our trim() function, we pass myArg as a literal to the function and not ${myArg} the variable. This is because we are passing the variable name to the function, not the actual string of text that we want trimmed. LINE 39 in our trim() function takes the name of the variable (that contains the text to be trimmed) and assigns that to a local variable named _varName. The local qualifier limits the scope of the variable to the function block, and the leading underscore in the variable name is a commonly accepted convention for localized variables. Then, LINE 40 dereferences that variable name to get its value (the string to be trimmed) and assigns that string to another local variable, _str. The exclamation point in the dereference (Bash calls this indirect expansion) tells the shell to treat the name following it as a variable that contains another variable’s name (a “varvar”): the shell expands it to its contents and then retrieves the value of the variable having that name.

Example:

myVar='abcdef'

myVarName='myVar'

echo ${myVar} # prints abcdef

echo ${myVarName} # prints myVar

echo ${!myVarName} # prints abcdef
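The same dereference works inside a function that receives a variable name as its argument. The sketch below uses a hypothetical describe_var function; only _varName and myArg mirror names from the discussion.

```bash
#!/usr/bin/env bash
# Sketch only: describe_var and _value are hypothetical names.
describe_var() {
    local _varName="$1"            # the NAME of the caller's variable
    local _value="${!_varName}"    # indirect expansion: the value behind that name
    echo "${_varName} holds '${_value}'"
}

myArg="abcdef"
describe_var myArg    # prints: myArg holds 'abcdef'
```

Because both variables are declared local, they disappear when the function returns and do not clutter the caller's scope.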

Now that we have both the name of the variable to be trimmed (_varName) and its value (_str) we can trim any leading and trailing whitespace from the value on LINE 42 (c). This looks like one of those cases of scripting “voodoo” where some of us would just be tempted to copy/paste and then tweak until we get the desired result. But if you don’t understand what is happening there then you have no clear concept of either the long term stability of your application or the security implications for your system. Besides, it’s not all that complex to understand, so let’s break it down.

The first part of the line is the assignment:

_str="$( … )"

If you are not familiar with this method of assigning a value to a variable, it is called command substitution. It runs whatever command you put between the parentheses and assigns that command's output (what it writes to stdout) to the variable. The command we put inside the parentheses is what removes the whitespace. So let us look more closely at that command.
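A quick sketch of command substitution on its own, using tr purely as a stand-in command (it is not part of the trim() function). The variable names are illustrative.

```bash
#!/usr/bin/env bash
# Sketch only: tr is a stand-in command; upper and lines are hypothetical names.
# Whatever the command inside $( ... ) writes to stdout becomes the value.
upper="$(echo "hello" | tr '[:lower:]' '[:upper:]')"
echo "${upper}"    # prints HELLO

# Command substitution also drops trailing newlines from the captured output.
lines="$(printf 'one\ntwo\n\n\n')"
echo "|${lines}|"
```

The second example prints the two lines with the closing pipe directly after "two", since the trailing newlines were removed during capture.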

The first part of the command is pretty straightforward and we all probably get it:

echo -e "$_str"

On its face, this statement simply outputs the value of the variable _str. More accurately, it outputs the contents of _str to stdout (the standard output). The -e option is there just to enable backslash escaped characters if they happen to be in the string variable. The next part of our command is the pipe character |. This operator simply sends the results of whatever command is in front of it (in this case, our echo command) to the stdin of whatever command follows it.
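To see what the -e option changes, the sketch below pipes the same string through tr with and without it. The use of tr to make the tab visible is my own addition for illustration, and the behavior shown assumes Bash's built-in echo with default settings.

```bash
#!/usr/bin/env bash
# Sketch only: tr is used here just to make a real tab character visible.
_str='col1\tcol2'    # single quotes: a literal backslash followed by "t"

# Without -e the backslash sequence passes through untouched.
plain="$(echo "$_str" | tr '\t' '_')"
echo "${plain}"       # prints col1\tcol2

# With -e, echo turns \t into a real tab, which tr then replaces.
expanded="$(echo -e "$_str" | tr '\t' '_')"
echo "${expanded}"    # prints col1_col2
```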

What follows it is an invocation of the sed utility. sed is a *nix utility that performs basic stream editing on an input stream or file. In this case, we are asking sed to perform two operations on the stream of characters we are sending to its stdin via our pipe symbol. The first operation:

s/^[[:space:]]*//

is a substitute command whose regular expression matches any whitespace at the start of a line and replaces it with nothing, removing any leading whitespace from the string/stream. The second operation:

s/[[:space:]]*$//

is another substitute command, this time anchored to the end of the line, that removes any trailing whitespace from the string/stream. If you are unfamiliar with how regular expressions work, there are many very good tutorials out there that are easily found with an internet search. The -e option is different for sed than for the echo command. For sed, the -e option is a way to stack a number of operations to be performed on the same string/stream. So in this case, the -e options allow us to stack the two substitutions described above so that they are both performed on our string in a single pass.
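Here is the sed stage on its own, as a small sketch you can run at a prompt. The variable names raw and trimmed are illustrative; the two sed expressions are the ones discussed above.

```bash
#!/usr/bin/env bash
# Sketch only: raw and trimmed are hypothetical names.
# $'...' lets us embed real tab characters around the text.
raw=$'\t  Hello World  \t'

# The first -e strips leading whitespace, the second strips trailing whitespace.
trimmed="$(echo "$raw" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')"
echo "|${trimmed}|"    # prints |Hello World|
```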

So much for the “voodoo” line. Breaking it down into its constituent parts, we can see and understand what is really happening. We have passed the contents of the _str variable to the sed utility that uses regular expressions to strip leading and trailing whitespace, and we assign the results right back into the _str variable.

The last line of our function (d) takes the trimmed results currently stored in the local _str variable and moves that result back into the variable whose name was passed into our function as an argument. Here we have another shell limitation to get around. The dereferencing directive (the exclamation point) we used on LINE 40 is only valid on the right side of an assignment. The left side of an assignment must be an actual variable name – not a varvar. To avoid having to assign the results to a hard-coded global variable, we need a way to assign our results back to the original variable name held in our varvar, _varName.

The solution is to use the eval command. The eval command basically takes the string you pass to it and executes it as if you had typed it on a command line. This allows us to use our varvar on the left side of an assignment because the shell interpreter expands the varvar to its contents (as it does all variables) before eval executes the resulting text, and so allows us to assign the results to the original variable. To better understand this, let’s say our function was invoked by passing in the variable named scriptArg01 that has a value of “ Hello World ” (note the leading and trailing spaces). When our function gets down to LINE 44, the eval command results in

$_varName="'${_str}'" being expanded to actually execute scriptArg01='Hello World'
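The two-step expansion can be tried in isolation. The first half of the sketch below assumes the eval line has the shape described above. The second half shows printf -v, a Bash built-in option that assigns to a variable by name; it is offered as an alternative of my own and not as the sample script's method, since the eval form misbehaves if the trimmed text itself contains a single quote.

```bash
#!/usr/bin/env bash
# Sketch only: the eval line is reconstructed from the description above.
scriptArg01="  Hello World  "
_varName="scriptArg01"
_str="Hello World"

# Step 1: the shell expands the word to   scriptArg01='Hello World'
# Step 2: eval executes that text as an ordinary assignment.
eval $_varName="'${_str}'"
echo "|${scriptArg01}|"    # prints |Hello World|

# Alternative (my assumption, not the article's code): printf -v stores the
# value in the variable named by its argument without re-parsing the value
# as shell code, so an embedded single quote is kept literally.
_str="it's fine"
printf -v "$_varName" '%s' "$_str"
echo "|${scriptArg01}|"    # prints |it's fine|
```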

Thus, our goal of a reusable function to trim leading and trailing whitespace is realized. Anywhere in our scripts, we can ensure arguments or data that originates outside of our application’s control is clean of any carriage returns, line feeds, or tabs with a single, simple line of code:

trim scriptArg01

That’s it! Before invoking the function, scriptArg01 might contain “ Hello World ” but after invoking that one line of code, it contains “Hello World”. In our sample program, we output the result with an echo statement (e). We sandwich the value between two pipe symbols just so that it is easier to see that there is no leading or trailing whitespace in the result.
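Putting the pieces together, the function can be sketched as follows. This is reconstructed from the walkthrough above and is not a copy of the downloadable source file, so details such as spacing, comments, and line positions will differ.

```bash
#!/usr/bin/env bash
# Sketch only: reconstructed from the prose, not the original listing.
trim() {
    local _varName="$1"            # name of the caller's variable
    local _str="${!_varName}"      # its current value, via indirect expansion

    # Strip leading and trailing whitespace with two stacked sed substitutions.
    _str="$(echo -e "$_str" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')"

    # Assign the trimmed text back to the variable whose name we were given.
    eval $_varName="'${_str}'"
}

myArg="   Hello World   "
trim myArg
echo "|${myArg}|"    # prints |Hello World|
```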

Conclusion

In this article, we dissected a very simple program function called trim(). Though the function seems simple and small, we covered some crucial scripting concepts such as:

local variables to limit function variables to the local scope
creating and dereferencing varvars to allow passing any variable to our function without hardcoding
command substitution in order to assign the results of calling external system utilities like sed to a variable
using echo and pipe to send the contents of a variable to an external command line utility
using eval to enable us to assign values to the contents of a varvar

In future articles covering other reusable code snippets, we will undoubtedly see these same concepts again in slightly different implementations, but this should just solidify your understanding of them even more. My hope is that just a few such articles will be enough to make you more comfortable with shell scripting and give you the confidence to apply HLL structure and best practices so that your scripts are a solid part of your application.
