Wednesday, October 10, 2007

Unexpected side effects in shell scripts

Shell scripts that fail to preserve the expected environment are a waste of everyone's time. For instance, you call a function which changes the current directory to somewhere else for its own purposes but fails to change back on all exit paths from the function. Trying to remember which directory you might be in later on becomes combinatorially difficult as more conditional statements are added over time. To make it more concrete:

BAD:


function MyFunction() {
cd some_random_directory
# Do something useful
}


GOOD:


function MyFunction() {
# Change to the necessary directory; give up if that fails
pushd some_random_directory || return 1

# Do something useful

# Now pop the directory off the stack
popd
}


If you have conditional statements, then make sure you catch them too:

BAD:


function MyFunction() {
# Change to the necessary directory
pushd some_random_directory

# Do something useful and test something
if [ "${test}" == "value" ]; then
return # Oops, this function left us somewhere unexpected
fi

# Now pop the directory off the stack
popd
}


GOOD:


function MyFunction() {
# Change to the necessary directory; give up if that fails
pushd some_random_directory || return 1

# Do something useful and test something
if [ "${test}" == "value" ]; then
# Don't forget to pop the directory off the stack
popd

return
fi

# Now pop the directory off the stack
popd
}


Basically, preserve an expected and known state between function calls, just as you would in any other modern programming language.
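
If keeping track of every exit path gets tedious, one alternative is to run the whole function body in a subshell, so the directory change never reaches the caller. This is only a sketch, and it assumes the function does not need to set variables for the caller; the arguments and the final ls are placeholders of mine.

```shell
# Hypothetical example: the arguments and the work done are placeholders.
# The body is wrapped in ( ... ) rather than { ... }, so it runs in a
# subshell and any cd inside it is invisible to the caller.
MyFunction() (
    # Change to the necessary directory; give up if that fails
    cd "$1" || return 1

    # Do something useful and test something
    if [ "$2" = "value" ]; then
        return 0    # Early exit, and the caller's directory is untouched
    fi

    # Do something else useful
    ls > /dev/null
)
```

The trade-off is that anything assigned inside the function is lost when the subshell ends, so this suits functions that communicate through output or exit status only.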

Tuesday, October 9, 2007

Why tags are rarely used as you would expect



As a recent introduction to version control explains, a tag is a way to record a snapshot of the state of some files at a moment in time. When I ask clients why they want to tag their source code, one reason is always "so we can reproduce a build later on". And we generally nod our heads and move on.

The funny thing is that I've never actually seen a tag used for that purpose. Tags do get used to decide whether a particular bug fix was in a certain build, or as references for released builds. But as a developer, if I need an earlier build, it's probably to debug a customer problem, which means that the software was already released. Since my build process will have carefully preserved any build artifacts such as debug symbol files, I already have the files that I need.

The other observation is that build systems are designed to be able to use different branches. So if I really needed to recreate a build, I could always branch from the tag and then build using the head of the new branch. Of course, this assumes that nothing else in the environment has changed.

In summary, tags are necessary, but maybe not for the reasons that we usually assume.

Thoughts on Making Things

It's time to get some thoughts out of my head about making things. Physical things, software things, relationship things. You know, things. That's what engineers do after all.


Nothing is Monolithic


Nothing created by people is a monolith. It's always made up of smaller things. Something may seem greater than the sum of its parts, but only by how we treat it. Really, it is just a number of things put together.
For instance, a beautiful painting may well move me greatly, but at one level it is still flecks of paint on canvas.

Everything is a Hack

"Hack" in the sense of changing a thing to work around a problem. Nothing created by humans springs fully formed into existence; everything is an extension of what came before. Whatever you see around you has taken multiple versions to become what it is now.

Everything looks different when you think about how it was made

This was an epiphany for me at eighteen, when I began to look around me and ask "how was that thing made?" Once you consider how a thing could have been produced, you understand that thing in a different way. And just as children are no less marvelous after you've read a book on reproductive science, software and hardware are still amazing even when you're the one creating them.

Wednesday, October 3, 2007

Python + bash + ssh = too many layers!



I was reminded by a recent project that many (most?) hardware failures occur at the connections between things. That's why percussive maintenance gets results. I was writing a tool to execute lots of commands remotely on multiple machines, with the different commands being synchronized from one machine. I chose to write the tool in Python, and that made creating it nice and easy.

But the shell commands to be executed had to be communicated to each machine. Now I could have written all the commands and their arguments (some of which contained the dreaded spaces) into separate files, copied them over with scp and executed them remotely. Except that the tool had to do a fair amount of analysis of the results and decide what to run next based on that analysis. This is generally tedious to do with bash and can be hard to maintain.

So I decided to keep the control logic in the Python tool and use ssh to send each command separately to the remote machines. Which works, but the extra headache of getting all the spaces escaped and quoted added a lot of maintenance pain to the tool. I could have encapsulated it all more cleanly, I'm sure, but it all "just grew" in scope and size.

The final block in this tottering tower was that some of the commands had to be run using sudo, but not all of them. Getting all those sudo strings in the right place cost me a few hours of my life, and I don't feel that I got much in return. So it goes.

I've tried using CORBA, web services, etc. over the years, but they feel pretty heavyweight for this sort of thing. So the question I'm asking myself is: how could I have done it better?
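
For the quoting half of the problem at least, one option is to quote every argument mechanically instead of by hand. The sketch below wraps each argument in single quotes and hands ssh one finished string. The names quote_arg, quote_command and run_remote are made up for illustration, and it assumes the login shell on the remote machine is POSIX-like.

```shell
# Hypothetical helpers; none of these names come from the original tool.

# Wrap one argument in single quotes, turning each embedded ' into '\''
# so that a POSIX shell reads it back as a single, unchanged word.
quote_arg() {
    printf '%s\n' "$1" |
        sed -e "s/'/'\\\\''/g" -e "1s/^/'/" -e "\$s/\$/'/"
}

# Join all arguments into one command line. The ( ... ) body keeps the
# loop variables out of the caller's environment.
quote_command() (
    line=""
    for arg in "$@"; do
        line="$line $(quote_arg "$arg")"
    done
    printf '%s\n' "${line# }"
)

# run_remote HOST COMMAND [ARGS...]
# ssh passes a single string to the remote shell, so quote it first.
run_remote() (
    host="$1"
    shift
    ssh "$host" "$(quote_command "$@")"
)
```

With this in place, a command that needs sudo simply takes it as its first word, for example run_remote somehost sudo cp "my file.txt" "/some/target dir/" (host and paths invented here), and the spaces survive the trip. It does nothing for the control logic, which would still live in whatever drives these calls.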