Tuesday, April 22, 2008

More bugs than revisions


(Debian bug count from http://master.debian.org/~ajt)

Heard as an aside: "that project's got more bugs than revisions!"

Which made me wonder what kind of project might reasonably expect that statement to be true. The space shuttle software as a whole, perhaps? Lots of testing implies lots of bugs, and scrupulous code reviews mean fewer commits.

At the other end of things, many small projects might get a few hundred revisions and then a few hundred bugs over time. Of course, most bugs will end up closed in one way or another.

Anybody able to point to a real project with more bugs than revisions?

Wednesday, April 16, 2008

Just costs you double

Me: "just" costs you double.
Them: Huh?
Me: Every time you use "just" to describe a feature or a process it tells me you've made a gross assumption about what I'll need to do ...

That comment on a post titled All I Need Is A Programmer made me nod in agreement. "Just" is a warning sign in conversations for me, like "always", "never", and the dreaded "trust me".

All those phrases make me think of exceptions. "Always? Well, there is that edge case ...". "Just? Why 'just'? What about ..." and I begin to wonder whether the person really knows what they're talking about.

Just say 'no' to 'just'. (You knew that was coming, right?)

Monday, March 17, 2008

The Mac Mini as a Laptop and VMware Fusion

I needed a new laptop because the one I had borrowed had disk errors in the Windows registry, as disks do after too long running Windows. Since I find laptop screens, keyboards and mice uniformly awkward, I bought a laptop without the keyboard, mouse or screen - a Mac Mini. (Frys, US$750)

The upsides, apart from being less than half the price of a decent Mac laptop, are that it really is compact and there are some decent virtualization choices (Boot Camp, Parallels and Fusion). I went with VMware Fusion and overall, it behaves as you might wish. It is easy to install, cheap at $100 and runs most VMware images.

The downsides of not buying a laptop are that power outages become more significant without a battery, and perhaps the disk drive isn't mounted in a suitable manner for carrying the box in a bag every day? Swapping from VM to VM, or VM to host OS sent the swap rate soaring, so I installed an extra 1GB of RAM to help with that. While installing the extra memory, I took a good look at the hard disk and it seems to be mounted just as it would be in a laptop, so I'm hoping that the disk will survive being moved every day. If anyone else has experience with Mac Mini hard disk lifetimes, I'd be interested to hear from you.

And so today, I had another "virtualization is great" moment. The server that I had traveled an hour to work on was busy having its motherboard replaced. Since what I actually wanted to work on was the OS configuration, I booted the Windows server image with Fusion on my little Mac Mini, did the necessary work, and moved right along. Very neat.

Tuesday, February 26, 2008



The February issue of Python Magazine is out and contains my article "Using Python and SOAP to create a CLI for JIRA" about the Python CLI that I wrote for JIRA a while ago. The article's summary reads:

Many web applications include an API that lets you interact with them from the command line as well as with a browser. In this article, Matthew shows how to build a command line interface for JIRA, a well-known issue tracking system, using Python and SOAP. JIRA is a Java application, but using SOAP allows you access to many of its features using just Python.


One hint that I wish I had remembered to add is that when you have a redirect to your JIRA server, for instance when http://jira.mycompany.com is redirected by Apache or IIS to http://jira.mycompany.com:8080, you may see your login mysteriously fail. The answer is that you have to use the redirected URL with the JIRA CLI. You can find out what the redirected URL is by running the CLI with the argument -v10 to increase the logging verbosity and looking at the line that starts with "Host". This example shows that the port to use is 8080.

*** Outgoing HTTP headers **********************************************
POST /rpc/soap/jirasoapservice-v2 HTTP/1.0
Host: localhost:8080

Wednesday, February 20, 2008

Finally, a use for those JIRA user properties

One of the most useful JIRA plugins I've found is the JIRA Toolkit, described as "a bunch of neat custom fields Atlassian have developed for their own use". As an aside, if they're so neat and useful then why aren't they in the core product?

One of the more recently developed fields is the View User Property custom field, which is currently only documented in the issue that last link refers to. This handy little field allows you to display properties that you previously added to a user, as a read-only field in each issue.

For example, add a property named "Company" to some of your users in JIRA, then install the JIRA toolkit and create an instance of the View User Property field. Now configure it with "reporter:Company". Add the new field to some screens and the value of the reporter's Company field will show up in the issue. This also works with "assignee:Company". You can even get a user name with "My Custom Field:Company". This will use the user name found in the custom field named "My Custom Field". Just use the JIRA custom field name, including any spaces.

I've just used this to associate a company name with every customer user, and I'm sure there are a number of other pieces of per-user information that could be displayed with this field. One missing piece for this field is the ability to search for issues with particular values.

Thursday, February 14, 2008

Evaluating JIRA Multisite


Given the number of organizations already using JIRA across a WAN, there is plenty of interest in finding ways to improve the experience. I've had a few clients suggest using distributed databases, changing HTTP caching behaviour or not using HTTPS. None of these are really great fixes, and are complicated by the fact that JIRA keeps much of its data in a local Lucene index outside the database for performance reasons.

So when WANdisco announced a beta of JIRA Multisite last November in partnership with Atlassian, I was interested to see what it would do. It's billed as a high availability solution and in doing that it gives you local JIRA servers with all your data nicely synchronized. There is another approach that was announced at about the same time, the JIRA clustering solution Scarlet. I haven't evaluated Scarlet yet but it appears to have a single point of failure by default.

I contacted WANdisco to ask for an evaluation copy and they were happy to help. They have an existing replication tool for CVS and Subversion that they've connected to JIRA. You need their tool and their instance of JIRA. As an aside, though they are keeping up with each release of JIRA, I'd rather have instructions about how to modify my existing instance of JIRA to work with their tool, but I'll take what I can get for now.

To provide high availability you have to have 3 or more instances of JIRA, but since I was mainly interested in how each site's performance changed, I just set up two instances of JIRA, one in San Jose, CA and the other in Bangalore, India. The connection between the two sites is a clogged T1 at best and the team in India often has sluggish response times from JIRA.

Setup Experience

WANdisco wanted to set the tool up, but I did it myself in an hour for the two nodes. Instructions were beta quality, but not bad. After that piece of stubbornness, their tech founder worked out what I had done wrong in about an hour, and then together we had it all working in another hour. Three hours from scratch is pretty good as these things go.

Testing

I modified a bug in San Jose and watched the change appear in India a second or two later. Then I modified a bug in India and saw the change locally in about the same time. Just as expected. Then we stopped one of the JIRA servers, made some changes, waited a bit, restarted the server and saw the changes all get synchronized. Other users updated issues over the next month and the changes appeared just as expected. The big win was that the users in India saw their local response improve dramatically. The underlying WANdisco replication tool was rock solid for the month's evaluation.


Restrictions


The version I tested didn't synchronize attachments, but that has been added since then. You do have to use the same OS (and database I believe) for all the instances of JIRA. This was not a problem for me, but if you have a Windows server in one location and Linux in another, it won't work.

I didn't try HTTPS, but I did set up LDAP authentication and that worked as expected.

I'm pretty sure that if I wanted to go back to one instance of JIRA I could have exported the data and then reimported it into a non-multisite instance of JIRA.

Cost

Pricing is public and is US $7500 per instance of JIRA, which is about 50% more than the current Enterprise license cost. This seems about right given the cost of the tool and the target customers. Support comes from WANdisco and JIRA, in that order.

Summary

JIRA Multisite is still in its early stages, but it is very promising. It worked well for me with little effort, and provides good value for the price.

Wednesday, November 28, 2007

Perspective of a 3-year old


A few choice observations from my youngest son in the past few months. I'm posting them as reminders that not everyone thinks as we expect them to.

During the minor earthquake in San Jose, we all went outside. Afterwards he kept running outside then back inside again. When asked why, he said he was "looking for more earthquakes." (I guess we found one outside the first time?)

We came across a dried-up deer carcass, and I was teaching him to use a stick instead of his hands to touch roadkill. This led to a discussion about death and whether the deer would come back to life. I gently explained that the deer was done with its body now. He looked thoughtful for a moment, and then drove the stick through the carcass, exclaiming "you stay dead then!"

When asked during the Christmas Pageant rehearsal what kinds of animals were in the manger with Jesus, he piped up with "pterodactyls!". To his disappointment, he is going to be a lamb instead.

Tuesday, November 13, 2007

Choosing Project Names


The discussion What's in a Project Name? over at Coding Horror reminded me of section 3.6.1 of my book Practical Development Environments:

Project names are usually chosen by engineering groups, with one name for each significantly different version of the products that they are working on. There should be no need to change a project's name once it has been chosen. Product names, on the other hand, are the names that customers see, and these names are usually chosen to help a product sell or to become popular. Product names can change at the whim of a market research poll or a new VP of Sales.

Some general guidelines for choosing names for projects are:

Keep it short

Since project names may appear in filenames or source code, shorter project names are preferable; four to six characters is common. Longer names will only be abbreviated anyway, and usually in two different ways.

Use distinctive sounds

Project names should sound different from each other when spoken aloud by people whose native language is not the one used by the rest of the group. Even if everyone speaks English, having two projects named "ctest" and "seebest" is too close for comfort.

Use low-frequency letters

It's much easier to be confident that all references to a project name can be found if the name contains characters that are less common in the local language. This is a good argument for choosing project names that use unusual characters, such as the letters q and z for English.

Apocryphal aside: a few years ago there was a project named IDS that apparently had a function named IDSConnect. Then the project was renamed DIS and all its functions were renamed accordingly, which led to their function for creating connections being renamed to DISConnect. The letters d, i, and s are too common in English to simply reuse them in such an anagram.

Make it unmarketable

Sometimes a project name will be reused as a product name, but not if it is already trademarked, or if you make it odd or crude enough! Project names don't have to have a theme, though that can be fun. They don't even have to be meaningful, just memorable with an obvious way of pronouncing the word. You can choose a number of suitable names once and then let people decide which one they want to use next. Names of stars, types of sushi, rare diseases, and characters from comic books are some ideas to start with for project names.

Wednesday, October 10, 2007

Unexpected side effects in shell scripts

Shell scripts that fail to preserve the expected environment are a waste of everyone's time. For instance, you call a function which changes the current directory to somewhere else for its own purposes but fails to change back on all exit paths from the function. Trying to remember which directory you might be in later on becomes combinatorially difficult as more conditional statements are added over time. To make it more concrete:

BAD:


function MyFunction() {
    cd some_random_directory
    # Do something useful
}


GOOD:


function MyFunction() {
    # Change to the necessary directory, saving the current one on the stack
    pushd some_random_directory

    # Do something useful

    # Now pop the directory off the stack ("cd -" would not pop it)
    popd
}


If you have conditional statements, then make sure you catch them too:

BAD:


function MyFunction() {
    # Change to the necessary directory
    pushd some_random_directory

    # Do something useful and test something
    if [ "${test}" == "value" ]; then
        return # Oops, this function left us somewhere unexpected
    fi

    # Now pop the directory off the stack
    popd
}


GOOD:


function MyFunction() {
    # Change to the necessary directory
    pushd some_random_directory

    # Do something useful and test something
    if [ "${test}" == "value" ]; then
        # Don't forget to pop the directory off the stack
        popd

        return
    fi

    # Now pop the directory off the stack
    popd
}


Basically, preserve an expected and known state between function calls, just as you would in any other modern programming language.
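Another way to get the same guarantee, without having to remember a popd on every exit path, is to run the whole function body in a subshell so that any directory change is thrown away when the function returns. This is only a minimal sketch, assuming bash; the function name and the pwd stand-in are made up for illustration:

```shell
#!/usr/bin/env bash

# Hypothetical example: the body is wrapped in parentheses instead of
# braces, so it runs in a subshell and the cd never reaches the caller.
function ListInDirectory() (
    cd "$1" || return 1   # give up early if the directory is missing
    pwd                   # stand-in for "do something useful"
)

ListInDirectory /tmp
```

The trade-off is that variables set inside the function are also discarded, so this only suits functions that communicate through their output or exit status.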

Tuesday, October 9, 2007

Why tags are rarely used as you would expect



As a recent introduction to version control explains, a tag is a way to record a snapshot of the state of some files at a moment in time. When I ask clients why they want to tag their source code, one reason is always "so we can reproduce a build later on". And we generally nod our heads and move on.

The funny thing is that I've never actually seen a tag used for that purpose. Tags do get used to decide whether a particular bug fix was in a certain build, or as references for released builds. But as a developer if I need an earlier build, it's probably to debug a customer problem, which means that the software was already released. Since my build process will have carefully preserved any build artifacts such as debug symbol files, I already have the files that I need.

The other observation is that build systems are designed to be able to use different branches. So if I really needed to recreate a build, I could always branch from the tag and then build using the head of the new branch. Of course, this assumes that nothing else in the environment has changed.

In summary, tags are necessary, but maybe it's not for the reasons that we usually assume.

Thoughts on Making Things

It's time to get some thoughts out of my head about making things. Physical things, software things, relationship things. You know, things. That's what engineers do after all.


Nothing is Monolithic


Nothing created by people is a monolith. It's always made up of smaller things. Something may seem greater than the sum of its parts, but only by how we treat it. Really, it is just a number of things put together.
For instance, a beautiful painting may well move me greatly, but at one level it is still flecks of paint on canvas.

Everything is a Hack

"Hack" in the sense of changing a thing to work around a problem. Nothing created by humans springs fully formed into existence; everything is an extension of what came before. Whatever you see around you has taken multiple versions to become what it is now.

Everything looks different when you think about how it was made

This was an epiphany for me at eighteen, when I began to look around me and ask "how was that thing made?". Once you consider how a thing could have been produced, you understand that thing in a different way. And just as children are no less marvelous after you've read a book on reproductive science, software and hardware are still amazing even when you're the one creating them.

Wednesday, October 3, 2007

Python + bash + ssh = too many layers!



I was reminded by a recent project that many (most?) hardware failures occur at the connections between things. That's why percussive maintenance gets results. I was writing a tool to execute lots of commands remotely on multiple machines, with the different commands being synchronized from one machine. I chose to write the tool in Python, and that made creating it nice and easy.

But the shell commands to be executed had to be communicated to each machine. Now I could have written all the commands and their arguments (some of which contained the dreaded spaces) into separate files, copied them over with scp and executed them remotely. Except that there was a fair amount of analyzing of results and deciding what to run next based on the results of the analysis. This is generally tedious to do with bash and can be hard to maintain.

So I decided to keep the control logic in the Python tool and use ssh to send each command separately to the remote machines. That works, but the extra headache of getting all the spaces escaped and quoted added a major amount of maintenance pain to the tool. I could have encapsulated it all more cleanly, I'm sure, but it all "just grew" in scope and size.

The final block in this tottering tower was that some of the commands had to be run using sudo, but not all of them. Getting all those sudo strings in the right place took a few hours of my life from me without feeling that I got much for it. So it goes.

I've tried using CORBA, WebServices etc over the years, but they feel pretty heavyweight for this sort of thing. So the question I'm asking myself is: how could I have done it better?
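One possible answer to the quoting part of the problem, sketched in bash to match the earlier shell examples rather than the Python tool itself: let printf's %q format do the escaping, so that each argument survives the extra round of parsing by the remote shell. The function name and the host in the usage comment are hypothetical, and the sketch assumes bash on the remote side too, since %q can produce bash-specific quoting.

```shell
#!/usr/bin/env bash

# Hypothetical helper: turn separate arguments into one safely quoted
# command string that can be handed to ssh as a single word.
function QuoteCommand() {
    local quoted=""
    local arg
    for arg in "$@"; do
        # %q escapes spaces, quotes and other shell metacharacters
        quoted+="$(printf '%q' "$arg") "
    done
    # Drop the trailing space added by the loop
    printf '%s\n' "${quoted% }"
}

# Usage (not run here; "build01" is a made-up host):
#   ssh build01 "$(QuoteCommand sudo ls -l "/var/log/my app")"
```

Because sudo is just another argument here, commands that need it and commands that don't go through the same code path.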

Monday, September 3, 2007

JDiff 1.1.0 released



I finally got around to incorporating a few patches into a release for JDiff, the Java doclet I wrote years ago to generate Javadoc-like reports about the changes between two versions of a Java API. I still think it's a cool tool, and so do Apple, Oracle and Sun.

I took the opportunity to rearrange all the source files to make it easier to work on, and combined all the Ant files into one big build.xml file. I noticed when I was updating the documentation that I had to tell some Java developers what Ant was in 2001, which is not the case now. I'm still trying to wrestle the repository into Subversion in SourceForge.

Tuesday, August 21, 2007

Scott McCloud at Hijinx Comics



I had a great time last night hanging out with my son Jacob (8) at our local comic store Hijinx Comics with Scott McCloud, his family and a bunch of other locals in the know. Scott McCloud is probably the most self-aware and thoughtful person creating comics at the moment. I liken the year-long tour he is just finishing to a professor travelling the world to spread his knowledge. His original book Understanding Comics has been worth re-reading every few years - it's one of those books, like Edward Tufte's. Any kind of collaboration between them would be an amazing thing to behold.

The other thing we really enjoyed was his family being there. Jacob was amazed to find any girl who liked Pokemon as much as he does, and Scott's wife Ivy is also a great presence in any room. Many thanks to Dan Shahin for organizing the get-together.

Wednesday, August 1, 2007

Atlassian has acquired Cenqua

Which is not surprising, given how many of my customers use both Jira for bug tracking and FishEye as a front-end browser for their version control. What's funny to me is that I suggested it to Scott Farquhar, the CEO of Atlassian, last month at the Silicon Valley Atlassian User Group meeting, but I didn't pick up on why he looked a bit surprised by my suggestion that "hey, y'know, Atlassian should buy those FishEye guys!". Next time, I'll just ask if he wants to play poker.

No word on the terms of the deal, but from my perspective, it's great news. Best wishes to all involved.

Friday, July 27, 2007

The largest, longest-lasting hack ever?

Hack, in the sense of "workaround", created by humans, elegant or not, and not restricted to computing.

My current contender is upper and lower case letters in English. It's like mixing two fonts in the same document. If scribes and early printers wanted ways to break up text, making different shapes for the same letters is an odd way to do it. Maybe space constraints were their real problem?

And what about double quotes? Someone came up with the idea of using a small mark before and after a word or phrase to set it apart. Then someone else decided to just use the same marks twice. I'm surprised we haven't seen triple and quad quotes appear over the centuries.

What's your vote for the longest-living and largest hack in the history of humanity?

Wednesday, July 25, 2007

Useful "FishEye for Jira" plugin

Jira is easy to integrate with CVS and Subversion, but there's a FishEye for Jira plugin that can give you even better integration. If you are using FishEye (think ViewVC on steroids, plus change logs) as a front-end to your CVS or Subversion repositories, you should definitely investigate this plugin.

The screenshot below shows how the plugin replaces the Commit tab in Jira with a new FishEye tab.


Why is this a better approach?


The default mechanism used by Jira to connect issues to commits is by looking for issue identifiers in commit messages. For example, a commit with the message "Fixing TST-1234 again" is probably referring to the issue TST-1234. So far, so good.
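The matching described above can be sketched in a few lines of bash. This illustrates the idea only and is not Jira's actual implementation; the function name is made up, and the pattern assumes issue keys are an uppercase project key, a hyphen and a number:

```shell
#!/usr/bin/env bash

# Hypothetical helper: print each issue identifier found in a commit
# message, one per line. Assumes keys look like TST-1234.
function FindIssueKeys() {
    printf '%s\n' "$1" | LC_ALL=C grep -oE '[A-Z][A-Z0-9]*-[0-9]+'
}

FindIssueKeys "Fixing TST-1234 again"   # prints TST-1234
```

A message with no identifiers prints nothing and returns a non-zero status, which is grep's usual way of reporting "no match".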

The problem is that getting this information from CVS and tracking ongoing changes involves running "cvs log" and parsing the output. I recently came across one such file that was 90MB and took 10 minutes of churn to produce. Since the commit is only connected to the issue when this file is reparsed, there is often a sizable delay between making a commit and seeing it appear for the issue in Jira. Subversion integration with Jira is less of a load on the Subversion server, but still suffers from the same delay.

The FishEye plugin is much smarter than that. It uses the FishEye API to remotely query your FishEye instance each time that you refresh the FishEye tab of an issue inside Jira. FishEye has already indexed all the commits for its own purposes, so it can provide up-to-date results far faster than the out-of-the-box approach. Another benefit is that Jira no longer needs to load down the CVS server, or even contact the Subversion server.

Nitty Gritty

Installation is just like any other plugin, with some configuration in a .properties file. Remember to enable the API access in FishEye, and use the same names for repositories in the configuration file that you used when defining them in FishEye.

Wednesday, July 18, 2007

Priority Inflation

In most companies and projects, limited resources mean that as the ship date for a release approaches, only bugs with Priority 1 and 2 get fixed; the others are closed or deferred. Over time this practice leads to priority inflation. Someone entering a bug knows that this bug won't stop the product, but she remembers that none of her Priority 3 bugs got fixed last time and she really wants this one fixed, so she makes it a Priority 2. In the extreme, by a process of induction, all bugs become Priority 1 bugs and the purpose of the field is lost.

There's not much you can do about this except be aware of it happening and remind people what the priority fields are actually for.

The Lack of Difference between Priority, Severity, and Urgency

Chapter 7 of Practical Development Environments discusses tracking bugs. This article expands on ideas about bug priorities from there. The short version is "life is simpler if you rename all your priority fields to show who cares about each one."

The strip is from Hans Bjordahl's very funny Bug Bash site.

Priority, Severity, Urgency, ... huh?

Most bugs have a field to indicate how serious the bug is. A common series of values goes something like this:
  • 1 - The bug stops the product, and no workaround is possible
  • 2 - The bug stops the product, but a workaround is possible
  • 3 - The bug breaks a minor part of the product
  • 4 - The bug is cosmetic or an irritation
To make this field useful, you have to make it really easy for users to find out what each value is supposed to mean, right when they are entering the bug. Some people expect higher values to mean that the bug is more important. Some bug trackers provide tiny little icons to confuse you still further because you can't remember what each icon means.

Where's that Thesaurus?

But there are more serious problems with this field in practice. The first problem is that English has a lot of words for some rather similar ideas. Quick, does "priority" mean the same as "importance"? What's the difference between "urgency" and "severity"? If English is not a user's first language, this makes using your bug tracker much harder work. Yes, the words do have specific meanings but if you have to stop and think about those meanings, then they're not good words to use. My suggestion is to pick one word and use it exclusively. I'm going to use "Priority" for the rest of this discussion, but you can choose your own favorite.

Priority for Who?

"But that's not right!" you mutter, "priority doesn't mean the same as urgency." Well, actually, I think it does, depending on whose priority we're referring to. For instance, the priority of a particular bug for the engineering team is a totally different thing from the priority of the very same bug for a Sales Engineer on site with a customer breathing down their neck. The different words usually mean priority for different groups of people. If they really do mean different things, then make it more obvious. For instance, use "Due Date" instead of "Urgency" to make the sense of time explicit.

My key suggestion is to have multiple priorities for a bug but name them per team. For example, the fields could be Development Priority, QA Priority, Support Priority, and so on. You can even have a "CFO Priority" for bugs that are stopping a contract being signed at the end of a quarter. That way, everyone gets to record which bugs really matter to them, and then they can work out which ones get worked on first. And as a side benefit, no-one gets offended when their favorite bug's priority is reduced to "Minor".

Of course, this approach means that the leaders of the teams involved in producing the product have to talk to each other regularly about what they actually want the teams to work on. Call me crazy, but that seems like a good idea to me.

Priorities that are set by Customers


One last thought: some companies allow customers to enter a priority when filing a bug. This usually becomes a "how irritated are you right now?" field. Which is useful data, but perhaps not what you originally expected to record in that field. However, when you change the value of the customer's Priority field, it's always a problem. If you increase the priority, the customer worries whether the problem is perhaps part of a bigger issue. If you decrease the priority, you appear to be minimizing their distress. If you provide this kind of field for customers, I suggest you allow them to change the values themselves.

Thursday, July 12, 2007

Unit Testing and the New Testament

Taken out of context and not the intended meaning, but last Sunday it struck me that this reads like a quote from some article about unit testing:

"Each one should test his own actions. Then he can take pride in himself, without comparing himself to somebody else, for each one should carry his own load."
Galatians 6:4,5 (NIV)