Thursday, February 14, 2008

Evaluating JIRA Multisite


Given the number of organizations already using JIRA across a WAN, there is plenty of interest in finding ways to improve the experience. A few of my clients have suggested using distributed databases, changing HTTP caching behaviour, or dropping HTTPS. None of these is a great fix, and all of them are complicated by the fact that JIRA keeps much of its data in a local Lucene index outside the database, for performance reasons.

So when WANdisco announced a beta of JIRA Multisite last November in partnership with Atlassian, I was interested to see what it would do. It's billed as a high-availability solution, and as part of that it gives you local JIRA servers with all your data kept synchronized. Another approach was announced at about the same time: Scarlet, a JIRA clustering solution. I haven't evaluated Scarlet yet, but it appears to have a single point of failure by default.

I contacted WANdisco to ask for an evaluation copy and they were happy to help. They have an existing replication tool for CVS and Subversion that they've connected to JIRA. You need both their tool and their own instance of JIRA. As an aside, although they are keeping up with each JIRA release, I'd rather have instructions for modifying my existing JIRA instance to work with their tool, but I'll take what I can get for now.

To provide high availability you need three or more instances of JIRA, but since I was mainly interested in how performance changed at each site, I set up just two instances, one in San Jose, CA and the other in Bangalore, India. The connection between the two sites is a clogged T1 at best, and the team in India often sees sluggish response times from JIRA.

Setup Experience

WANdisco offered to set the tool up, but I insisted on doing it myself, which took an hour for the two nodes. The instructions were beta quality, but not bad. After that piece of stubbornness, their technical founder worked out what I had done wrong in about an hour, and together we had it all working in another hour. Three hours from scratch is pretty good as these things go.

Testing

I modified a bug in San Jose and watched the change appear in India a second or two later. Then I modified a bug in India and saw the change locally in about the same time, just as expected. Next we stopped one of the JIRA servers, made some changes, waited a bit, restarted the server and saw all the changes get synchronized. Other users updated issues over the following month and the changes appeared just as expected. The big win was that the users in India saw their local response times improve dramatically. The underlying WANdisco replication tool was rock solid throughout the month-long evaluation.


Restrictions


The version I tested didn't synchronize attachments, but that has been added since then. You do have to use the same OS (and, I believe, the same database) for all the instances of JIRA. This was not a problem for me, but if you have a Windows server in one location and Linux in another, it won't work.

I didn't try HTTPS, but I did set up LDAP authentication and that worked as expected.

I'm pretty sure that if I had wanted to go back to a single instance of JIRA, I could have exported the data and reimported it into a non-Multisite instance of JIRA.

Cost

Pricing is public: US $7,500 per instance of JIRA, which is about 50% more than the current Enterprise license cost. This seems about right given the cost of the tool and the target customers. Support comes from WANdisco first and then from Atlassian.

Summary

JIRA Multisite is still in its early stages, but it is very promising. It worked well for me with little effort, and provides good value for the price.

11 comments:

Anonymous said...

I agree it's promising... however, I think it's essential that users can install the tool themselves. That means they need to improve the installation procedure and the setup documentation.

I set this up on different OSes and failed to get it working, but I think WANdisco said it was because I was using different versions of the Java runtime... I assumed they were using the serialization API, although different versions won't necessarily break it.

If you must have homogeneous OSes, to what degree? E.g., Solaris 9 and 10? Do they have to be at the same patch level? I didn't think the databases had to match, as everything still goes through the OFBiz layer.

What I'm curious about with the multisite solution is this: whilst I can see it speeding up read operations (browsing, viewing issues etc.), surely it's going to slow down writes for everyone. When you want to modify a record it contacts the other site, gets some kind of lock, then lets the write go ahead. So there are going to be a couple of round trips from the US to India before every write. Did you notice that?

I was looking at the clustering solution, which seems to me to be basically the same thing. What I don't get is that they want to sell you a pack of three licences, because they say that gives you mathematical certainty of availability. Now, I only have a postgraduate qualification in statistics, so I may be missing something here. Surely 2 nodes give you better security than one, and obviously 3 is better than 2, but then 4 is better than 3. Buying 3 nodes will cost around 5 times as much as JIRA Enterprise.

Anyway, I couldn't get Scarlet working either.

jamie

Anonymous said...

Oh, if it doesn't support heterogeneous databases, it's a showstopper for me. I want to run it with 2 active nodes using PostgreSQL, and an inactive 3rd node using SQL Server.

Traffic would be split across the 2 active nodes, and performance would be good because each database would be colocated with its JIRA instance. The third instance would be for disaster recovery and reporting purposes.

Anonymous said...

Hi Matt,

Thanks for the nice article, but I still feel it's not enough, and the price being charged is high given so many limitations.
I do agree that it works better than the other options, but it is still in its infancy.
Why the database limitations?

Subversionman said...

Jamie,



Homogeneous JVMs are required because of WANdisco's use of Java serialization, so this would have been an issue, as you pointed out.

Use of the same OS across all of the nodes is driven by the fact that JIRA stores attachments (which are also replicated) and indexes in the filesystem, rather than the underlying database, and different operating systems use different syntax for file pathnames. Since the replication process replays each write transaction on every other node, it requires the pathnames to be consistent in the current implementation. WANdisco may remove this limitation in the future, depending on customer requirements.
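
Here is a minimal illustration of the pathname point, using Python's standard pathlib module. The same recorded path string is parsed once with Windows rules and once with POSIX rules. The path is hypothetical. It is not meant to reflect JIRA's real attachment layout or how WANdisco records file operations.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Hypothetical attachment path as it might be recorded on a Windows node
# (illustrative only; not JIRA's actual directory layout).
recorded = r"C:\jira\attachments\PROJ\10001\screenshot.png"

# Windows rules: a drive anchor followed by directory and file components.
windows_parts = PureWindowsPath(recorded).parts

# POSIX rules: backslash is an ordinary character, so the whole string is
# a single relative filename with no directory structure at all.
posix_parts = PurePosixPath(recorded).parts

print(windows_parts)
print(posix_parts)
```

Under Windows rules the string names screenshot.png inside a directory tree on drive C:. Under POSIX rules it is one relative filename that happens to contain backslashes. A file operation replayed verbatim would therefore land somewhere different on each node.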

In any case, homogeneity in terms of operating systems across all of the nodes is preferred by the vast majority of WANdisco’s customers because it greatly simplifies support and administration in a replicated multi-site environment.

Mismatches in version numbers or patch levels for the same OS may not be a problem, but given all of the possible combinations, you could run into trouble, as you could in any replicated system.

It is possible to use different databases on each of the nodes, but this might pose some risks depending on the databases involved. For example, assume you are using two different databases, and one of them accepts a long string in an update, while the other rejects it (because the string is too long). Your databases are now out of sync. This could occur in any replication scenario, not just replicated JIRA.
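
The divergence described above can be sketched with two toy in-memory stores that enforce different (hypothetical) limits on the same column. Every write is replayed on both stores. One accepts it and the other rejects it, so the replicas no longer agree. This simplification is only for illustration. It does not show how JIRA or WANdisco actually issue database updates.

```python
class HypotheticalDatabase:
    """Toy stand-in for a database with a fixed column width (hypothetical)."""

    def __init__(self, name: str, max_summary_len: int) -> None:
        self.name = name
        self.max_summary_len = max_summary_len
        self.summaries: dict[str, str] = {}

    def update_summary(self, issue_key: str, summary: str) -> bool:
        """Apply the update; return False if the value is rejected as too long."""
        if len(summary) > self.max_summary_len:
            return False
        self.summaries[issue_key] = summary
        return True


def replay(update: tuple[str, str], nodes: list[HypotheticalDatabase]) -> dict[str, bool]:
    """Replay one write on every node and report which nodes accepted it."""
    return {node.name: node.update_summary(*update) for node in nodes}


# Two nodes whose (hypothetical) column limits differ.
site_a = HypotheticalDatabase("site-a", max_summary_len=255)
site_b = HypotheticalDatabase("site-b", max_summary_len=100)
nodes = [site_a, site_b]

replay(("PROJ-1", "Short summary"), nodes)      # accepted by both nodes
results = replay(("PROJ-1", "x" * 200), nodes)  # too long for site-b only

print(results)
print(site_a.summaries["PROJ-1"] == site_b.summaries["PROJ-1"])
```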

In terms of your question about write transactions slowing down for everybody with JIRA MultiSite, there are two aspects to how WANdisco avoids this, and delivers LAN-speed performance for both read and write transactions at all sites. First of all, JIRA MultiSite can be configured to support a follow-the-sun approach that allows performance to be optimized so that WAN latency is completely eliminated for write transactions during each site’s normal business hours.

Secondly, even if the follow-the-sun approach is not used, the site from which the write transaction originates is only waiting for an acknowledgement to come back from the other site before executing the write locally; it’s not waiting for the write to complete on the other site. Between the US and India for example, the amount of time required for this acknowledgement process to complete is typically on the order of 400 milliseconds, assuming the throughput of the typical E-1 line used. Hence the user experience is essentially LAN-speed for writes as well as reads. See the WANdisco technical white paper http://www.wandisco.com/php/download_wpp.php for more details.
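
Here is a toy asyncio model of the write ordering described above, under stated assumptions. The originating site pays one WAN round trip for the acknowledgement and then writes locally. The remote site applies the change afterwards. The delays are hypothetical: the one-way delay is chosen so the round trip matches the ~400 ms figure quoted. The model also leaves out everything a real agreement protocol must handle, such as ordering concurrent writes from several sites.

```python
import asyncio
import time

ONE_WAY_DELAY = 0.2      # seconds; hypothetical, giving a ~400 ms round trip
REMOTE_APPLY_TIME = 1.0  # seconds; hypothetical time for the remote site to apply a write


async def remote_receive(change, remote_store, background):
    """Remote site: acknowledge immediately, apply the change in the background."""
    async def apply_later():
        await asyncio.sleep(REMOTE_APPLY_TIME)
        remote_store.append(change)

    task = asyncio.create_task(apply_later())
    background.add(task)  # keep a reference so the task is not garbage-collected
    return "ack"


async def originating_write(change, local_store, remote_store, background):
    """Originating site: wait only for the acknowledgement, then write locally."""
    start = time.perf_counter()
    await asyncio.sleep(ONE_WAY_DELAY)  # proposal crosses the WAN
    ack = await remote_receive(change, remote_store, background)
    await asyncio.sleep(ONE_WAY_DELAY)  # acknowledgement crosses back
    if ack != "ack":
        raise RuntimeError("write not acknowledged")
    local_store.append(change)
    return time.perf_counter() - start


async def main():
    local_store, remote_store, background = [], [], set()
    elapsed = await originating_write("PROJ-1: status -> Resolved",
                                      local_store, remote_store, background)
    print(f"local write done after {elapsed:.2f} s")
    print("remote applied yet?", bool(remote_store))
    await asyncio.gather(*background)  # let the remote site finish applying
    print("remote applied now?", bool(remote_store))
    return elapsed, local_store, remote_store


elapsed, local_store, remote_store = asyncio.run(main())
```

In this model the local write finishes after roughly one round trip, before the remote site has applied the change. That is the property the comment describes.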


Regarding your comment about WANdisco’s recommendation that a JIRA Cluster consist of at least 3 nodes: it is possible to implement a 2-node JIRA Cluster. However, under certain scenarios, you may lose the benefit of JIRA Clustering’s automated recovery feature if one of the nodes in a 2-node cluster goes down.
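
Majority quorum is one common reason replicated systems ask for at least three nodes. Under this scheme a node may only proceed while it can reach a strict majority of the configured nodes, which stops two isolated halves from both accepting writes. The thread does not say that WANdisco's automated recovery works this way. Treat the sketch below as an assumption that only illustrates the arithmetic.

```python
def has_majority(configured_nodes: int, live_nodes: int) -> bool:
    """True if the surviving nodes form a strict majority of the configuration."""
    return 2 * live_nodes > configured_nodes


def failures_tolerated(configured_nodes: int) -> int:
    """Most nodes that can fail while the rest still form a strict majority."""
    return (configured_nodes - 1) // 2


# Assumes a simple majority-quorum rule (hypothetical for this product).
for n in range(2, 6):
    print(f"{n} nodes: survives {failures_tolerated(n)} failure(s); "
          f"one node down -> majority? {has_majority(n, n - 1)}")
```

Under this model a 2-node setup cannot keep a majority once either node fails. A fourth node also tolerates no more failures than three do. That bears on the 2 vs. 3 vs. 4 node question raised earlier in the thread.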

In response to your comments regarding pricing, JIRA Clustering costs significantly less than alternative clustering solutions, or backup and disaster recovery solutions such as EMC SRDF. In addition, unlike EMC SRDF, recovery is automatic, and can be achieved over a LAN, or in the case of JIRA MultiSite, over a WAN, which EMC SRDF and other backup and recovery solutions don't support. Each node includes JIRA Enterprise and licensing is on a perpetual one-time payment basis. Support and maintenance is available at 18%. Given all of these factors, pricing has never been an issue for our customers, most of whom already had existing JIRA implementations.

Anonymous said...

Vish, we are currently using JIRA MultiSite between the US and China. I think your comment about price and limitations is harsh. Without it we couldn't really use JIRA from a scale and performance perspective. Also, the products DO now support attachments. I think a small premium on JIRA is a small price to pay, and support is 18%, not 50%!

Matt Doar said...

Subversionman,

You're from WANdisco, right?

~Matt

Subversionman said...

That's correct - I am the CTO of WANdisco. You can also see my blog at http://subversionee.blogspot.com/

Sergio Bossa said...

Hi guys,

I just stumbled on this post.
I'm the Scarlet project leader and I'd like to know what was wrong with it: it's a brand new product and every kind of feedback is very important.

Thanks,
Cheers,

Sergio B.

Sergio Bossa said...

Here is the correct Scarlet link: http://scarlet.sourceforge.net/ ... sorry for my typing mistake.

Matt Doar said...

Sergio,

There's nothing wrong with Scarlet that I know of. Please correct me if I am mistaken, but Scarlet is designed to spread the load of a large site across multiple JIRA servers, using one backend database. It isn't designed to have the JIRA instances separated by a WAN which is what was wanted here.

~Matt

Sergio Bossa said...

Matt,

thanks for your response.
You are right, Scarlet isn't meant to work over WANs: if that's a requirement, Scarlet certainly doesn't fit your needs.

Cheers,

Sergio B.