Tuesday, June 19, 2007

Word Doesn't Play Nicely With Code

Don't cut and paste from Word or Outlook into Eclipse, Jira or any other non-Windows application. If you must, at least disable the smart-quotes features in Word first. To disable smart quotes in Word:
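  • Tools, AutoCorrect Options, AutoCorrect - uncheck all the "Correct..." and "Capitalize..." boxes
  • Tools, AutoCorrect Options, AutoFormat As You Type - uncheck all the "Correct..." and "Capitalize..." boxes, plus the "Straight quotes with smart quotes" and "Hyphens (--) with dash" replace options
  • Tools, AutoCorrect Options, AutoFormat - uncheck the same boxes as on the AutoFormat As You Type tab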

Windows applications generally use a Windows code page (such as Windows-1252) rather than UTF-8, which is what most browsers, editors, etc. use. This is usually not a problem, since the two agree on plain ASCII characters, but if you have smart-anything enabled in Word (smart quotes, auto-format, etc.) then characters such as double quotes and hyphens are quietly converted into typographic characters that look nicer on the screen but don't work in source code. A Java compiler, for example, will not accept a curly left double quote (U+201C) where it expects the ASCII " that delimits a string.


The following pipeline lists the Java files that contain non-ASCII characters, along with the number of such lines in each file. Some of these may be legitimate uses of non-ASCII characters, but each one ought to be checked.

[mdoar@toolsdev Projects]$ find SystemServices/ -name "*.java" | xargs perl -nwe 'print "$ARGV\n" if /[^[:ascii:]]/'  | sort | uniq -c | sort -rn | more
19 SystemServices/src/java/com/packetmotion/sysservice/exception/SystemException.java
...

To see the line numbers in each file:

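[mdoar@toolsdev Projects]$ find SystemServices/ -name "*.java" | xargs perl -nwe 'print "$ARGV:$.\n" if /[^[:ascii:]]/; close ARGV if eof'

The "close ARGV if eof" resets Perl's line counter $. at the end of each file. Without it, the line numbers keep counting up across all the files that xargs passes in.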
