1) Read the numbers from left to right.
2) Each number is added to the next...
3) Except when the next number is larger than the current number. Then you take the pair, subtract the current number from the next one, and add that result instead.
So with this logic in mind, I came up with the following UDF. It assumes valid Roman numerals for input. But it seems to work ok.
    function romantodec(input) {
        var romans = {};
        var result = 0;
        var pos = 1;
        var char = "";
        var thisSum = "";
        var nextchar = "";

        //value of each Roman numeral symbol
        romans["I"] = 1;
        romans["V"] = 5;
        romans["X"] = 10;
        romans["L"] = 50;
        romans["C"] = 100;
        romans["D"] = 500;
        romans["M"] = 1000;

        while(pos lte len(input)) {
            char = mid(input, pos, 1);
            //are we NOT at the end?
            if(pos != len(input)) {
                //check my next character - if bigger, replace with a sub
                nextchar = mid(input, pos+1, 1);
                if(romans[char] < romans[nextchar]) {
                    //the pair counts as (next - current); skip both characters
                    thisSum = romans[nextchar] - romans[char];
                    result += thisSum;
                    pos+=2;
                } else {
                    result += romans[char];
                    pos++;
                }
            } else {
                //last character: nothing to compare against, just add it
                result += romans[char];
                pos++;
            }
        }

        return result;
    }
You can see how it follows the basic, 'left to right, add the numbers together' process, and how it notices when the current character has a higher number to the right of it. I wrote up a quick test script for it like so:
    <cfset inputs = "XX,XI,IV,VIII,MC,DL,XL">
    <cfloop index="input" list="#inputs#">
        <cfoutput>
        #input#=#romantodec(input)#<br/>
        </cfoutput>
    </cfloop>
Which produced:
XX=20
XI=11
IV=4
VIII=8
MC=1100
DL=550
XL=40
You can download this UDF at CFLib now: romanToDecimal
p.s. Sorry for those still waiting for UDF approval at CFLib. It is a volunteer process (myself, Scott Pinkston, Todd Sharp) so be patient!


Comment 1 written by Mark Drew on 2 February 2010, at 8:33 AM
#NumberFormat(1999, "roman")#
Which gives you:
MCMXCIX
Comment 2 written by Raymond Camden on 2 February 2010, at 8:34 AM
Comment 3 written by Mark Drew on 2 February 2010, at 8:49 AM
Comment 4 written by John Farrar on 2 February 2010, at 9:43 AM
Comment 5 written by Leigh on 2 February 2010, at 12:11 PM
Comment 6 written by Gary Funk on 2 February 2010, at 7:25 PM
Comment 7 written by Raymond Camden on 2 February 2010, at 10:53 PM
Comment 8 written by Raymond Camden on 3 February 2010, at 8:47 AM
Comment 9 written by Gary Funk on 3 February 2010, at 8:55 AM
Comment 10 written by Raymond Camden on 3 February 2010, at 8:57 AM
We could write a rule that looks for IIN and simply replaces it with Val(N)-2.
Comment 11 written by Raymond Camden on 3 February 2010, at 8:59 AM
Now I'm going to ask you to put up or shut up! ;) If you can find me proof that IIX (or IIC, etc) is valid, I'll support it. ;)
Comment 12 written by Gary Funk on 3 February 2010, at 9:15 AM
IIC is not even a valid Roman numeral (because you can't subtract 2 directly from 100; you would need to write it as XCIIX, for 10 less than 100, then 2 less than 10).
Also...
This form of notation closely follows Latin language usage, in which the number 18 is pronounced as duodeviginti, meaning two [deducted] from twenty (duo-de-viginti), and 19 is pronounced undeviginti, meaning one [deducted] from twenty (un-de-viginti).
So, if you can have 2 from 20, IIXX would be valid and come up with 18.
On a last note, it is clear that the rules are not really rules and have been changed over the last 2000 years. If IIX is not valid, at least it should not return 10.
Comment 13 written by Andreas Schuldhaus on 3 February 2010, at 9:15 AM
Good post BTW. Thank you for sharing.
Comment 14 written by Gary Funk on 3 February 2010, at 3:12 PM
http://en.wikipedia.org/wiki/Roman_numerals
Comment 15 written by Gary Funk on 3 February 2010, at 3:16 PM
Comment 16 written by Raymond Camden on 3 February 2010, at 3:18 PM
If you can come up with a mod to the UDF to make it support XXY where X < Y, then I'll put it in. Otherwise, I can live with it. ;)
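One way to get there without special-casing three-character patterns is to scan from right to left and subtract any symbol that is smaller than the largest symbol seen so far. The sketch below is a swapped-in technique, not the CFLib version or anyone's posted mod. The name romanToDecRTL is hypothetical, and like the original it assumes every character is a Roman numeral symbol.

```cfml
<cfscript>
// Hypothetical variant (romanToDecRTL is not a CFLib function).
// Scans right to left; a symbol smaller than the largest one seen
// so far is subtracted, anything else is added.
function romanToDecRTL(input) {
    var romans = {};
    var result = 0;
    var maxSeen = 0;
    var value = 0;
    var pos = 0;

    romans["I"] = 1;
    romans["V"] = 5;
    romans["X"] = 10;
    romans["L"] = 50;
    romans["C"] = 100;
    romans["D"] = 500;
    romans["M"] = 1000;

    for (pos = len(input); pos gte 1; pos--) {
        value = romans[mid(input, pos, 1)];
        if (value lt maxSeen) {
            // smaller symbol to the left of a larger one: subtract it
            result -= value;
        } else {
            // equal or larger symbol: add it and remember it as the new maximum
            result += value;
            maxSeen = value;
        }
    }

    return result;
}
</cfscript>
```

Under this rule IIX comes out as 8 and IIXX as 18, and the inputs from the post's test script (XX, XI, IV, VIII, MC, DL, XL) give the same results as before. It does no validation, so oddities such as IXC are still evaluated rather than rejected.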
Comment 17 written by Jeff on 3 February 2010, at 4:14 PM
Comment 18 written by Gary Funk on 3 February 2010, at 4:48 PM
Comment 19 written by Andreas Schuldhaus on 4 February 2010, at 7:52 AM
@Raymond +1 - seems as if all the converters out there use a similar approach.
Comment 20 written by Daniel Harvey on 4 February 2010, at 8:51 AM
    if((pos + 2) <= len(input)) {
        nextchar2 = mid(input, pos+2, 1);
    } else {
        //no third character: default nextchar2 to 'I' so nothing can be smaller than it
        nextchar2 = 'I';
    }
    if(romans[char] == romans[nextchar] && romans[nextchar] < romans[nextchar2]) {
        thisSum = romans[nextchar2] - romans[nextchar] - romans[char];
        result += thisSum;
        pos+=2; //bug: three characters were consumed here, so this should be 3 (fixed in the repost below)
    } else if(romans[char] < romans[nextchar]) {
        thisSum = romans[nextchar] - romans[char];
        result += thisSum;
        pos+=2;
    } else {
        result += romans[char];
        pos++;
    }
Comment 21 written by Daniel Harvey on 4 February 2010, at 8:52 AM
Comment 22 written by Daniel Harvey on 4 February 2010, at 8:54 AM
    if((pos + 2) <= len(input)) {
        nextchar2 = mid(input, pos+2, 1);
    } else {
        //no third character: default nextchar2 to 'I' so nothing can be smaller than it
        nextchar2 = 'I';
    }
    if(romans[char] == romans[nextchar] && romans[nextchar] < romans[nextchar2]) {
        //two equal symbols before a larger one, e.g. IIX: subtract both
        thisSum = romans[nextchar2] - romans[nextchar] - romans[char];
        result += thisSum;
        pos+=3;
    } else if(romans[char] < romans[nextchar]) {
        thisSum = romans[nextchar] - romans[char];
        result += thisSum;
        pos+=2;
    } else {
        result += romans[char];
        pos++;
    }
Comment 23 written by Gary Funk on 4 February 2010, at 10:44 AM
Either way, if IIX is not valid, it certainly should not return 10. It should return INVALID.
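A guard along these lines could sit in front of the UDF. This is only a minimal sketch: isStrictRoman is a hypothetical name, not part of the CFLib UDF, and it assumes the strict modern notation (1 to 3999), which is just one of the conventions argued over in this thread. Under that assumption IIX, IIXX and IIC are all rejected.

```cfml
<cfscript>
// Hypothetical guard (isStrictRoman is not part of the CFLib UDF).
// Assumes strict modern notation: thousands, hundreds, tens and ones,
// each written in its standard additive or subtractive form.
function isStrictRoman(input) {
    var pattern = "^M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$";
    // the pattern also matches an empty string, so require a length first
    return len(input) gt 0 and reFind(pattern, ucase(input)) eq 1;
}
</cfscript>
```

A caller could then return the string INVALID whenever isStrictRoman(input) is false, and call romantodec(input) otherwise.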
Comment 24 written by Raymond Camden on 4 February 2010, at 10:46 AM
Comment 25 written by Andreas Schuldhaus on 4 February 2010, at 10:58 AM
Comment 26 written by Gary Funk on 4 February 2010, at 11:07 AM
Comment 27 written by Ben Nadel on 4 February 2010, at 1:00 PM
Comment 28 written by Raymond Camden on 4 February 2010, at 1:27 PM
Comment 29 written by Ben Nadel on 4 February 2010, at 1:34 PM
This post makes me want to play with a very simple tokenizer. I don't know why, this is just really an interesting problem. Take the "comment" tag as an example. It is only meaningful in the following combination:
<!---
This means the parser has to read in 5 characters to build it... but if it can't (say it's an HTML comment, not a CFML one), then suddenly it has to take the 4 preceding characters and treat them as individual tokens.
Maybe this is only interesting to me :)
Comment 30 written by John Farrar on 4 February 2010, at 2:09 PM
Comment 31 written by Gary Funk on 4 February 2010, at 5:46 PM
http://www.jacfb.com/index.cfm/2010/2/4/Translatin...
Hmm, it keeps telling me my comment is spam.
Comment 32 written by Raymond Camden on 4 February 2010, at 6:12 PM
That being said - your mod looks perfect! It works. But my ego forbids me from truly accepting that so I'm going to delete your comment and remove your BlogCFC from the Internet. Thanks for playing!
(No, instead, I'm going to update the CFLib version. Thanks!)
Comment 33 written by Gary Funk on 4 February 2010, at 7:06 PM
Comment 34 written by Charles on 25 February 2010, at 8:10 AM
Invalid CFML construct found on line 33 at column 22.
ColdFusion was looking at the following text:
{
The CFML compiler was processing:
* a script statement beginning with "var" on line 33, column 9.
* a script statement beginning with "function" on line 32, column 1.
* a cfscript tag beginning on line 22, column 2.
The error occurred in D:\Inetpub\serv\roman.cfm: line 33
31 : */
32 : function romantodec(input) {
33 : var romans = {};
34 : var result = 0;
35 : var pos = 1;
Comment 35 written by Raymond Camden on 25 February 2010, at 10:21 AM