Is there a way to decode HTML special entities like &amp; and &eacute; to normal text? I need to clean an nvarchar field in an MSSQL database.
Created on: 06/24/08 11:29 AM
Replies: 6
reincat
Joined: 06/24/08
Posts: 2
marcovandenoever
Joined: 02/20/07
Posts: 82
RE: decode html special entities like &amp; to normal text?
06/24/08 4:31 PM
I think this script, which uses the Replace and REReplace functions, will do:
<cfscript>
/**
 * Fixes text using Microsoft Latin-1 "Extensions", namely ASCII characters 128-160.
 *
 * @param text  Text to be modified. (Required)
 * @return Returns a string.
 * @author Shawn Porter (sporter@rit.net)
 * @version 1, June 16, 2004
 */
function DeMoronize (text) {
    var i = 0;

    // map incompatible non-ISO characters into plausible
    // substitutes
    text = Replace(text, Chr(128), "&euro;", "All");
    text = Replace(text, Chr(130), ",", "All");
    text = Replace(text, Chr(131), "<em>f</em>", "All");
    text = Replace(text, Chr(132), ",,", "All");
    text = Replace(text, Chr(133), "...", "All");
    text = Replace(text, Chr(136), "^", "All");
    text = Replace(text, Chr(139), ")", "All");
    text = Replace(text, Chr(140), "Oe", "All");
    text = Replace(text, Chr(145), "`", "All");
    text = Replace(text, Chr(146), "'", "All");
    text = Replace(text, Chr(147), """", "All");
    text = Replace(text, Chr(148), """", "All");
    text = Replace(text, Chr(149), "*", "All");
    text = Replace(text, Chr(150), "-", "All");
    text = Replace(text, Chr(151), "--", "All");
    text = Replace(text, Chr(152), "~", "All");
    text = Replace(text, Chr(153), "&trade;", "All");
    text = Replace(text, Chr(155), ")", "All");
    text = Replace(text, Chr(156), "oe", "All");

    // remove any remaining ASCII 128-159 characters
    for (i = 128; i LTE 159; i = i + 1)
        text = Replace(text, Chr(i), "", "All");

    // map Latin-1 supplemental characters into
    // their &name; encoded substitutes
    text = Replace(text, Chr(160), "&nbsp;", "All");
    text = Replace(text, Chr(163), "&pound;", "All");
    text = Replace(text, Chr(169), "&copy;", "All");
    text = Replace(text, Chr(176), "&deg;", "All");

    // encode ASCII 160-255 using &#999; format
    for (i = 160; i LTE 255; i = i + 1)
        text = REReplace(text, "(#Chr(i)#)", "&###i#;", "All");

    // supply missing semicolon at end of numeric entities
    text = ReReplace(text, "&##([0-2][[:digit:]]{2})([^;])", "&##\1;\2", "All");

    // fix obscure numeric rendering of &lt; &gt; &amp;
    text = ReReplace(text, "&##038;", "&amp;", "All");
    text = ReReplace(text, "&##060;", "&lt;", "All");
    text = ReReplace(text, "&##062;", "&gt;", "All");

    // supply missing semicolon at the end of &amp; &quot;
    // ([^;] matches any character other than a semicolon)
    text = ReReplace(text, "&amp([^;])", "&amp;\1", "All");
    text = ReReplace(text, "&quot([^;])", "&quot;\1", "All");

    // strip <BR> tags
    text = ReReplace(text, "<BR>", "", "All");

    return text;
}
</cfscript>
Bradley
Joined: 05/12/08
Posts: 90
RE: decode html special entities like &amp; to normal text?
06/25/08 2:09 PM
Uhhhh... hey guys...
Instead of that big huge script file, you could just enclose the special entities like so...
<cfoutput>
#ToString("&amp;")#<br>
#ToString("&eacute;")#
</cfoutput>
...or am I completely wrong and this isn't what you're looking for?
reincat
Joined: 06/24/08
Posts: 2
RE: decode html special entities like &amp; to normal text?
06/26/08 2:50 AM
Thanks for your suggestions. PHP has html_entity_decode, but in CF you have to write your own. I found this function on the net (I don't remember where), modified it, and it did the job well. Also, if you build a query holding the before and after text and dump it, the special characters show up in the browser as raw entities instead of rendering, which is really helpful when debugging them.
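<cffunction name="HtmlUnEditFormat" access="public" returntype="string" output="no" displayname="HtmlUnEditFormat" hint="Undo escaped characters">
    <cfargument name="str" type="string" required="Yes" />
    <cfscript>
        // Two parallel comma-delimited lists: entity N in lEntities is replaced by character N in lEntitiesChars.
        // The entities for the invisible and space characters (&shy;, &zwnj;, &zwj;, &lrm;, &rlm;, &ensp;, &emsp;, &thinsp;, &nbsp;) are a best guess at the original list.
        var lEntities = "&##xE7;,&##xF4;,&##xE2;,&Icirc;,&Ccedil;,&Egrave;,&Oacute;,&Ecirc;,&OElig;,&Acirc;,&laquo;,&raquo;,&Agrave;,&Eacute;,&le;,&yacute;,&chi;,&sum;,&prime;,&yuml;,&sim;,&beta;,&lceil;,&ntilde;,&szlig;,&bdquo;,&acute;,&middot;,&ndash;,&sigmaf;,&reg;,&dagger;,&oplus;,&otilde;,&eta;,&rceil;,&oacute;,&shy;,&gt;,&phi;,&ang;,&zwnj;,&alpha;,&cap;,&darr;,&upsilon;,&image;,&sup3;,&rho;,&eacute;,&sup1;,&lt;,&cent;,&cedil;,&pi;,&sup;,&divide;,&fnof;,&iquest;,&ecirc;,&ensp;,&empty;,&forall;,&emsp;,&gamma;,&iexcl;,&oslash;,&not;,&agrave;,&eth;,&alefsym;,&ordm;,&psi;,&otimes;,&delta;,&ouml;,&deg;,&cong;,&ordf;,&lsaquo;,&clubs;,&acirc;,&ograve;,&iuml;,&diams;,&aelig;,&and;,&loz;,&egrave;,&frac34;,&amp;,&nsub;,&nu;,&ldquo;,&isin;,&ccedil;,&circ;,&copy;,&aacute;,&sect;,&mdash;,&euml;,&kappa;,&notin;,&lfloor;,&ge;,&igrave;,&harr;,&lowast;,&ocirc;,&infin;,&brvbar;,&int;,&macr;,&frac12;,&curren;,&asymp;,&lambda;,&frasl;,&lsquo;,&hellip;,&oelig;,&pound;,&hearts;,&minus;,&atilde;,&epsilon;,&nabla;,&exist;,&auml;,&mu;,&frac14;,&nbsp;,&equiv;,&bull;,&larr;,&laquo;,&oline;,&or;,&euro;,&micro;,&ne;,&cup;,&aring;,&iota;,&iacute;,&perp;,&para;,&rarr;,&raquo;,&ucirc;,&omicron;,&sbquo;,&thetasym;,&ni;,&part;,&rdquo;,&weierp;,&permil;,&sup2;,&sigma;,&sdot;,&scaron;,&yen;,&xi;,&plusmn;,&real;,&thorn;,&rang;,&ugrave;,&radic;,&zwj;,&there4;,&uarr;,&times;,&thinsp;,&theta;,&rfloor;,&sub;,&supe;,&uuml;,&rsquo;,&zeta;,&trade;,&icirc;,&piv;,&lrm;,&lang;,&tilde;,&uacute;,&uml;,&prop;,&upsih;,&omega;,&crarr;,&tau;,&sube;,&rsaquo;,&prod;,&quot;,&rlm;,&spades;";
        // Characters with no Latin-1/Windows-1252 equivalent are mapped to "?".
        // Chr(173) is the soft hyphen and Chr(160) the non-breaking space; they are spelled out because empty list elements are ignored.
        var lEntitiesChars = "ç,ô,â,Î,Ç,È,Ó,Ê,Œ,Â,«,»,À,É,?,ý,?,?,?,ÿ,?,?,?,ñ,ß,„,´,·,–,?,®,†,?,õ,?,?,ó,#Chr(173)#,>,?,?,?,?,?,?,?,?,³,?,é,¹,<,¢,¸,?,?,÷,ƒ,¿,ê,?,?,?,?,?,¡,ø,¬,à,ð,?,º,?,?,?,ö,°,?,ª,‹,?,â,ò,ï,?,æ,?,?,è,¾,&,?,?,“,?,ç,ˆ,©,á,§,—,ë,?,?,?,?,ì,?,?,ô,?,¦,?,¯,½,¤,?,?,?,‘,…,œ,£,?,?,ã,?,?,?,ä,?,¼,#Chr(160)#,?,•,?,«,?,?,€,µ,?,?,å,?,í,?,¶,?,»,û,?,‚,?,?,?,”,?,‰,²,?,?,š,¥,?,±,?,þ,?,ù,?,?,?,?,×,?,?,?,?,?,ü,’,?,™,î,?,?,?,˜,ú,¨,?,?,?,?,?,?,›,?,"",?,?";
    </cfscript>
    <cfreturn ReplaceList(arguments.str, lEntities, lEntitiesChars) />
</cffunction>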
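The before-and-after dump mentioned above could look roughly like this. It is only a sketch: it assumes the HtmlUnEditFormat function is defined on the same page, and the sample string, the variable names, and the column names (original, decoded) are made up for illustration.

```cfml
<!--- Hypothetical sample value containing a few named entities --->
<cfset sample = "Caf&eacute; &amp; cr&egrave;me">

<!--- Build a one-row query holding the text before and after decoding --->
<cfset qCompare = QueryNew("original,decoded")>
<cfset QueryAddRow(qCompare)>
<cfset QuerySetCell(qCompare, "original", sample)>
<cfset QuerySetCell(qCompare, "decoded", HtmlUnEditFormat(sample))>

<!--- Dump both columns side by side for comparison --->
<cfdump var="#qCompare#">
```

The same comparison can be run against a handful of real rows from the table before any UPDATE is issued, so that entities the function misses are spotted early.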