Sitemap Generator
Earlier today Yahoo and Google announced their collaboration on Sitemaps.org. Sitemaps give you a way to tell a search engine which pages make up your web site. I've had sitemap support in BlogCFC for a while, but today I wrote a little UDF you can use to generate sitemap XML. It takes either a list of URLs or a query of URLs. Enjoy. I'll post it to CFLib later in the week.
<cffunction name="generateSiteMap" output="false" returnType="xml">
<cfargument name="data" type="any" required="true">
<cfargument name="lastmod" type="date" required="false">
<cfargument name="changefreq" type="string" required="false">
<cfargument name="priority" type="numeric" required="false">
<cfset var header = "<?xml version=""1.0"" encoding=""UTF-8""?><urlset xmlns=""http://www.sitemaps.org/schemas/sitemap/0.9"">">
<cfset var result = header>
<cfset var aurl = "">
<cfset var item = "">
<cfset var validChangeFreq = "always,hourly,daily,weekly,monthly,yearly,never">
<cfset var newDate = "">
<cfset var tz = getTimeZoneInfo().utcHourOffset>
<cfif structKeyExists(arguments, "changefreq") and not listFindNoCase(validChangeFreq, arguments.changefreq)>
<cfthrow message="Invalid changefreq (#arguments.changefreq#) passed. Valid values are #validChangeFreq#">
</cfif>
<cfif structKeyExists(arguments, "priority") and (arguments.priority lt 0 or arguments.priority gt 1)>
<cfthrow message="Invalid priority (#arguments.priority#) passed. Must be between 0.0 and 1.0">
</cfif>
<!--- reformat datetime as w3c datetime / http://www.w3.org/TR/NOTE-datetime --->
<cfif structKeyExists(arguments, "lastmod")>
<cfset newDate = dateFormat(arguments.lastmod, "YYYY-MM-DD") & "T" & timeFormat(arguments.lastmod, "HH:mm")>
<cfif tz gte 0>
<cfset newDate = newDate & "-" & tz & ":00">
<cfelse>
<cfset newDate = newDate & "+" & abs(tz) & ":00">
</cfif>
</cfif>
<!--- Support either a query or list of URLs --->
<cfif isSimpleValue(arguments.data)>
<cfloop index="aurl" list="#arguments.data#">
<cfsavecontent variable="item">
<cfoutput>
<url>
<loc>#xmlFormat(aurl)#</loc>
<cfif structKeyExists(arguments,"lastmod")>
<lastmod>#newDate#</lastmod>
</cfif>
<cfif structKeyExists(arguments,"changefreq")>
<changefreq>#arguments.changefreq#</changefreq>
</cfif>
<cfif structKeyExists(arguments,"priority")>
<priority>#arguments.priority#</priority>
</cfif>
</url>
</cfoutput>
</cfsavecontent>
<cfset item = trim(item)>
<cfset result = result & item>
</cfloop>
<cfelseif isQuery(arguments.data)>
<cfloop query="arguments.data">
<cfsavecontent variable="item">
<cfoutput>
<url>
<loc>#xmlFormat(url)#</loc>
<cfif listFindNoCase(arguments.data.columnlist,"lastmod")>
<cfset newDate = dateFormat(lastmod, "YYYY-MM-DD") & "T" & timeFormat(lastmod, "HH:mm")>
<cfif tz gte 0>
<cfset newDate = newDate & "-" & tz & ":00">
<cfelse>
<cfset newDate = newDate & "+" & abs(tz) & ":00">
</cfif>
<lastmod>#newDate#</lastmod>
</cfif>
<cfif listFindNoCase(arguments.data.columnlist,"changefreq")>
<changefreq>#changefreq#</changefreq>
</cfif>
<cfif listFindNoCase(arguments.data.columnlist,"priority")>
<priority>#priority#</priority>
</cfif>
</url>
</cfoutput>
</cfsavecontent>
<cfset item = trim(item)>
<cfset result = result & item>
</cfloop>
</cfif>
<cfset result = result & "</urlset>">
<cfreturn result>
</cffunction>
Comments
http://www.cflib.org/udf.cfm?id=1596
Code changes occur after the comment "reformat datetime as w3c datetime / http://www.w3.org/TR/NOTE-datetime".
1. Change the test of tz to be "gt" rather than "gte". To be honest, this is really just a personal style thing: +00:00 looks better than -00:00 to me, and it doesn't seem to affect Google.
2. Make the hour number format "00" for the newDate offset, e.g. numberFormat(tz,"00"). So the lines should read newDate = newDate & "-" & numberFormat(tz,"00") & ":00" and newDate = newDate & "+" & numberFormat(abs(tz),"00") & ":00" (the abs() keeps a negative tz from producing "+-05:00").
HTH
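The two changes suggested above can be folded into one small helper. This is only a sketch: the name w3cOffset is made up for illustration, and it assumes the Adobe ColdFusion convention in which getTimeZoneInfo().utcHourOffset is positive for zones west of UTC. Like the original UDF, it ignores any minute offset.

```cfml
<!--- Hypothetical helper: turns a utcHourOffset value into a W3C-style offset such as -05:00 --->
<cffunction name="w3cOffset" output="false" returnType="string">
	<cfargument name="utcHourOffset" type="numeric" required="true">
	<!--- Assumption: a positive utcHourOffset means local time is behind UTC --->
	<cfset var sign = "+">
	<cfif arguments.utcHourOffset gt 0>
		<cfset sign = "-">
	</cfif>
	<!--- abs() avoids a doubled sign; the "00" mask zero-pads single-digit hours --->
	<cfreturn sign & numberFormat(abs(arguments.utcHourOffset), "00") & ":00">
</cffunction>
```

Inside generateSiteMap, each cfif/cfelse block that appends the offset could then become a single line: newDate = newDate & w3cOffset(tz).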
I am having issues with cfdirectory with recursion at the web root. I get the pesky null pointer error, which I am attributing to archived directories, etc., bloating the query. I still need to prove that is the cause.
Is it supposed to generate a sitemap.xml file?
And if so, how? I can't find any option to do this.
Or do I need to update, as I am on BlogCFC 5.5?
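For what it's worth, the UDF as posted only returns the XML as a string; nothing in it writes a file. One way to get a sitemap.xml on disk is to pass the result to cffile. The sketch below is not part of the UDF or of BlogCFC: the helper name saveSiteMap and the target path are assumptions made for illustration.

```cfml
<!--- Hypothetical helper: writes an already generated sitemap string to disk as UTF-8 --->
<cffunction name="saveSiteMap" output="false" returnType="void">
	<cfargument name="xml" type="string" required="true">
	<cfargument name="path" type="string" required="true">
	<!--- addnewline="false" keeps the file content identical to the string passed in --->
	<cffile action="write" file="#arguments.path#" output="#arguments.xml#" charset="utf-8" addnewline="false">
</cffunction>

<!--- Usage sketch. Assumes generateSiteMap from the post is defined on the page,
      that "urls" holds your list of URLs, and that the web root is writable:
      saveSiteMap(generateSiteMap(urls), expandPath("/sitemap.xml"))
--->
```

Run from a scheduled task or an admin page, this would leave a static file for the search engines to fetch instead of building the XML on every request.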
<cfset siteMapXML = generateSiteMap(data=urls,changefreq="daily",priority="1.0", lastmod=now())>
<cfdump var="#xmlParse(siteMapXML)#">
<cfset siteMapXML = generateSiteMap(qurls)>
<cfdump var="#xmlParse(siteMapXML)#">
I want these combined, as I need to put it all into one XML sitemap. The .cfm sitemap takes too long to load because the sitemap is so big.
thanks
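One possible way to combine the two calls is to append the list of URLs to the query and call generateSiteMap once. The following is a rough sketch, not something from the post: the helper name appendUrlsToQuery and its defaults argument are made up, and it assumes the query already has a url column.

```cfml
<!--- Hypothetical helper: adds one row per URL in a list to an existing query --->
<cffunction name="appendUrlsToQuery" output="false" returnType="query">
	<cfargument name="qry" type="query" required="true">
	<cfargument name="urlList" type="string" required="true">
	<!--- Optional values for the other columns, keyed by column name --->
	<cfargument name="defaults" type="struct" required="false" default="#structNew()#">
	<cfset var aurl = "">
	<cfset var col = "">
	<cfloop index="aurl" list="#arguments.urlList#">
		<cfset queryAddRow(arguments.qry)>
		<cfset querySetCell(arguments.qry, "url", aurl)>
		<!--- Fill every other column for which a default was supplied --->
		<cfloop index="col" list="#arguments.qry.columnList#">
			<cfif col neq "url" and structKeyExists(arguments.defaults, col)>
				<cfset querySetCell(arguments.qry, col, arguments.defaults[col])>
			</cfif>
		</cfloop>
	</cfloop>
	<cfreturn arguments.qry>
</cffunction>

<!--- Usage sketch. Assumes "urls", "qurls" and generateSiteMap exist as in the calls above:
      defaults = structNew(), then set defaults.changefreq, defaults.priority, defaults.lastmod
      appendUrlsToQuery(qurls, urls, defaults)
      siteMapXML = generateSiteMap(qurls)
--->
```

Two things to keep in mind: queries are passed by reference, so qurls itself is modified, and if qurls has a lastmod column, the defaults need to supply a date for the appended rows, because the UDF formats that column for every row. The UDF also validates changefreq and priority only when they are passed as arguments, not when they come from query columns.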

