Friday Challenge - Compare Directories
It's been a while since I've done a Friday Challenge. Frankly, I just haven't felt very creative. I think everyone goes through cycles of creativity and - um - the opposite of creativity (see, my vocabulary is suffering!). A reader, Kris (very Christmasy!), sent in the following idea, and I think it's pretty good.
As a reminder, you should spend less than 10 minutes working on this. Don't go crazy unless you have a really understanding boss. Your challenge today is to write a UDF or custom tag that takes two directories as parameters. It should return a list of:
- What files exist in folder A, but not B
- What files exist in folder B, but not A
- What files exist in both, but APPEAR different (size, date)
If you want to go crazy and make it recursive, that is fine, but again, there is no need; this is just for fun. (Although honestly, this could be quite useful!)
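The three items above reduce to two set differences plus a size/date check on the files both folders share. Below is a minimal sketch of just that comparison step. It assumes each folder has already been read (for example with cfdirectory) into a struct keyed by file name, and that each value holds `size` and `dateLastModified`. The function name `compareFileMaps` is hypothetical. It returns arrays rather than comma lists, so a name that contains a comma stays intact.

```cfml
<!--- Hypothetical helper: compare two structs keyed by file name; each value has size and dateLastModified --->
<cffunction name="compareFileMaps" access="public" returntype="struct" output="false">
    <cfargument name="mapA" type="struct" required="true">
    <cfargument name="mapB" type="struct" required="true">
    <cfset var result = structNew()>
    <cfset var key = "">
    <cfset result.onlyInA = arrayNew(1)>
    <cfset result.onlyInB = arrayNew(1)>
    <cfset result.different = arrayNew(1)>
    <!--- Pass 1: names in A are either missing from B, or present and possibly different --->
    <cfloop collection="#arguments.mapA#" item="key">
        <cfif NOT structKeyExists(arguments.mapB, key)>
            <cfset arrayAppend(result.onlyInA, key)>
        <cfelseif arguments.mapA[key].size NEQ arguments.mapB[key].size
               OR arguments.mapA[key].dateLastModified NEQ arguments.mapB[key].dateLastModified>
            <cfset arrayAppend(result.different, key)>
        </cfif>
    </cfloop>
    <!--- Pass 2: only needs to find names that B has and A lacks --->
    <cfloop collection="#arguments.mapB#" item="key">
        <cfif NOT structKeyExists(arguments.mapA, key)>
            <cfset arrayAppend(result.onlyInB, key)>
        </cfif>
    </cfloop>
    <cfreturn result>
</cffunction>
```

Keying on the file name alone is fine for a flat listing. For a recursive version, a relative path makes a safer key, because two subfolders can hold files with the same name.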
Comments
<cfexit method="exittag">
Use that instead of wrapping your entire tag in a CFIF. Not only is it less code, I think whole pages wrapped in one CFIF are bad form.
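A custom tag called as `<cf_myTag />` or as a `<cf_myTag>...</cf_myTag>` pair runs its file twice. The first pass has thisTag.executionMode set to "start" and the second has it set to "end". A guard at the top can leave early on the end pass, so the rest of the tag does not need to sit inside one big CFIF. This is a sketch of that pattern. The cfparam line is borrowed from the tag further down and only shows where the real work would start.

```cfml
<!--- On the end-tag pass there is nothing to do, so leave right away --->
<cfif thisTag.executionMode eq "end">
    <cfexit method="exittag">
</cfif>
<!--- From here down runs only on the start pass, with no wrapping CFIF --->
<cfparam name="attributes.name" type="string" default="directoryCompare" />
```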
* Instead of running a QoQ inside a loop, I would have used NOT IN against a ValueList() of the column, passed in through cfqueryparam with list="true".
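Here is a sketch of that suggestion. It assumes two cfdirectory-style queries that each have a `name` column, and the function name `namesOnlyIn` is hypothetical. It passes "/" as both the ValueList() delimiter and the cfqueryparam separator, since a bare file name cannot contain a slash, which avoids the comma problem. QoQ string comparisons are case-sensitive, so "A.txt" and "a.txt" count as different names.

```cfml
<!--- Hypothetical helper: names in qA that are not in qB, found with one NOT IN query instead of a loop --->
<cffunction name="namesOnlyIn" access="public" returntype="query" output="false">
    <cfargument name="qA" type="query" required="true">
    <cfargument name="qB" type="query" required="true">
    <!--- Local copies so the QoQ FROM clause and ValueList() can use plain names --->
    <cfset var leftQ = arguments.qA>
    <cfset var rightQ = arguments.qB>
    <cfset var onlyLeft = "">
    <cfif rightQ.recordCount EQ 0>
        <!--- NOT IN () with an empty list is not valid, and every left-hand name qualifies anyway --->
        <cfquery dbtype="query" name="onlyLeft">
            SELECT name FROM leftQ
        </cfquery>
    <cfelse>
        <cfquery dbtype="query" name="onlyLeft">
            SELECT name FROM leftQ
            WHERE name NOT IN (<cfqueryparam value="#valueList(rightQ.name, '/')#" cfsqltype="cf_sql_varchar" list="true" separator="/">)
        </cfquery>
    </cfif>
    <cfreturn onlyLeft>
</cffunction>
```

Calling it twice with the arguments swapped gives both "only in A" and "only in B".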
Here we go...
<cffunction name="comparedirs" access="public" returntype="struct" output="false">
<cfargument name="firstdir" type="string" required="yes">
<cfargument name="seconddir" type="string" required="yes">
<!--- var-scope every local so nothing leaks into the calling page --->
<cfset var returnvar = structNew() />
<cfset var myfirstdir = "" />
<cfset var myseconddir = "" />
<cfset var getseconddirinfo = "" />
<cfset var firstdirnameslist = "" />
<cfset var seconddirnameslist = "" />
<cfif not directoryExists(arguments.firstdir)>
<cfthrow message="Hey, this directory #arguments.firstdir# doesn't exist!" />
</cfif>
<cfif not directoryExists(arguments.seconddir)>
<cfthrow message="Hey, this directory #arguments.seconddir# doesn't exist!" />
</cfif>
<cfdirectory name="myfirstdir" action="list" recurse="true" directory="#arguments.firstdir#" />
<cfdirectory name="myseconddir" action="list" recurse="true" directory="#arguments.seconddir#" />
<!--- Set up the name list for both dirs --->
<cfset firstdirnameslist = valueList(myfirstdir.name) />
<cfset seconddirnameslist = valueList(myseconddir.name) />
<!--- Find all the unique names and put them in a list to return --->
<cfset returnvar.uniqueTofirstdir = "" />
<cfloop query="myfirstdir">
<cfif not listFindNoCase(seconddirnameslist, myfirstdir.name)>
<cfset returnvar.uniqueTofirstdir = listAppend(returnvar.uniqueTofirstdir, myfirstdir.name) />
</cfif>
</cfloop>
<cfset returnvar.uniqueToseconddir = "" />
<cfloop query="myseconddir">
<cfif not listFindNoCase(firstdirnameslist, myseconddir.name)>
<cfset returnvar.uniqueToseconddir = listAppend(returnvar.uniqueToseconddir, myseconddir.name) />
</cfif>
</cfloop>
<!--- Find all the files that almost match, but not quite, and put them in a list to return --->
<cfset returnvar.almostmatching = "" />
<cfloop query="myfirstdir">
<cfif listFindNoCase(seconddirnameslist, myfirstdir.name)>
<!--- cfqueryparam keeps a name containing an apostrophe from breaking the QoQ --->
<cfquery name="getseconddirinfo" dbtype="query">
SELECT dateLastModified, size FROM myseconddir
WHERE name = <cfqueryparam value="#myfirstdir.name#" cfsqltype="cf_sql_varchar" />
</cfquery>
<!--- Compare the last modified and sizes of the file that existed in both dirs --->
<cfif (myfirstdir.dateLastModified NEQ getseconddirinfo.dateLastModified) OR (myfirstdir.size NEQ getseconddirinfo.size)>
<cfset returnvar.almostmatching = listAppend(returnvar.almostmatching, myfirstdir.name) />
</cfif>
</cfif>
</cfloop>
<cfreturn returnvar />
</cffunction>
<cfset result = comparedirs(expandPath("./A/"), expandPath("./B/"))>
<cfdump var="#result#">
My only concern with the cfqueryparam approach is that I *have* run into problems where I max out the number of parameter bindings a query can have :) That has only happened with direct SQL queries, so it might not apply to queries of queries, but I think anything over 3,000 bindings crashes the request. (I have some NOT so well thought out approaches!!)
Agreed. If for no other reason (of which there are plenty), it would take my Explorer too long to load the list.
Where I have run into the upper limit on param binding is when using massive ID lists. Sometimes I try to lump too much stuff into a single query.
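One common workaround for the binding limit with huge ID lists is to split the list into batches and run one query per batch. The real limit depends on the database and driver; SQL Server, for example, rejects a request with more than 2,100 parameters. Here is a minimal sketch of the splitting step. The function name `chunkList` and the batch size are hypothetical choices.

```cfml
<!--- Hypothetical helper: split a comma list into an array of sub-lists of at most chunkSize items --->
<cffunction name="chunkList" access="public" returntype="array" output="false">
    <cfargument name="idList" type="string" required="true">
    <cfargument name="chunkSize" type="numeric" required="true">
    <cfset var chunks = arrayNew(1)>
    <cfset var current = "">
    <cfset var count = 0>
    <cfset var item = "">
    <cfloop list="#arguments.idList#" index="item">
        <cfset current = listAppend(current, item)>
        <cfset count = count + 1>
        <cfif count EQ arguments.chunkSize>
            <!--- This batch is full: store it and start a new one --->
            <cfset arrayAppend(chunks, current)>
            <cfset current = "">
            <cfset count = 0>
        </cfif>
    </cfloop>
    <!--- Keep the last, partially filled batch --->
    <cfif count GT 0>
        <cfset arrayAppend(chunks, current)>
    </cfif>
    <cfreturn chunks>
</cffunction>
```

Each batch can then go into its own cfquery with cfqueryparam list="true", and the results can be combined afterwards. Every query stays under the limit at the cost of more round trips.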
<cffunction name="DirDiff" returntype="query">
<cfargument name="L" type="string" required="true">
<cfargument name="R" type="string" required="true">
<cfset var Result=QueryNew("Name,Side")>
<cfset var LQ="">
<cfset var RQ="">
<cfif DirectoryExists(Arguments.L) AND DirectoryExists(Arguments.R)>
<cfdirectory name="LQ" directory="#Arguments.L#" action="LIST">
<cfdirectory name="RQ" directory="#Arguments.R#" action="LIST">
<cfquery dbtype="query" name="Result">
SELECT LQ.Name AS Name, 'LEFT' AS Side FROM LQ
UNION ALL
SELECT RQ.Name AS Name, 'RIGHT' AS Side FROM RQ
UNION ALL
SELECT LQ.Name AS Name, 'BOTH' AS Side FROM LQ, RQ WHERE (LQ.Name = RQ.Name)
</cfquery>
<cfquery dbtype="query" name="Result">
SELECT Name AS Name, MIN(Side) AS Side FROM Result GROUP BY Name ORDER BY Name
</cfquery>
</cfif>
<cfreturn Result>
</cffunction>
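DirDiff's second query relies on alphabetical order. 'BOTH' sorts before 'LEFT' and 'RIGHT', so grouping by Name and taking MIN(Side) turns a name seen on both sides into BOTH. A name seen on only one side keeps its single label. Here is a small in-memory check of that idea, with hand-built rows standing in for the UNION ALL output. The query names `unioned` and `collapsed` are illustrative only.

```cfml
<!--- Rows shaped like the output of DirDiff's first (UNION ALL) query --->
<cfset unioned = queryNew("Name,Side")>
<cfset queryAddRow(unioned, 4)>
<cfset querySetCell(unioned, "Name", "shared.txt", 1)>
<cfset querySetCell(unioned, "Side", "LEFT", 1)>
<cfset querySetCell(unioned, "Name", "shared.txt", 2)>
<cfset querySetCell(unioned, "Side", "RIGHT", 2)>
<cfset querySetCell(unioned, "Name", "shared.txt", 3)>
<cfset querySetCell(unioned, "Side", "BOTH", 3)>
<cfset querySetCell(unioned, "Name", "leftonly.txt", 4)>
<cfset querySetCell(unioned, "Side", "LEFT", 4)>
<!--- Same grouping as DirDiff's second query: the alphabetically smallest label wins --->
<cfquery dbtype="query" name="collapsed">
    SELECT Name AS Name, MIN(Side) AS Side FROM unioned GROUP BY Name ORDER BY Name
</cfquery>
<cfdump var="#collapsed#">
```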
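To also flag files that exist on both sides but differ in size or date, the first query can be swapped for the one below. 'DIFFERENT' sorts after 'BOTH' but before 'LEFT' and 'RIGHT'. The MIN(Side) grouping in the second query therefore labels identical files BOTH and changed files DIFFERENT:
<cfquery dbtype="query" name="Result">
SELECT LQ.Name AS Name, 'LEFT' AS Side FROM LQ
UNION ALL
SELECT RQ.Name AS Name, 'RIGHT' AS Side FROM RQ
UNION ALL
SELECT LQ.Name AS Name, 'BOTH' AS Side FROM LQ, RQ WHERE (LQ.Name = RQ.Name) AND (LQ.Size = RQ.Size) AND (LQ.DateLastModified = RQ.DateLastModified)
UNION ALL
SELECT LQ.Name AS Name, 'DIFFERENT' AS Side FROM LQ, RQ WHERE (LQ.Name = RQ.Name)
</cfquery>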
It's totally cheating, I know. But I wonder how it scales compared with the struct-indexed method...?
RickO's and Ben's are also the only ones that don't fail badly when a file name contains a comma.
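The comma problem comes from storing file names in comma-delimited lists. A name such as report,final.txt is split into two list items. Here is a short illustration with made-up file names, using an array (struct keys behave the same way) as the alternative that keeps each name whole.

```cfml
<!--- Comma-delimited list: the comma inside the first name is treated as a delimiter --->
<cfset names = "">
<cfset names = listAppend(names, "report,final.txt")>
<cfset names = listAppend(names, "notes.txt")>
<!--- Array: each element is stored as-is, comma included --->
<cfset nameArray = arrayNew(1)>
<cfset arrayAppend(nameArray, "report,final.txt")>
<cfset arrayAppend(nameArray, "notes.txt")>
<!--- Two files went in, but the list reports three items and the array reports two --->
<cfoutput>#listLen(names)# list items vs. #arrayLen(nameArray)# array elements</cfoutput>
```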
Performance testing Joe's method against RickO's, I get the following averages:
Joe: 177.565 ms, 317.96 ms, 155.65 ms
Joe (UDF): 155.5 ms, 407.205 ms, 100.025 ms
RickO: 104.63 ms, 287.545 ms, 71.435 ms
So I'd say it scales very well, RickO! Also worth noting: custom tags aren't much slower than functions.
(Test was done on a PowerBook G4 1.33 GHz, 1.25 GB RAM, 4200 RPM HD, CF8 on Java 1.5; Dir1 size: 140, Dir2 size: 12, executions: 200, runs: 3)
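To repeat this kind of comparison, here is a minimal timing harness in the same spirit: many executions, averaged, built on getTickCount(). The function name `timeAverage` is hypothetical. Absolute numbers will vary with hardware and with how warm the OS file cache is.

```cfml
<!--- Hypothetical harness: average milliseconds per call of a UDF that takes no arguments --->
<cffunction name="timeAverage" access="public" returntype="numeric" output="false">
    <cfargument name="fn" type="any" required="true" hint="UDF to time">
    <cfargument name="executions" type="numeric" required="false" default="200">
    <cfset var i = 0>
    <!--- getTickCount() is in milliseconds, so the difference below is the total elapsed time --->
    <cfset var start = getTickCount()>
    <cfloop from="1" to="#arguments.executions#" index="i">
        <cfset arguments.fn()>
    </cfloop>
    <cfreturn (getTickCount() - start) / arguments.executions>
</cffunction>
```

To time DirDiff or a custom tag, wrap the call in a small no-argument UDF, for example one that returns DirDiff(expandPath("./A/"), expandPath("./B/")), and pass that UDF in. Calling timeAverage three times gives three averages, like the runs listed above.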
http://www.mediafire.com/?6yttt3x1kdt
I look forward to comparing it to the posts above.

Here's what I've got:
<cfif thisTag.executionMode eq "start">
<cfparam name="attributes.name" type="string" default="directoryCompare" />
<cfparam name="attributes.directoryOne" type="string" />
<cfparam name="attributes.directoryTwo" type="string">
<cfset result = structNew() />
<cfif not directoryExists(attributes.directoryOne)>
<cfthrow message="'#attributes.directoryOne#' doesn't exist." />
</cfif>
<cfif not directoryExists(attributes.directoryTwo)>
<cfthrow message="'#attributes.directoryTwo#' doesn't exist." />
</cfif>
<cfdirectory name="filesOne" action="list" recurse="true" directory="#attributes.directoryOne#" />
<cfdirectory name="filesTwo" action="list" recurse="true" directory="#attributes.directoryTwo#" />
<!---
Ray's a mean bastard. It looks like you can't use subqueries in the WHERE clause of a QoQ,
so my initial idea of just using SQL is bunk.
So I says to myself: create a simple list and do an "IN".
Doesn't help much with size/date comparison, though.
Final answer = map keyed by relative path.
--->
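<!--- Initialize both maps so an empty directory doesn't leave fileMapOne/fileMapTwo undefined --->
<cfset fileMapOne = structNew() />
<cfset fileMapTwo = structNew() />
<cfloop list="One,Two" index="i">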
<cfloop query="files#i#">
<cfif variables["files" & i].type eq "File">
<cfif i eq "One">
<cfset origDir = attributes.directoryOne />
<cfelse>
<cfset origDir = attributes.directoryTwo />
</cfif>
<cfset key = right(variables["files" & i].directory & "/", len(variables["files" & i].directory) - len(origDir) + 1) & variables["files" & i].name />
<cfset variables["fileMap" & i][key] = structNew() />
<cfset variables["fileMap" & i][key].dateLastModified = variables["files" & i].dateLastModified />
<cfset variables["fileMap" & i][key].size = variables["files" & i].size />
</cfif>
</cfloop>
</cfloop>
<!--- Build unique list of files. --->
<cfset result.uniqueInFirstDirectory = "" />
<cfloop collection="#fileMapOne#" item="i">
<cfif not structKeyExists(fileMapTwo, i)>
<cfset result.uniqueInFirstDirectory = listAppend(result.uniqueInFirstDirectory, i) />
</cfif>
</cfloop>
<cfset result.uniqueInSecondDirectory = "" />
<cfloop collection="#fileMapTwo#" item="i">
<cfif not structKeyExists(fileMapOne, i)>
<cfset result.uniqueInSecondDirectory = listAppend(result.uniqueInSecondDirectory, i) />
</cfif>
</cfloop>
<cfset similarFileMap = structNew() >
<cfloop collection="#fileMapOne#" item="i">
<cfif structKeyExists(fileMapTwo, i)
and (
fileMapTwo[i].size neq fileMapOne[i].size
or fileMapTwo[i].dateLastModified neq fileMapOne[i].dateLastModified
)
and not structKeyExists(similarFileMap, i)
>
<cfset similarFileMap[i] = i />
</cfif>
</cfloop>
<cfset result.similarFiles = structKeyList(similarFileMap) />
<cfset caller[attributes.name] = result />
</cfif>
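The tag writes its result into the caller under the name given in attributes.name. A call could therefore look like the sketch below. The template file name directoryCompare.cfm and the A/B folders are assumptions, since the tag's file name isn't given above.

```cfml
<!--- Assumes the tag above is saved as directoryCompare.cfm next to this page --->
<!--- No trailing slash on the paths: the tag derives relative paths from string lengths --->
<cfmodule template="directoryCompare.cfm"
    directoryOne="#expandPath('./A')#"
    directoryTwo="#expandPath('./B')#"
    name="diff">
<!--- diff.uniqueInFirstDirectory, diff.uniqueInSecondDirectory and diff.similarFiles are comma-delimited lists --->
<cfdump var="#diff#">
```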