Seeker code update

I updated Seeker a few minutes ago. This is my code that wraps Lucene functionality. If that sounds like a type of mouthwash to you - just think of Lucene as a search engine, much like Verity, except that Lucene is free and open source. It also runs just fine on OS X.

The updates I included in Seeker are just bug fixes, but pretty critical bug fixes. Later this week I hope to have the ColdFusion Administrator pages built in to make it even easier to use. I'll be mimicking the Verity admin UI (pretty much) but will also include a search tool (like my Verity one) that will let you search indexes directly from the administrator.

p.s. And while I have your attention - my work on wrapping SVNKit as a possible replacement for the front end SVN stuff for RIAForge is close to being done. I'll be releasing that code as well (most likely).

Comments

Fantastic! I hadn't heard of Lucene before and it comes at just the right time. I am working on a question and answer site which requires a fast and efficient search engine (with relevancy etc). I was set on Verity but I'm concerned about the search limits (you can only index a certain amount under the normal license, can't you?). Lucene looks like it could be a good alternative.

So does it work in a very similar way, and would you recommend it as a good Verity alternative?
# Posted By James Allen | 6/2/08 7:09 AM
Verity does have a limit - 250k - which I think is pretty reasonable. Don't get me wrong - I love Verity - and I think people don't give it enough credit, nor thank Adobe enough for shipping a -very- expensive search server w/ the product for free.

Does it work in a similar way: Kinda. :) Like Verity, you have 2 main parts. Part one is creating and maintaining the index. Part 2 is the searching. I tried to make things very much like the Verity API in CF.

Would I recommend it? My code has had VERY little usage. I think about 2 people have used it. To me - that's a bit scary. But - we got to start someplace. ;)
# Posted By Raymond Camden | 6/2/08 7:12 AM
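To make the two-part model above concrete (build and maintain an index, then search it), here is a minimal Python sketch of a toy inverted index. It is not Seeker or Lucene code and does no relevancy scoring; the function names `build_index` and `search` and the sample records are assumptions made purely for illustration.

```python
from collections import defaultdict


def build_index(records):
    """Part one: map each lowercase term to the set of record keys containing it."""
    index = defaultdict(set)
    for key, body in records.items():
        for term in body.lower().split():
            index[term].add(key)
    return index


def search(index, criteria):
    """Part two: return the keys of records that contain every term in the criteria."""
    terms = criteria.lower().split()
    if not terms:
        return set()
    # Start from the first term's matches, then narrow with each further term.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample data standing in for rows from a database query.
records = {
    1: "Lucene is an open source search library",
    2: "Verity ships with ColdFusion",
    3: "Seeker wraps Lucene for ColdFusion",
}
index = build_index(records)
matches = search(index, "lucene coldfusion")
```

A real engine adds tokenizing, stemming, and ranking on top of this, but the split between an indexing step and a search step is the same.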
While I think Lucene is a great open source solution, the lack of support for most common file formats is problematic.

There are ways of dealing with this, to a degree, but it sure is nice that Verity handles the document conversions out of the box.
# Posted By Gus | 6/2/08 10:29 AM
On your RIAForge page, you say that Lucene is a good candidate for people who can't run Java. I'm guessing that you meant Verity? :)

This is definitely something that I will be looking into using, along with your FeedBurner CFC!
# Posted By Chris Peters | 6/2/08 10:29 AM
@Chris - Oops, fixed. Thanks.

@Gus - There is another project at Apache that helps with this, but I haven't worked much with it. I built Seeker though so that it is easy to extend. Download it and look at how I built the readers. To add support for format X, you just add a CFC. Todd Sharp is going to share some PPT code with me soon.
# Posted By Raymond Camden | 6/2/08 10:36 AM
Sweet. Thanks Ray!
# Posted By Sami Hoda | 6/2/08 1:56 PM
We host our sites on OS X Xserves running CF8, so Verity is not an option. Ray's Seeker code has been a life saver for our query based searches. We have about 6 production sites using it!!

The latest updates are looking promising for file based searches, which were not working properly in the previous version. I really look forward to seeing the cfadmin stuff and will continue to test the code. Adding some more file readers will be really useful, although PDFs and htm files are covered and these are the most common ones we index.

Keep up the excellent work Ray, I seriously don't know how you find the time.
# Posted By Dave Phipps | 6/2/08 4:42 PM
Thanks Ray - I'm still going to consider Verity but the limit is a little worrying as the site I'm working on has the potential to smash through those limits in time. Is the limit based on all collections or 'per' collection?

I will give Lucene a test anyway - I'm only interested in query based indexing so it looks ideal.
# Posted By James Allen | 6/3/08 4:54 AM
The limit in Verity is per box, not per collection.

I've got a new release of Seeker coming out later today. It just adds the ability to search N fields (thanks to AJ Mercer) and cleans up the zip a bit.

I also need to look into index operations like update/delete. It's going to suck if you have to blow away your index for every update.
# Posted By Raymond Camden | 6/3/08 6:34 AM
I have just been playing with Seeker, and for someone working on a Mac it provides a superb alternative to Verity.

I would like to know what file formats can currently be read and indexed?

Thanks
# Posted By jbuda | 10/23/08 4:02 AM
PDF, DOC, txt, html. It will also try to read any other file as text and attempt to get something out of it.
# Posted By Raymond Camden | 10/23/08 6:10 AM
Brilliant, thanks.

Would it be possible to index metadata from images? Or would that be something that could be added?
# Posted By jbuda | 10/23/08 6:16 AM
One of the things I'm proud of is how easy it is to add 'indexers' to Seeker. You basically just write the CFC. So if you were to write the CFC for gif, jpg, tiff, whatever, and you used CF8 image funcs to get the metadata, your job is basically done. If you do so and share it with me, I'd most likely add it to the core project.
# Posted By Raymond Camden | 10/23/08 6:19 AM
Thanks, Ray.
# Posted By jbuda | 10/23/08 6:49 AM
I'm using the query index tag from Seeker, and was wondering: if I run different queries and save the resulting indexes, do they overwrite the previous files in there, or are they appended?

I basically need to create indexes for multiple tables.
# Posted By jbuda | 10/23/08 9:38 AM
Right now Seeker does not support adding, updating, or deleting stuff from an index. I've been meaning to do that for a while now but I haven't found the time. You would need to do it all at once for multiple tables. What I would recommend is using Query of Query to join the multiple record sets, and then index that query.
# Posted By Raymond Camden | 10/23/08 9:40 AM
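The workaround above amounts to combining the record sets into one before indexing. In ColdFusion that would be a Query of Query; the Python sketch below only shows the shape of the merge. The column names `id`, `title`, and `body`, the function name `combine_record_sets`, and the `table:id` key format (used so rows from different tables cannot collide) are all assumptions for illustration.

```python
def combine_record_sets(tables, key_column="id", body_column="body"):
    """Flatten several record sets into one list of rows sharing the same columns."""
    combined = []
    for table_name, rows in tables.items():
        for row in rows:
            combined.append({
                # Prefix the key with the table name so ids stay unique overall.
                "key": f"{table_name}:{row[key_column]}",
                "title": row.get("title", ""),
                "body": row[body_column],
            })
    return combined


# Hypothetical rows, as they might come back from two separate queries.
tables = {
    "articles": [{"id": 1, "title": "Seeker update", "body": "Bug fixes for Seeker"}],
    "faqs": [{"id": 1, "body": "Which file formats can be indexed?"}],
}
combined = combine_record_sets(tables)
```

The single combined list can then be indexed in one pass, which matters while the index has to be rebuilt from scratch on every change.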
Thanks again, Ray.

I assume that would be true for file indexing too?
# Posted By jbuda | 10/23/08 9:44 AM
Yep. A bit of a pain, I know. :) Luckily things index rather quickly (as far as I know). Adding add/edit/delete support for indexes is #1 on my list for Seeker. Now to find the time...
# Posted By Raymond Camden | 10/23/08 9:45 AM
It's a life saver considering I don't have access to Verity.

Thanks for creating it!
# Posted By jbuda | 10/23/08 9:53 AM