Tracking data views accurately

A user posted an interesting question over on my forums that I thought I'd share with others. I answered there but I'd like to get other opinions as well. Here is the question:

I'm in the process of building a forum system as a means of trying out new things. Just came across an issue I hadn't really thought about with view tracking - what would you suggest is the best way to keep track of thread views so a user hitting refresh repeatedly doesn't artificially inflate the numbers?

Presumably a DB table storing the thread id, user id (if recorded), IP address and timestamp of the view is a start - but I'm not sure how you would maintain that table to prevent it getting unwieldy, and I guess you'd only want to store the latest visit for any given thread for a user.

So what I recommended was based on the changes I made to BlogCFC recently. Originally, my "Views" column for blog entries would go up every time you viewed a blog entry. You could sit there and reload all day long. (By the way, that issue still applies here; my own blog is a bit behind the released BlogCFC.)

I added a simple session variable that holds a structure of viewed pages. I used a structure since I wanted a simple way to store, and check for, pages you had viewed before.

When you view a page now, I check that structure, and if this is a new page for you, I log a view. If it isn't, I do not log a view.
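
As a rough sketch of that check (this is not the actual BlogCFC code; entryId, session.viewedPages, application.dsn and the entries table are all made-up names for illustration, and it assumes session management is turned on for the application):

```cfml
<!--- Hypothetical names: entryId, session.viewedPages, application.dsn, entries table --->
<!--- Create the per-session structure of viewed pages on first use --->
<cfif NOT structKeyExists(session, "viewedPages")>
   <cfset session.viewedPages = structNew()>
</cfif>

<cfif NOT structKeyExists(session.viewedPages, entryId)>
   <!--- First time this session has seen the entry: remember it, then count the view --->
   <cfset session.viewedPages[entryId] = now()>
   <cfquery datasource="#application.dsn#">
      UPDATE entries
      SET views = views + 1
      WHERE id = <cfqueryparam value="#entryId#" cfsqltype="cf_sql_varchar">
   </cfquery>
</cfif>
```

The structure is keyed by entry id, so the "have you seen this already?" test is a single structKeyExists call, and the whole thing goes away when the session times out.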

This was a rather simple fix, and it isn't perfect. BlogCFC simply counts views. It doesn't log them. So I can't say that Entry X got more views on Monday than it did on Tuesday. Another problem - someone could block the session cookies and artificially inflate the views for an entry. But I'm not building the Pentagon here - so I think it was a reasonable solution.

So what have others done?

Comments

I pretty much did what you did, Ray: my blog just counts the page views, and hitting refresh will make the count go up. I really don't see the point in creating a blog that counts the number of times a page is uniquely visited, well, not for mine anyway. We do have some apps that log views by session and IP; the main reason is the reporting section for the client. My blog isn't reporting to investors so I don't need to keep a true count.
# Posted By John Ramon | 3/13/07 9:35 AM
There's no be-all and end-all solution really, like you said. Unless every thread sits behind user authentication so you can run it by user id and track views, I suppose.

I've seen/done apps a couple of different ways. One way was similar to yours, Ray, with session-based tracking. Obviously the user can just log back in later and affect the count, but it suited the app it was used on at the time. I think that's the key to any solution: whatever best suits the app, and not always the preferred method to use, I guess.

The other way I've seen it done is a lookup table off a user security trace. I've worked on apps that had to log every action a logged in and non logged in user performed. The table contained the ancillary data required to track counts if you wanted to. I believe this particular one transcribed it into a separate lookup table for just the counts later on or something similar.

Third way I've seen was an app that had the count in the view items table as an int field (so in the forum thread table, an additional column called count or whatever). It then performed the update to that count on the view based on logic in the page. This one I believe allowed for inflation and only checked the referrer so you couldn't just sit there and refresh normally.
# Posted By DK | 3/13/07 10:46 AM
For a few internal apps where I work, I used the same "store the viewed pages in their session scope" approach. It seems most efficient for the desired effect: If someone comes back to it 5 times over the course of 20 minutes, or sits there and hits refresh, you don't want those to all count, but if they come back tomorrow, you'd want that to count. Session scope takes care of that perfectly.
# Posted By Joshua Curtiss | 3/13/07 10:50 AM
Sorry to be wordy, but another comment I had was on unwieldy tables. IMHO unwieldy tables are more a reflection of your current skill level with your db server of choice. That's not a knock on anyone; I think I fall extremely short in many areas of DB administration. I've felt similar worries about table size... but then I found sites like go-gaia.com that would dwarf my estimates for the tables in question.

Go-Gaia is a roleplaying type community that uses a modified phpbb forum (I used to do php before CF) to drive their forums and games and some of their portals etc. They have something like over 9 million posts, 6 million users, and at any given time 20k to 80k users online. Their sites (before machine upgrades) would still cruise for the most part, and their admin posted his tuning tips on the phpbb forums (I think they are still stickied). (Side-note: his post garnered dozens of posts arguing his methods as is the nature of us tech folk heh). I think it puts things in perspective sometimes when you check his diagnostic query results he posted and how optimized his queries were. Good stuff.
# Posted By DK | 3/13/07 10:56 AM
I recently added "Views" functionality to CFMBB.

Because content in forum threads is constantly changing... I actually use cookies for each thread that expire in X minutes. The default in CFMBB is 2 minutes... so if you look at a thread... and then you look at it again 2 minutes later, it counts as another view... but if you hit refresh a bunch of times, they don't count.

the exact code:

<cfif NOT isDefined("cookie.view#replace(threadid,"-","","ALL")#")>
   <cfset application.thread.updateViewCount(threadid)>
   <cfset request.thread.views = request.thread.views + 1>
   <!--- only count views every 2 minutes --->
   <cfcookie name="view#replace(threadid,"-","","ALL")#" expires="#createTimeSpan(0,0,2,0)#" value="1">
</cfif>

Now, if someone is reading a long thread for the first time, they might cause several views as they're paging through the messages in the thread... but that doesn't really bother me =)

In the case of a very active thread, you might get subscription updates almost immediately after you respond to a thread. CFMBB sends a link directly to the new message, and if fewer than 2 minutes have passed... it won't count as another view, even though it probably should. This kind of activity was not uncommon for "Game Day Threads" on the Carolina Hurricanes web site last year during the Stanley Cup Playoffs. A topic would get hundreds of new responses over the course of a 2-1/2 hour hockey game.

I guess it all depends on how you want to define a "view".. you want to avoid the refresh issue.. but I think you want to count additional views when a user comes back to see a topic again because of new content.
# Posted By Rick Root | 3/14/07 7:32 AM
Thank you all for the suggestions - you've given me much to think about! Initially I'm going to try two options - Ray's session variant and Rick's CFMBB setup - to see how they stack up.

@DK, you're absolutely right - I've only ever used DBs as glorified spreadsheets unfortunately - I'm slowly improving my knowledge on that side. When I get some more time, I'll try a table-based tracking system - I do prefer this idea, as it should make duplication of views a lot easier to prevent, I think.
# Posted By Aegis | 3/14/07 8:27 PM