How does caching a query inside of a CFC work?

{ Posted By : Eric Cobb on September 25, 2009 }
4220 Views
Related Categories: Ramblings, General, CFML

I have some confusion on caching a query from inside a component, so this post is really just a call-out to the CFML community to help me figure it out. In a nutshell, here's my question: Can you cache a query if it's in the "var" scope of a CFC? Before you answer, let me elaborate a little to explain my thoughts on the subject.

Caching a query in CFML is a pretty simple process, just add the "cachedwithin" attribute to your cfquery tag and you're done. No big deal. Your query's result set will then be stored in the server's memory until either your cachedwithin timeout is reached, or the server reaches the maximum number of queries it can cache and drops yours from memory.

Using the "var" scope in a CFC is an equally simple process. Anything that is put into the "var" scope in a component function should be cleared from memory once the function is finished executing. (or somewhere thereabouts...the point is, it's not stored in memory for any length of time.) So, the following query should execute, return the results to whatever called the function, and then be cleared from memory.

<cffunction name="getEmp">
<cfset var Emp="">
<cfquery name="Emp" datasource="datasource">
SELECT Firstname, Lastname
FROM Employee
WHERE ID = 123
</cfquery>
<cfreturn Emp>
</cffunction>

So, what happens if I try to cache the exact same query? Will it really be cached or will it be cleared from memory once the function is finished executing?

<cffunction name="getEmp">
<cfset var Emp="">
<cfquery name="Emp" datasource="datasource" cachedwithin="#CreateTimeSpan(0, 6, 0, 0)#">
SELECT Firstname, Lastname
FROM Employee
WHERE ID = 123
</cfquery>
<cfreturn Emp>
</cffunction>

My gut feeling (although I have no facts to base this on) is that query results get cached in their own little section of "memory", and components have their own little section of "memory". So even though the function itself has been cleared from memory, the query result set is still cached somewhere else in memory. Then, when that function is called again there is something connecting the cached result set with the query called in the function (again, I'm assuming this). Does this make sense to anyone else?

Maybe part of my problem is that that the term "memory" seems to be used so ambiguously. This is stored in "memory", that is stored in "memory", this was cleared from "memory", etc..., but you never get a clear definition of where or how things are actually stored "in memory".

So, if my logic above is correct and query result sets are stored separately in "memory" from the innards of a component, what happens if you store the query results in the CFC's variables scope AND you cache the query?

<cffunction name="getEmp">
<cfquery name="variables.Emp" datasource="datasource" cachedwithin="#CreateTimeSpan(0, 6, 0, 0)#">
SELECT Firstname, Lastname
FROM Employee
WHERE ID = 123
</cfquery>
<cfreturn variables.Emp>
</cffunction>

Now the CFC has stored our result set in memory to be used throughout the CFC, and we've told the server to cache our result set. Is the result set stored twice in "memory"? If I call this exact same query in another CFC, or on another CFM page, will I get the cached results or will it actually run the query? What happens if I then put my CFC in the application scope, will it store my same result set in "memory" again?

...I think I've given myself a migraine...

Comments
Jose Galdamez's Gravatar I ran a couple of tests based on your questions, and here's what I can tell you.

Cached queries are global for the server and have nothing to do with the variable they were originally stored in. When the original query variable is garbage collected, the query recordset is still cached in the server's RAM. You can test this with cffunction or within a CFM page itself. If you run cfquery within a CFM page and you cache the query, that query variable (in the variables scope by default) is garbage collected when the page is done running. Yet, the query is still cached in RAM. Same goes for var scopes and this scopes.

As for the 101 aspect of this (which you probably already know), the query stored in RAM is accessed only when the exact same query (including the dynamic parts) is run again. For example, if the query is run several times with different primary keys then that's a separate cache for each query. So:

SELECT Firstname, Lastname FROM Employee WHERE ID = 1

Would be cached first.

SELECT Firstname, Lastname FROM Employee WHERE ID = 2

Would be cached separately.

The way I figured this out was by running a cfquery within a cffunction that stored the query results in a var scoped variable. The first time I ran it the query took 35ms to run. The second, third, and fourth times it took 0ms, showing that the query did not hit the database server but got its results directly from RAM. Long story short, variable scoping has no effect on query caching.

Now this whole thing about storing a query variable within the variables scope is a matter of preference and how you would prefer to get the best performance out of your app. If you cache a query to RAM and store it within the variables scope of a CFC then yes, it is now taking two separate spots in memory: one for the server cache and another for the CFC instance. I'm assuming you're instantiating the CFC to a variable using something like:

cfset myCFCvar = createObject('component','myCFC').init()

If that's the case, then this is what happens. The query is first cached to server RAM. Then it is stored to the variables scope of the CFC. When the instance of the CFC is garbage collected, the latter is now gone, but the server cache is still around. If your CFC is instantiated to a the variables scope of a page, then garbage collection happens when the page is done running. If you store the CFC instance to the Application scope, then the CFC instance is garbage collected when the application hits the end of its cycle.

Is this last approach necessarily bad? I can't say. It's redundant, yes, but you sure can bump up performance by not hitting the database so much.
# Posted By Jose Galdamez | 9/25/09 12:47 PM
Jose Galdamez's Gravatar I ran a couple of tests based on your questions, and here's what I can tell you.

Cached queries are global for the server and have nothing to do with the variable they were originally stored in. When the original query variable is garbage collected, the query recordset is still cached in the server's RAM. You can test this with cffunction or within a CFM page itself. If you run cfquery within a CFM page and you cache the query, that query variable (in the variables scope by default) is garbage collected when the page is done running. Yet, the query is still cached in RAM. Same goes for var scopes and this scopes.

As for the 101 aspect of this (which you probably already know), the query stored in RAM is accessed only when the exact same query (including the dynamic parts) is run again. For example, if the query is run several times with different primary keys then that's a separate cache for each query. So:

SELECT Firstname, Lastname FROM Employee WHERE ID = 1

Would be cached first.

SELECT Firstname, Lastname FROM Employee WHERE ID = 2

Would be cached separately.

The way I figured this out was by running a cfquery within a cffunction that stored the query results in a var scoped variable. The first time I ran it the query took 35ms to run. The second, third, and fourth times it took 0ms, showing that the query did not hit the database server but got its results directly from RAM. Long story short, variable scoping has no effect on query caching.
# Posted By Jose Galdamez | 9/25/09 12:47 PM
Jose Galdamez's Gravatar I ran a couple of tests based on your questions, and here's what I can tell you.

Cached queries are global for the server and have nothing to do with the variable they were originally stored in. When the original query variable is garbage collected, the query recordset is still cached in the server's RAM. You can test this with cffunction or within a CFM page itself. If you run cfquery within a CFM page and you cache the query, that query variable (in the variables scope by default) is garbage collected when the page is done running. Yet, the query is still cached in RAM. Same goes for var scopes and this scopes.
# Posted By Jose Galdamez | 9/25/09 12:48 PM
Jose Galdamez's Gravatar Ack... WTH

Each time I submitted my comment I got a Railo server error. Yet the comments still posted. Yuck! Sorry for trashing your blog post.
# Posted By Jose Galdamez | 9/25/09 12:50 PM
Eric Cobb's Gravatar Wow! Thanks for all of your work, Jose! You've really helped to clarify some of this for me!

I'll look into that Railo error and see if I can figure out what's going on.
# Posted By Eric Cobb | 9/25/09 1:40 PM
Pete Freitag's Gravatar Jose is correct, but I just wanted to point out one simple way you can test this to see for yourself...

Use a SQL Statement "SELECT getdate() AS d" as your cached query. This query should return an updated date each time you run it, if it is cached, it should return the same result until the cache expires.
# Posted By Pete Freitag | 9/25/09 1:48 PM
Eric Cobb's Gravatar Thanks Pete, that's a good little trick. I'll have to remember that one.
# Posted By Eric Cobb | 9/25/09 2:25 PM
Jose Galdamez's Gravatar That's a pretty slick test, Pete. I like it.

One thing I just learned is that if you have debugging output turned on, it will tell you if any queries executed were cached or not. You just look under the "SQL Queries" section. That way you don't have to guess. Checking for a 0ms execution time is how I checked the first time around. Don't know if that's a surefire filter for cached queries. My guess is no, but it worked for my experiment.

Lastly, you can set the max number of cached queries in your server admin by going to "Server Settings -> Caching". This one works sort of like FIFO (first in, first out). If you hit your max, the youngest cache in RAM gets flushed.

As with all things, there's a balance. The more you decide to cache, the more RAM you will ultimately need. So if you're going to use this heavily, just make sure the server is up to the task. 64-bit server installations are much better with RAM limitations than 32-bit ones. I forget what the figures are, but 32-bit CF servers only support something over 1 GB of RAM. Someone else will have to fill me in on what the upper bound for 64-bit is.

OK, I'll get off my soap box for now :)
# Posted By Jose Galdamez | 9/25/09 2:33 PM
Brad Wood's Gravatar As far as telling whether or not a query was pulled from cache-- you can also tell that by simply cfdumping the result set. There is a cached flag in there that tells you.

Regarding what "memory" the actual result set is stored in-- it is always only stored once. Every variable you "set" it into is simply a pointer to a single place in memory. When the varred variable is destroyed at the end of the method call, only the pointer is destroyed. The query remains along with any other pointers which reference it. This is true no matter how many "pointers" you make to a variable. This is also why we say complex objects like structs and queries are passed by reference since you are really just passing the pointer around and any changes you make are reflected in all the variables referencing it.

When there are no more variables referencing the query in memory it is orphaned and the next time your JVM runs garbage collection, it will be cleared out and the space reclaimed. Simple as that.

Also note, placing the query in the variables scope of a CFC only helps if the CFC itself is placed in a persistent scope and reused. If you are recreating the CFC every time you use it, the variables scope is being re-created every time. (Of course, using cachedwithin would still keep an internal reference to the cached result set in the innards of CF)

It's difficult to give hard advice without really knowing what you are doing Is the "CFC" a DAO? Often those follow the singleton pattern. (Create only once at application startup and stored in the application scope).
Watch out for concurrency problems if using shared scopes (like variables) in persisted CFCs. Your example above is fine, but the following would NOT be fine with out a cflock around it since it creates a race condition between the cfset and the cfquery.

<cffunction name="getEmp">
<cfset variables.Emp = "">
<cfquery name="variables.Emp" datasource="datasource" cachedwithin="#CreateTimeSpan(0, 6, 0, 0)#">
SELECT Firstname, Lastname
FROM Employee
WHERE ID = 123
</cfquery>
<cfreturn variables.Emp>
</cffunction>
# Posted By Brad Wood | 9/25/09 11:04 PM
Tony Nelson's Gravatar As for your question about calling the exact same query in another CFC or CFM page, according to Livedocs: "To use cached data, the current query must use the same SQL statement, data source, query name, user name, and password."

http://livedocs.adobe.com/coldfusion/8/htmldocs/he...
# Posted By Tony Nelson | 9/27/09 7:59 PM