Tuesday, July 18, 2006

Creating Test Suites

My current task is creating a plan for the performance testing. One thing I know I'll need is a way to generate test cases. My plan is to make servlets that create certain types of test cases dependent on a certain set of parameters.

Right now, I'm making the test case for creating a set of regions (of a defined number) that intersect a region with a certain hit percentage. This will be used for tests that generate bar graphs comparing the three variations for Response Time vs. Hit Percentage over several volumes and numbers of intersecting regions.

To accomplish this, I'm making a quick class that attempts to evenly space some points around the region and then iterate their growth to make the total hit percentage as close to the request as possible. This should be finished today.

Then, the servlet will take the parameters, run it through this process, and save it to a file on the server, so the JMeter tests can repeatedly request the exact test case for the different types.

A similar approach will be taken for the performance over time and load testing cases. The test case will be randomly generated and saved, and then that exact test case will be repeated over each type of cache.

It will be interesting to see how many results I can gather, and then weed them down for a publication. I will probably use a lot of them in my thesis. The more data, the better, it seems from other theses I've seen.


Friday, July 14, 2006

Singleton

I went ahead and made the EntityManagerFactory for the LazyWriter a singleton.  This caused me to pull the LazyWriter out of the CacheManagerBean and into a top-level class.  That is probably a good move, anyway.  Now, the hassle of loading the properties only needs to happen once per run of the server.

Sometimes the Cutting Edge Hurts

I got lazy writing to work today.  The reason I didn't get it to work before was that I got frustrated with the meaningless error messages that I got while working with Hibernate.  It turns out that I was getting an error because an exception that was trying to be thrown couldn't be found.  Hibernate didn't include a necessary library with their deployment.  So, I got the latest stable version of ehcache.jar from the EHCache project and put it in my classpath.  However, when it tried to instantiate the exception, it found that it did not have access, so it threw another error.  It turns out that I needed version 1.2.1.RC1 instead of 1.2.0 (which was the more stable one).  After this I discovered that I was trying to instantiate an EntityManagerFactory that was reliant on JTA, which requires JEE, while in JSE mode.  I had to create a Map to change the properties so I could create the proper entity management objects.

I'm not sure if this lazy writing is any good, since it create a new factory every time.  Perhaps I'll create it as a singleton, but I'll profile it first to see how long it takes.


Some First Results

Here is a screenshot of some performance testing on my cache. This is using active database writes, over 50 requests randomly selected from 5 different, intersecting regions. I am using a high multiple and high request volume, but not even close to the timing standards of a NADSS map.

Wednesday, July 12, 2006

So I Celebrated Early...

I guess I forgot to test lazy writes.  It seems that my EntityManager loses it's persistence context when my Statefule Session Bean is removed partially through the process.  Perhaps calling it the unmanaged way could do the trick.

It works!

After many sad hours of debugging my persistence model, I now have a working n-dimensional spatial cache! Finally, results can now be found. I need to put together a performance testing suite, and quickly. Here are a few of the variables I can define:

No Cache vs. Active Write vs. Lazy Write
For any test, it should be run three ways: without a cache at all, with active writing (wait for database to store information before returning) and lazy writing (create a new thread to save data to database).
Hit Percentage
Ranging from 0 to 100%, this determines how much of the request was already cached. Another sub-variable would be how many intersecting regions there are, effecting the number of resulting calls.
Block Size
One optimization is to expand requests to a certain unit size, definable through the cacheable annotation "scaleFactors" option. This increases initial time, but also increases the amount of intersection and gives the partitioning algorithm a better sample.
Dimension
Simply, how many spatial relationships are there in the data? With my model, I can go up to 10, however the volume grows exponentially with the number of dimensions, so going higher may not be worth it.
Request Size
How do the results change with small versus large requests? What is the optimal request size, possibly dependent on earlier variables.
Calculation Difficulty
Since I'm finding primes using the Chinese Remainder Theorem, I can simply add a multiple to my space to work with larger and larger numbers, slowing the calculations. Here, I can test how difficult a calculation must be in order to have the cache be feasible.
Partitioning Algorithm
Right now, I have only the one partitioning algorithm implemented. Depending on the results of these tests, more may be implemented and used to see how the results are changed. The different implementations may need to wait until later, however.

Along with the different variables, I need to see different results for each test. These may contradict each other, or have different optimal points.

Response Time
This is the most important right now. Faster responses are the reason caches exist, otherwise we wouldn't store calculated information. It may be important to track average response time over multiple requests to track how quickly this becomes useful.
Memory Usage
A lot of data structures are used to manage this information. An effort has been made to limit the amount of data needed in memory at a time, but it must be checked to stay low. JProfiler will be used to track Heap Telemetry.
Database Usage
How big does the database get, and how fast does it get there? Space limitations are important.
Fragmentation
How many times is the method being called every time? What are the low/average/high and deviations of these numbers? This should be plotted against the hit percentage and number of intersecting regions.

Putting this together will take the rest of the week just with planning. Next week will be implementation and reporting. Hopefully I can do small optimizations quickly as tests are first introduced.

Wednesday, July 05, 2006

A Different Direction

I spent some time today thinking of a different way to approach the partitioning problem.  Instead of finding a minimal chord sequence (which gets really complicated and involves non-convex pivots and chords) I should think of maximal convex regions.  Expand regions in every direction from every unit block of the null space.  This is similar to the approach I had last summer, but instead of greedily picking them as I make them, create all of them and create a graph that weights their intersections by the number of regions created when one is chosen before the other.  This is a horrible explaination, but it's just for my benefit so I don't forget it.  What I need in the end is:

A topologically sorted, minimal-weight induced subgraph from a set of maximal regions that cover the unit blocks of the null space.

I think.  That should do it.  Of course, minimal-weight is a combination of the number of regions and the sum of the weights of the directed edges between them (that don't create a cycle!). The problem with this so far is the way regions split each other.  Two peices may split another into two based on each individually, but after both have split it, it could be either two or three peices based on the arrangement of the other two.

This may run me into a very complex wall like the chord method, but at least this will be easier to implement various heuristic algorithms because I only deal with convex regions instead of complex polytopes.