Wednesday, July 12, 2006

It works!

After many sad hours of debugging my persistence model, I now have a working n-dimensional spatial cache! Results can finally be found. I need to put together a performance testing suite, and quickly. Here are a few of the variables I can define:

No Cache vs. Active Write vs. Lazy Write
Every test should be run three ways: without a cache at all, with active writing (wait for the database to store the information before returning), and with lazy writing (create a new thread to save the data to the database).
Hit Percentage
Ranging from 0 to 100%, this determines how much of the request was already cached. A sub-variable is the number of intersecting regions, which affects the number of resulting calls.
Block Size
One optimization is to expand requests to a certain unit size, configurable through the "scaleFactors" option of the cacheable annotation. This increases the initial calculation time, but it also increases the amount of intersection and gives the partitioning algorithm a better sample.
Dimension
Simply, how many spatial relationships are there in the data? With my model, I can go up to 10. However, the volume grows exponentially with the number of dimensions, so going higher may not be worth it.
Request Size
How do the results change with small versus large requests? What is the optimal request size, and does it depend on the earlier variables?
Calculation Difficulty
Since I'm finding primes using the Chinese Remainder Theorem, I can simply add a multiple to my space to work with larger and larger numbers, slowing the calculations. Here, I can test how difficult a calculation must be for the cache to be worthwhile.
Partitioning Algorithm
Right now, I have only one partitioning algorithm implemented. Depending on the results of these tests, I may implement more and compare how the results change. The different implementations may need to wait until later, however.
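
As a rough illustration of two of the variables above, here is a minimal sketch of the three write modes and of block-size expansion. It is not the actual cache code: the class names, the in-memory map standing in for the database, the placeholder calculation, and the assumption that a request is a half-open range [lo, hi) per dimension are all hypothetical.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; none of these names come from the real cache.
final class WriteModeSketch {
    enum WriteMode { NO_CACHE, ACTIVE_WRITE, LAZY_WRITE }

    // In-memory stand-in for the database.
    final Map<Long, Long> store = new ConcurrentHashMap<>();

    // Daemon thread, so a forgotten close() cannot keep the JVM alive.
    private final ExecutorService writer = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "lazy-writer");
        t.setDaemon(true);
        return t;
    });

    long request(long key, WriteMode mode) {
        long value = key * key; // placeholder for the expensive calculation
        switch (mode) {
            case ACTIVE_WRITE:
                store.put(key, value);                       // caller waits for the write
                break;
            case LAZY_WRITE:
                writer.execute(() -> store.put(key, value)); // write happens in the background
                break;
            default:
                break;                                       // NO_CACHE: nothing is stored
        }
        return value;
    }

    /** Waits up to five seconds for pending lazy writes; the sketch cannot be reused afterwards. */
    void close() {
        writer.shutdown();
        try {
            writer.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

// Hypothetical helper: snaps a request outward to block boundaries.
final class BlockExpander {
    private BlockExpander() {}

    /** Expands [lo, hi) in each dimension so both ends are multiples of that dimension's scale factor. */
    static long[][] expand(long[] lo, long[] hi, long[] scaleFactors) {
        int n = lo.length;
        if (hi.length != n || scaleFactors.length != n) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        long[] newLo = new long[n];
        long[] newHi = new long[n];
        for (int d = 0; d < n; d++) {
            long s = scaleFactors[d];
            if (s <= 0) {
                throw new IllegalArgumentException("scale factor must be positive");
            }
            newLo[d] = Math.floorDiv(lo[d], s) * s;   // round down to a block boundary
            newHi[d] = -Math.floorDiv(-hi[d], s) * s; // round up to a block boundary
        }
        return new long[][] { newLo, newHi };
    }

    public static void main(String[] args) {
        long[][] r = expand(new long[] {3, 17}, new long[] {12, 21}, new long[] {10, 8});
        System.out.println(Arrays.toString(r[0]) + " .. " + Arrays.toString(r[1]));
    }
}
```

Running the main method prints [0, 16] .. [20, 24]: a request covering 9 x 4 = 36 cells grows to 20 x 8 = 160 cells. That is the extra initial cost paid in exchange for more overlap with later requests.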

Along with the different variables, I need to measure several results for each test. These may conflict with each other, or have different optimal points.

Response Time
This is the most important right now. Faster responses are the reason caches exist; otherwise we wouldn't store calculated information. It may be important to track average response time over multiple requests to see how quickly the cache becomes useful.
Memory Usage
A lot of data structures are used to manage this information. An effort has been made to limit the amount of data needed in memory at a time, but I need to verify that it stays low. JProfiler will be used to track heap telemetry.
Database Usage
How big does the database get, and how fast does it get there? Space limitations are important.
Fragmentation
How many times is the underlying method called for each request? What are the low, average, and high values, and the deviation? This should be plotted against the hit percentage and the number of intersecting regions.
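
The low/average/high/deviation summary can be sketched as follows. The same summary would work for response times. The class name and the sample values are made up for illustration, and the deviation here is the population standard deviation, which is an assumption about which deviation is wanted.

```java
// Hypothetical helper for summarizing per-request measurements,
// such as the number of method calls each request caused.
final class SampleStats {
    final long min;
    final long max;
    final double mean;
    final double stdDev;

    private SampleStats(long min, long max, double mean, double stdDev) {
        this.min = min;
        this.max = max;
        this.mean = mean;
        this.stdDev = stdDev;
    }

    static SampleStats of(long[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        double sum = 0;
        for (long s : samples) {
            min = Math.min(min, s);
            max = Math.max(max, s);
            sum += s;
        }
        double mean = sum / samples.length;

        // Population standard deviation: divide by n, not n - 1.
        double squares = 0;
        for (long s : samples) {
            double diff = s - mean;
            squares += diff * diff;
        }
        double stdDev = Math.sqrt(squares / samples.length);
        return new SampleStats(min, max, mean, stdDev);
    }

    public static void main(String[] args) {
        // Made-up call counts for four requests.
        SampleStats stats = of(new long[] {1, 3, 3, 5});
        System.out.println("low=" + stats.min + " avg=" + stats.mean
                + " high=" + stats.max + " dev=" + stats.stdDev);
    }
}
```

For the made-up samples 1, 3, 3, 5 this gives a low of 1, an average of 3.0, a high of 5, and a deviation of the square root of 2 (about 1.41).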

Just planning all of this will take the rest of the week. Next week will be implementation and reporting. Hopefully I can make small optimizations quickly as the first tests are introduced.
