Thursday, June 29, 2006

Past the Biggest Bugs

Now that I have fixed all current crashes, it is time to nit pick and find things wrong with my implementation.  I need to set up a programmatic test suite, runnable by servlet, that tests the following things: 

Caching completed data. 

Loading cached information. 

Combining data items after computing and caching.

There will be more on the way, but I need to get this suite set up and moving.  It will be key to have this working when I start performance testing.  Performance testing is scheduled for the week of July 10-14.  Hopefully, most analysis can be done during that week and I can crunch down on writing the paper component, including a definition of the mathematical model which will help for secondary performance testing.

Past the Biggest Bugs

Now that I have fixed all current crashes, it is time to nit pick and find things wrong with my implementation.  I need to set up a programmatic test suite, runnable by servlet, that tests the following things: 

Caching completed data. 

Loading cached information. 

Combining data items after computing and caching.

There will be more on the way, but I need to get this suite set up and moving.  It will be key to have this working when I start performance testing.  Performance testing is scheduled for the week of July 10-14.  Hopefully, most analysis can be done during that week and I can crunch down on writing the paper component, including a definition of the mathematical model which will help for secondary performance testing.

And Now to More Important Issues

This morning, I fixed all of the immediate errors with persistence.  Apparently, using Object as a BLOB column generates a ClassCastException if you don't have exactly and Object there, but if you use Serializable as your member type then any serializable object can be put in there.

Now my biggest issue is figuring out how to call the cacheable method multiple times.  I was thinking that I could do it with the Method.invoke(Object target, Object[] params) method, but I get an IllegalArgumentException when I use either context.getTarget(), context.getTarget().getClass().newInstance(), or method.getDeclaringClass().newInstance().  I will investigate this more this afternoon.


Wednesday, June 28, 2006

It's in the Details

Sometimes I can really ignore the simplest of details when in a hurry.  I wasn't paying attention to which properties I added to my persistence.xml file and was using the wrong dialect for Hibernate and I needed just "update" as the auto property.  That's why my tables were not being created.  After swapping the values with the proper ones <em>easily seen in another file</em> it worked properly.  Well, Hibernate could do what it needed to do, but I had claimed that my RTreeEntity's id was auto-generated, even though it was actually a String.

Some Notes on Debugging

I have the web project up and linked with the EAR and I'm tracking problems with my implementation. Here are a few things that have gone wrong so far. Some are fixed.
Fixed: Cached annotation did not exist at runtime.
Apparently, annotations only exist at compile time by default. An extra annotation must be added to the @interface definition: @Retention(RetentionPolicy.RUNTIME).
Fixed: Connecting to EJB must be defined explicitly in unmanaged objects
Since I didn't make my CacheInterceptor an EJB, using @EJB did not instantiate my CacheManager EJB, so I use an InitialContext to load it. It may be possible to use an @Inject tag and not require the InitialContext.
Problem: Invoking a method more than once
In order to compute the uncached information, it must be split into a set of valid requests, which then must be called on the target object. So far using the Method.invoke(Object target, Object[] params) doesn't like using the target of the InvocationContext and instantiating a new EJB is does not work, either. This is placed on the back burner while I try to work with the persistence problems.
Problem: Persistence is not creating tables
I rewrote my RTree to be managed by Hibernate, but so far it is not creating the tables. I get it to properly recognize that the entities exists on deployment, but the SQL queries fail when trying to use tables that haven't yet been created.
I hope to fix most of these problems today or tomorrow morning so I can focus on the algorithm implementation next week.

Tuesday, June 27, 2006

Web Project Built

I got a web project built today.  I used my laptop with the Eclipse Web Tools plugin (doesn't work on intel macs until Friday) to create the project directories and an initial web.xml.  From that, I copied it into my Eclipse 3.2 framework and customized the build.xml plugin from Orangepeel.  I have a loose grasp of the directory structure now, and what all the xml configuration files do.  Now, I just need to attempt loading my EJBs and see if the interceptors and persistence things work.

Monday, June 26, 2006

Entity Tests Pass

The RTreeEntity structure now passes all POJO test cases.  Now, I need to set up the persisting and proplerly create, persist, and remove the data.  This will be a challenge as well.  I may need to recursively persist the nodes.  Also, I'll need to synchronize the tree between deletes and inserts, because that could cause all sorts of problems with the tree.  Cached data may not always be properly cached, and instead lost.

Persisting Trees

I'm not sure if it is a good idea to jump into spatial query trees in my first attempt at using persistence, but that's what I'm working on.  I need to finish rewriting my R-Tree, which wouldn't be that hard if I didn't need to use Collections instead of arrays.  Now, the conversion requires more thought.  After I test it in POJO form, I'll need to find a way to test it as it persists.  The biggest issues with my tree structure involve the relationships between nodes and the tree itself.  The nodes have children and parents, so are those relationships paired by the persistence or monitored seperately?  I'm going to continue with the belief that they are separate unless I specify it, and I won't have to change my code significantly.

I still have to figure out how to test the EJB calls.  Possibly through a web system that programmatically calls a test suite and then reports the results.  That would quickly allow the tests to be run within the container.  Otherwise, I need to figure out how to connect to the container from a normal Java program and run my tests that way.


Wednesday, June 21, 2006

The Solution?

I've developed an interface to help me with this dilemma I have. The immediate benefit is that I never have to rewrite a method header, and I never have to rewrite the cache. However, there will need to be a new object written that implements the CacheHelper interface for almost every new method (unless they become very similar). Here are the three methods that are required:

public IRegion getRegionFromParameters(Object[] params)
Take the given parameters and convert them into a request that represents the spatial values of the cache. Any parameters that are not represented are assumed to not change the resulting data at all. Now I can get a valid region without worrying about specialized objects and arrays.
public Object[] getParametersForRegion(IRegion region, Object[] params)
Generate a new list of parameters from a new region and the old parameter set. This allows the new regions to be translated into valid parameters no matter what data types they are.
public Object combine(Object[] params, Object[] results)
From a list of results that were computed and loaded from the cache, create a single result object that includes only values that would be expected from the given parameters.

A Question of Parameters

I've thought about how to handle the dimension parameters in detail, but haven't quite decided.  There are several concerns that I have for this crucial section of my research. 

First, I need to be able to get numerical limits on the spatial query.  I need a low and a high point for each defined dimension.  If the method signature contains a list of integers or floats listed "lowX, highX, lowY, highY,..." then I have no problems with this.  However, if the values are hidden inside of a query object of some sort (like a calendar or date), I need to access the methods within those types.  Requiring an annotation with a low and high parameter list that are arrays of Strings can easily work for the primitive types.  I could even implement a way of chaining through method calls, so if I had a parameter called "range" of type DateRange, I could list my parameters as "range.getStartDate().getTimeMillis()" and "range.getEndDate().getTimeMillis()" resulting in primitives at the end of the calculation.  After browsing through a bit of the reflection API, I could even allow for array indices if necessary.

A problem with this approach becomes visible with my second need: I need to rebuild these parameters from limits I define after a partial cache hit.  The new parameters will be passed to new instances of the method to fill the uncached space.  If I pull the milliseconds from the start and end dates of a DateRange object, how am I supposed to rebuild a new DateRange?

There may need to be a distinction between my proof-of-concept cache, where I know absolutely nothing about what I'm caching, and the implemented version that must work with an existing API.  I didn't want to have a difference between them, but I may just need to bite the bullet with this.


Tuesday, June 20, 2006

Bug Fixed with the Power of Testing

I seem to have fixed the deletion bug.  It passes my random test for dimensions 1,2,3,4,5, and 10 after several executions.  It used to fail every run, so hopefully it isn't just hiding.  I made my tests using TestNG, a relatively new framework for testing in Java.

RTree Delays

My RTree implementation has a bug during deletion.  When I delete a node, it sometimes doesn't adjust the above nodes properly, so searching for nodes later causes a problem because the parent nodes no longer contain their children.

Thursday, June 15, 2006

Finish the Polytope

Now that I've created the interfaces for the polytopes, refactored the RTree to work with them, and built a few objects (Point, Region, Polytope, ComplexPolytope) to implement them, I need to finish those implementations. Here is a possible order for the complex things that need to happen:
Intersection
Generate a list of connected polytopes where each is a subset of the originals.
Union
Generate a list of connected polytopes that compose the entire space of the originals. This will be either 1 or 2 polytopes, depending on if they intersect or not.
Subtraction
Generate a list of connected regions that compose the space of this polytope outside of the other. In the case of the complex polytope, this also includes the intersection of the holes.
List the concave pivots
Concave pivots are points where decisions are made for partitioning. Find a list of them.
List the chords
Chords pass between pivots to actually partition the space. Each concave pivot is adjacent to two of them, but each chord can have 1 to many pivots.
Create the Pivot-Chord graph
The Pivot-Chord graph is the bipartite graph between pivots and chords, listing adjacencies. Each pivot has degree 2.
Create the Strong Intersection Graph
The vertices are chords. Edges are directional so chord x strongly intersects y iff (x,y) is in E.
Create the Weak Intersection Graph
Same as the strong intersection graph, except not directional, and edges only exist if both directions existed.
I think that's enough planning for now. That should keep me busy for today and tomorrow.

Some Figures

Here are some figures I've made so far for my thesis.  I'm not expecting them to be very good or in final form, but they are some important examples that need to be shown somehow.

Wednesday, June 14, 2006

Graphs and Polytopes

I've begun Polytope development. The general idea is that the k-dimensional polytope is bound by a set of polytopes in (k-1) dimensions, each with 1 fixed dimension. This recursive definition has a base case: lines and points. Line segments will always be valid regions, so they can work AND they are connected by points. It is very easy to build a Polytope isomorphic to a region, because the boundaries can be counted by 2ikCi (k choose i) where i is the number of fixed dimensions. The challenge lies in splitting that region into smaller components as regions are subtracted from it. Also, the intersection of a coordinate hyperplane with the boundary must be found in order to facilitate chord partitioning.

The solution is to have a graph with the (k-1)-dimensional polytopes as vertices and the edges existing between adjacent boundaries. This way, when an intersection is found, the intersecting polytopes can become the boundaries, and the graph can be used to check connectedness. To help with the intersection, the polytopes will be stored in an R-tree.

Tuesday, June 13, 2006

Since I'm Thinking About It...

While I'm on the subject, I might as well add methods to build the pivot and chord lists, the pivot-chord graph, and the weak intersection graph. These are necessary for any algorithms I have, so they will have to exist in some form.
public IPolytope[] getPivots()
Pivots are connected subsets of the border with 2 fixed dimensions, a locally isomorphic 2D slice on those dimensions of which there is a 270 degree interior angle. If these exist, we don't have a convex region.
public IPolytope[] getChords()
Chords are connected subsets of the interior with 1 fixed dimension. The only interesting ones are those that are adjacent to at least one concave pivot. These form borders of interior regions and will split the polytope during chord partitioning.
public BipartiteGraph<IPolytope,IPolytope> getPivotChordGraph()
The pivot-chord graph is a bipartite graph G = (X ∪ Y, E) where X is the set of pivots, Y is the set of chords, and E is the set of relations xy where pivot x is adjacent to chord y.
public Graph<IPolytope> getWeakIntersectionGraph()
The weak intersection graph is the graph G = (V,E) where the vertex set V is the set of chords and the edges pair two chords that strongly intersect each other. A chord x strongly intersects a chord y if x divides any of the pivots adjacent to y or disconnects the y such that two pivots are no longer in the same connected part of y. If the strong intersection isn't bidirectional, the edge is not in the weak intersection graph.

Polytope Operations

There are a few useful operations when dealing with Polytopes. Most of these are similar with normal polygons, just much harder. The idea is to have an interface IPolytope and IRegion. The IRegion will substitute what I am currently using for Region-specific methods, like getting the lower and upper bounds. In fact, the Region object will implement both. Here is a list of possible IPolytope methods:
public int getDimensions()
Gets the number of dimensions. All polytopes must exist in the same number of dimensions in order to work at all
public IRegion getBounds()
The bounds are necessary for integration with RTrees. the RTree will have to be refactored for this, too.
public boolean intersects(IPolytope p)
Is there any volume within both regions? If the resulting intersection has any fixed dimensions, intersection is still true.
public boolean contains(IPolytope p)
Is p completely within this polytope? As long as no volume within p is outside of this, contains is true.
public boolean windows(IPolytope p)
Does p intersect the border of this polygon? Does it intersect but not strictly contain?
public IPolytope[] intersection(IPolytope p)
Generates an array of connected polytopes that result from intersecting this and p. If no intersection exists, returns empty array.
public IPolytope[] subtraction(IPolytope p)
Generates an array of connected polytopes that result from subtracting p from this. If p contains this, then the subtraction is null and returns an empty array.
public IPolytope[] union(IPolytope p)
Generates a list of connected polytopes resulting from the union of this and p.

More to come later. This is just a first list.

FIRM is built

I have successfully built and deployed FIRM on my work machine. This afternoon I will get it running at home so I can work there, as well. Perhaps I will take a break from theory and set up a simple interceptor and client to test it. I need to find the way interceptors are done now, seeing how all the online examples use descriptors still, but there is a way to do it with annotations. Also, I need to check if the current data merging capabilities are enough for me to build my annotation schema.

The Good, the Bad, and the Ugly

Good News! I found that iterating over cached regions and selecting their pivots is much much much faster than the alternative of iterating over all grid points.

Bad News: With the introduction of non-convex pivots, I now need to find how to define orthogonal regions in a much better way.

Ugly News: Polygons are mildly difficult. Polytopes are much much harder. Not only do I need the vertices, but which vertices are connected by which lines, and which lines bind which planes, and which planes bind which spaces, and which spaces bind... etc. I was trying to avoid this, but I don't think I can anymore.

Pivotal Progress

Yesterday, I finally made progress on the pivot detection algorithm. I did the brute-force, memory-intensive, and SLOW way of doing it, so it will need to be replaced, but at least I can start building pivot-chord graphs and trying my approximation algorithms. I'll have to time what activities are taking so long. It might also be that I am simply having issues with such increased volume that the algorithms can't keep up with the input size, let alone the horrible complexity. I've also rebuilt my 2D-slice viewer so I can see what these pivots look like with the random boxes I use.

Theoretically, I have two major problems. The first big one is that my chords are not always convex themselves for dimensions higher than 3. I don't know if the chords will automatically fix this or not. We'll have to see when I start to implement a few algorithms. If only 4 dimensions were easier to visualize. The worst-case scenario will be if I have to perform partitioning on all pivots of lower dimension recursively and find all possible partitions (possibly not minimal) of the pivots and the combinations thereof.

The second problem is my definition of bands. Bands are paths within the interior that are wrapped around some of the exterior and can't be pulled away without cutting the path. These normally do exactly what I expect them to do. However, I've discovered deceptive bands (bands that appear during partitioning) and pseudo-bands (bands that can be cut twice by the same chord). These both cause my formulas to change. Deceptive bands decrease the region count, but simultaneously increases the chord count (i think, has not yet been proven). Pseudo-bands are hard to detect and increase the number of regions by lowering the number of bands that actually cause problems. Hopefully I can find some redefinition of these things in order to solidify a counting mechanism. Of course, these issues have only been visualized in three dimensions, there may be even worse examples in more dimensions.

Thursday, June 08, 2006

Devel Hell Begins

This blog is simply something for me to use as a productivity tracker during my summer research. If you found it, great. I wasn't really going to make it public, but who cares?