Saturday, December 24, 2011

Neural Network Framework for Java

I spent the last month learning how to use Self-Organizing Maps (SOMs) for my Neural Network course. I used a SOM to perform instance matching (which is not what it is typically used for), with the idea that it could eventually be used in a two-level fashion or combined with a supervised approach. Think of a SOM as a clustering technique. It maps high-dimensional data into a lower-dimensional (typically 2-D) space, which lets you see the data visually in 2-D. The beauty of a SOM is that it is unsupervised, which means you do not need to specify the desired output for your training set.
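To make the idea concrete, here is a minimal sketch of a SOM in plain Java, independent of Encog. The class name TinySom and its methods are hypothetical, the output nodes form a simple 1-D row, and only the winning node is updated (no neighborhood), so treat it as an illustration of the winner search and weight update rather than a full implementation.

```java
import java.util.Random;

/** Minimal SOM sketch (hypothetical class): a row of output nodes, winner-only update. */
class TinySom {
    final double[][] weights; // weights[node][inputDimension]

    TinySom(int inputCount, int outputCount, long seed) {
        Random random = new Random(seed);
        weights = new double[outputCount][inputCount];
        // Start from random weights in [0, 1)
        for (double[] w : weights) {
            for (int d = 0; d < inputCount; d++) {
                w[d] = random.nextDouble();
            }
        }
    }

    /** Index of the node whose weights are closest (squared Euclidean distance) to the input. */
    int winner(double[] input) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int n = 0; n < weights.length; n++) {
            double dist = 0.0;
            for (int d = 0; d < input.length; d++) {
                double diff = input[d] - weights[n][d];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = n;
            }
        }
        return best;
    }

    /** One pass over the data: pull each input's winning node toward that input. */
    void trainEpoch(double[][] data, double learningRate) {
        for (double[] input : data) {
            double[] w = weights[winner(input)];
            for (int d = 0; d < input.length; d++) {
                w[d] += learningRate * (input[d] - w[d]);
            }
        }
    }

    public static void main(String[] args) {
        // Toy data: two points near (0, 0) and two near (1, 1); no desired output is given
        double[][] data = { {0.0, 0.1}, {0.1, 0.0}, {0.9, 1.0}, {1.0, 0.9} };
        TinySom som = new TinySom(2, 2, 42L);
        for (int epoch = 0; epoch < 50; epoch++) {
            som.trainEpoch(data, 0.5);
        }
        for (double[] row : data) {
            System.out.println(som.winner(row));
        }
    }
}
```

Note that with random starting weights and a winner-only update, one node can end up winning every input while the other never moves. A neighborhood function that also nudges nearby nodes is the usual remedy.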

The SOM outperformed K-Means in comparison tests on the OAEI IIMB benchmark, both in F-Measure scores and in CPU time.
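For reference, F-Measure is usually taken to be the harmonic mean of precision and recall over the matched pairs. The sketch below assumes that standard definition. The class name FMeasure is hypothetical, and the counts in main are made up purely for illustration (they are not the benchmark results).

```java
/** F-Measure sketch (hypothetical helper), assuming the standard F1 definition. */
class FMeasure {
    /** Fraction of reported matches that are correct. */
    static double precision(int truePositives, int falsePositives) {
        int reported = truePositives + falsePositives;
        return reported == 0 ? 0.0 : (double) truePositives / reported;
    }

    /** Fraction of the correct matches that were actually reported. */
    static double recall(int truePositives, int falseNegatives) {
        int expected = truePositives + falseNegatives;
        return expected == 0 ? 0.0 : (double) truePositives / expected;
    }

    /** Harmonic mean of precision and recall. */
    static double f1(int truePositives, int falsePositives, int falseNegatives) {
        double p = precision(truePositives, falsePositives);
        double r = recall(truePositives, falseNegatives);
        return (p + r) == 0.0 ? 0.0 : 2.0 * p * r / (p + r);
    }

    public static void main(String[] args) {
        // Illustrative counts only: 8 correct matches, 2 wrong matches, 2 missed matches
        System.out.println(f1(8, 2, 2));
    }
}
```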


I used Encog as the neural network framework. I ran into a lot of memory issues, but for the most part I was able to get a SOM, K-Means and SVM comparison test running quickly.

For the next test, I think using Matlab, Weka or R might be a better approach. As much as I like to keep things nice and clean in code, I quickly run into memory issues as I increase the number of instances to test.

Code Sample for Encog SOM (based on the Encog example):

// Set up the training data; no desired output in this case
MLDataSet training = new BasicMLDataSet(trainingInput, null);

// Create the SOM neural network with an input count and an output count
// This is basically your input node size and output node size
// One point of improvement
SOM network = new SOM(inputNodeSize, outputNodeSize);

// Reset the network (randomizes the weights)
network.reset();

// Here you specify your parameters, i.e. learning rate and neighborhood function.
// I used NeighborhoodSingle here, but clearly that is not the best choice.
// The next round of tests will use an RBF neighborhood with a Gaussian function.
// 0.7 for the learning rate is not unreasonable.
BasicTrainSOM train = new BasicTrainSOM(
        network,
        0.7,
        training,
        new NeighborhoodSingle());
// new NeighborhoodRBF(sizes, RBFEnum.Gaussian)

// Train before asking for winners; the iteration count is a placeholder to tune
for (int epoch = 0; epoch < 10; epoch++)
{
    train.iteration();
}

// Store the winner in the slot reserved for it in each row of the 2-D array.
// The calling code will lump together instances that have the same winner
// to determine which instances are 'similar'.
double[][] newItems = new double[input.length][];
int i = 0;
for (double[] item : input)
{
    item[item.length - 2] = network.winner(new BasicMLData(item));
    newItems[i] = item;
    i++;
}
Encog.getInstance().shutdown();
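
The calling code that lumps instances by winner is not shown above. A minimal sketch of that step could look like the following, assuming (as in the sample) that the winner index sits in the second-to-last slot of each row. The class name WinnerGroups and the method groupByWinner are hypothetical, and the rows in main are made-up values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Groups row indices by the SOM winner stored in each row (hypothetical helper). */
class WinnerGroups {
    /**
     * Returns a map from winner index to the row numbers that share that winner.
     * Assumes the winner was stored at position length - 2 of every row.
     */
    static Map<Integer, List<Integer>> groupByWinner(double[][] items) {
        Map<Integer, List<Integer>> groups = new TreeMap<>();
        for (int row = 0; row < items.length; row++) {
            double[] item = items[row];
            int winner = (int) item[item.length - 2];
            groups.computeIfAbsent(winner, key -> new ArrayList<>()).add(row);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Made-up rows: two features, then the winner slot, then one spare slot
        double[][] items = {
            {0.1, 0.2, 3, 0},
            {0.9, 0.8, 1, 0},
            {0.2, 0.1, 3, 0}
        };
        // Rows that land under the same key are treated as 'similar' instances
        System.out.println(groupByWinner(items));
    }
}
```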