I spent the last month learning how to use Self-Organizing Maps (SOM) for my Neural Network course. I used a SOM to perform instance matching (which is not typically what it is used for) with the intention that possibly it could be used in a 2-level fashion or with some other supervised approach. Think of SOM as a clustering technique. It maps high dimensional data into a lower dimension (typically 2-D) space and it enables one to visually see the data in 2-D. The beauty of a SOM is that it is unsupervised which means you do not need to specify the desired output for your training set.
It outperformed K-Means when running comparison tests using the OAEI IIMB benchmark both in F-Measure scores and CPU time.
I used Encog for the Neural Net framework. I experienced a lot of memory issues but for the most part I was quickly running a SOM, K-Means and SVM comparison test.
For the next test, I think using Matlab, Weka or R might be a better approach. As much as I like to keep things nice and clean in the code. I quickly have memory issues as I increase the number of instances to test.
Code Sample for Encog SOM (based on the Encog example):
//Set of the training data, no desired output in this case
MLDataSet training = new BasicMLDataSet(trainingInput,null);
// Create the SOM neural network with an input count and an output count
// This is basically your input node size and output node size
// One point of improvement
SOM network = new SOM(inputNodeSIze,ouputNodeSize);
//reset the nework
network.reset();
//Here you specify your parameters i.e. learning rate and neighborhood function
//I used the NeighborhoodSingle here but clearly that is not the best choice
//Next round of tests will use a RBF and a GaussianFunction
//.7 for learning rate is not unreasonable
BasicTrainSOM train = new BasicTrainSOM(
network,
0.7,
training,
new NeighborhoodSingle());
//new NeighborhoodRBF(sizes, RBFEnum.Gaussian)
//store the winner in a space in the 2-d array reserved
//calling code will lump instances that have the same winner
//to determine which instances are 'similar'
double[][] newItems = new double[input.length][];
int i=0;
for (double[] item: input)
{
item[item.length-2]=network.winner(new BasicMLData(item));
newItems[i] =item;
i++;
}
Encog.getInstance().shutdown();
Saturday, December 24, 2011
Saturday, August 20, 2011
Paper Summary - Toward Conditional Models of Identity Uncertainty with Application to Proper Noun Coreference - Part 1
Toward Conditional Models of Identity Uncertainty
with Application to Proper Noun Coreference
A. McCallum and B. Wellner
This paper is interesting. They make the point that pairwise decisions may not always be independent of others. One may be able to resolve inconsistencies by using a dependence model. They mention work, Relational Probabilistic Model, which captures this dependence. However since it is a generative model, they state this could lead to complexities due to many features with varying degrees of granularity. They discuss Hidden Markov models and conditional random fields briefly and Relational Markov networks as a similar model but improved classification.
They then discuss their work specifically which is "three conditional undirected graphical
models for identity uncertainty" which make the coreference decisions. Their first model connects mentions, entity-assignments, and each attribute of the mention. Edges indicate dependence. There is the concept of a clique, parameters may be part of different cliques which results in patterns of parameters called clique templates. Parts of the graph that depend on a number of entities are removed and replaced with random variables indicating coreference (Read this paper again to make sure we are clear on this). Per-entity attribute nodes are removed and replaced with attributes of mention. They then use graph partitioning. There is a lot in this paper and really requires another read to understand their methods better.
with Application to Proper Noun Coreference
A. McCallum and B. Wellner
This paper is interesting. They make the point that pairwise decisions may not always be independent of others. One may be able to resolve inconsistencies by using a dependence model. They mention work, Relational Probabilistic Model, which captures this dependence. However since it is a generative model, they state this could lead to complexities due to many features with varying degrees of granularity. They discuss Hidden Markov models and conditional random fields briefly and Relational Markov networks as a similar model but improved classification.
They then discuss their work specifically which is "three conditional undirected graphical
models for identity uncertainty" which make the coreference decisions. Their first model connects mentions, entity-assignments, and each attribute of the mention. Edges indicate dependence. There is the concept of a clique, parameters may be part of different cliques which results in patterns of parameters called clique templates. Parts of the graph that depend on a number of entities are removed and replaced with random variables indicating coreference (Read this paper again to make sure we are clear on this). Per-entity attribute nodes are removed and replaced with attributes of mention. They then use graph partitioning. There is a lot in this paper and really requires another read to understand their methods better.
Paper Summary - Disambiguation and Filter Methods in Using Web Knowledge for Coreference Resolution
Disambiguation and Filter Methods in Using Web Knowledge for Coreference Resolution
O. Uryupina and M. Poesio
They describe how they use Wikipedia and Yago to increase their Coreference Resolution performance. They use BART to support their efforts. Their classification consists of a anaphor and a potential antecedent, as they describe. Using their associated feature vectors, they use a 'maximum entropy classifier' to determine coreference. They used Wikipedia to improve their aliasing algorithm, which would perform string matching functions. They use Wikipedia based information as a feature, and to disambiguate mentions. They use Yago to supplement their efforts when there are too few features to make any reasonable decisions related to coreference. Yago information is also incorporated as a feature. They tested using ACE with reasonable scores.
Using these publicly available knowledge bases appears to improve performance (2-3% in this case). Something to think about....
O. Uryupina and M. Poesio
They describe how they use Wikipedia and Yago to increase their Coreference Resolution performance. They use BART to support their efforts. Their classification consists of a anaphor and a potential antecedent, as they describe. Using their associated feature vectors, they use a 'maximum entropy classifier' to determine coreference. They used Wikipedia to improve their aliasing algorithm, which would perform string matching functions. They use Wikipedia based information as a feature, and to disambiguate mentions. They use Yago to supplement their efforts when there are too few features to make any reasonable decisions related to coreference. Yago information is also incorporated as a feature. They tested using ACE with reasonable scores.
Using these publicly available knowledge bases appears to improve performance (2-3% in this case). Something to think about....
Monday, August 8, 2011
Stanford Online AI Course
This course is offered for Fall 2011 semester and the instructors are Sebastian Thrun and Peter Norvig. It should be a good class. The formal title is "Introduction to Artificial Intelligence".
Join
Join
Wednesday, July 20, 2011
Boom Time In Silicon Valley?
In case you missed this over the weekend, apparently in Silicon Valley the next big boom is occurring, at least according to the LA Times.
For those keeping their eye on the market for that special time to start-up, this may be good news. Still too early to know for sure.
For those keeping their eye on the market for that special time to start-up, this may be good news. Still too early to know for sure.
Friday, July 15, 2011
Intuition
There was an interesting question posed on the Linked In AGI group discussion board:
Has anyone put out a suggestion on how to impliment an "Intuition" engine?
Responses were also quite interesting...
Has anyone successfully built an intuition engine? The responses ranged from a weak description of how one might do this to long detailed descriptions of current research that has been ongoing for many years.
If this sounds interesting you may want to join the Linked In group AGI — Artificial General Intelligence.
Or go to the AGI Conference in August .
Has anyone put out a suggestion on how to impliment an "Intuition" engine?
Responses were also quite interesting...
Has anyone successfully built an intuition engine? The responses ranged from a weak description of how one might do this to long detailed descriptions of current research that has been ongoing for many years.
If this sounds interesting you may want to join the Linked In group AGI — Artificial General Intelligence.
Or go to the AGI Conference in August .
Linked Data Paper Summary
"Linked Data - The Story So Far", C. Bizer, T. Heath, T. Berners-Lee, 2009, http://eprints.ecs.soton.ac.uk/21285/1/bizer-heath-berners-lee-ijswis-linked-data.pdf
My work will support working with linked data so I am attempting to build a better understanding of this topic.
This paper is good for providing a very basic understanding of linked data. If you are new to the concept of linked data, this is a good starting paper to read. It provides basic principles, examples, and isn't too technical.
If you already have a good understanding of the basics, skip this paper. I read it in about 10 minutes and it was pretty much just a review of what I already knew.
My work will support working with linked data so I am attempting to build a better understanding of this topic.
This paper is good for providing a very basic understanding of linked data. If you are new to the concept of linked data, this is a good starting paper to read. It provides basic principles, examples, and isn't too technical.
If you already have a good understanding of the basics, skip this paper. I read it in about 10 minutes and it was pretty much just a review of what I already knew.
Labels:
Linked Data,
Research Paper Summaries,
Semantic Web
Hadoop Meet-up - Large Scale Graph Processing On HBase and Map/Reduce on Greenplum
I attended the Hadoop Meet-up on Tuesday titled "Large Scale Graph Processing On HBase and Map/Reduce on Greenplum". You can view the event here.
I attended this event for two reasons: It has been a couple of years since I worked intimately with Hadoop and I wanted to see how others are using it. I was also hoping the discussion on Large Scale Graph Processing would be useful for my research.
Though the presentations were somewhat stimulating, I didn't find much I could use for my work.
I attended this event for two reasons: It has been a couple of years since I worked intimately with Hadoop and I wanted to see how others are using it. I was also hoping the discussion on Large Scale Graph Processing would be useful for my research.
Though the presentations were somewhat stimulating, I didn't find much I could use for my work.
Tuesday, June 21, 2011
Canopy Clustering
"Efficient Clustering of High Dimensional Data Sets with Application to Reference Matching", McCallum,Nigam,Ungar,http://www.kamalnigam.com/papers/canopy-kdd00.pdf
This paper discusses a different type of clustering, titled canopy clustering. It is an interesting idea. There are basically two thresholds, using a 'cheap distance metric', we evaluate a list of points. Threshold 1 is > than Threshold 2. Pick one point to compare with all the other points in the list. When the distance between the two points falls within T1 put the points into a canopy. If the distance falls within T2 then remove point from list. We generate the canopies this way and work through the list until empty.
We can then apply our second level of clustering to each canopy and are pretty much guaranteed that if two points do not fall into the same canopy then they are likely not to be co-referent and therefore do not need to be evaluated.
This is efficient and elegant. Currently the only implementation that I found of canopy clustering is in Mahout. I am building my own implementation though to get a feel for how well it works.
This paper discusses a different type of clustering, titled canopy clustering. It is an interesting idea. There are basically two thresholds, using a 'cheap distance metric', we evaluate a list of points. Threshold 1 is > than Threshold 2. Pick one point to compare with all the other points in the list. When the distance between the two points falls within T1 put the points into a canopy. If the distance falls within T2 then remove point from list. We generate the canopies this way and work through the list until empty.
We can then apply our second level of clustering to each canopy and are pretty much guaranteed that if two points do not fall into the same canopy then they are likely not to be co-referent and therefore do not need to be evaluated.
This is efficient and elegant. Currently the only implementation that I found of canopy clustering is in Mahout. I am building my own implementation though to get a feel for how well it works.
SimHash: Hash-based Similarity Detection
"SimHash: Hash-based Similarity Detection", Sadowski, Levin, 2007.
This paper outlines a hash algorithm that can be used for similarity detection. Most hash algorithm are designed to offer low collision and hash values for similar strings can vary quite a bit. This hash basically sets out to achieve the opposite, higher collision and hash keys for similar strings are similar if not the same.
If you cannot use term frequency and need a numeric representation of a string for statistical processing, what process can be used? Using an integer-based hash is one way to achieve this, though in my opinion it is not the most sophisticated of approaches.
An implementation of this algorithm showed that it is a reasonable approach for hashing strings with the intent to determine similarity. I found minor issues which I will address by altering the algorithm.
Results showed that this is a reasonable approach....
This paper outlines a hash algorithm that can be used for similarity detection. Most hash algorithm are designed to offer low collision and hash values for similar strings can vary quite a bit. This hash basically sets out to achieve the opposite, higher collision and hash keys for similar strings are similar if not the same.
If you cannot use term frequency and need a numeric representation of a string for statistical processing, what process can be used? Using an integer-based hash is one way to achieve this, though in my opinion it is not the most sophisticated of approaches.
An implementation of this algorithm showed that it is a reasonable approach for hashing strings with the intent to determine similarity. I found minor issues which I will address by altering the algorithm.
Results showed that this is a reasonable approach....
Labels:
Classification,
Hashing,
Similarity Detection
Monday, May 16, 2011
Narrowing the Modeling Gap: A Cluster-Ranking Approach to Coreference Resolution
"Narrowing the Modeling Gap: A Cluster-Ranking Approach to Coreference Resolution"
Journal of Artificial Intelligence Research 40 (2011) 469–521 Submitted 06/10; published 02/11
Altaf Rahman altaf@hlt.utdallas.edu
Vincent Ng vince@hlt.utdallas.edu
Human Language Technology Research Institute
University of Texas at Dallas
800 West Campbell Road; Mail Station EC31
Richardson, TX 75080-3021 U.S.A.
http://www.jair.org/media/3120/live-3120-5478-jair.pdf
-Long paper, very thorough
-A lot of history, use the citations in this paper
-Learn more about "centering algorithms"
-Describes three different models, mention-pair,entity-mention and mention-ranking
-Outlines key features and deficiencies of each
-In particular the transitivity property is not addressed in the mention-pair model so clustering is used
-Mention-ranking outperforms mention-pair
-Describe a cluster-ranking approach combines both models
-Use lexicalization and knowledge of anaphoricity
-Used ACE for experiments
Interesting:
Journal of Artificial Intelligence Research 40 (2011) 469–521 Submitted 06/10; published 02/11
Altaf Rahman altaf@hlt.utdallas.edu
Vincent Ng vince@hlt.utdallas.edu
Human Language Technology Research Institute
University of Texas at Dallas
800 West Campbell Road; Mail Station EC31
Richardson, TX 75080-3021 U.S.A.
http://www.jair.org/media/3120/live-3120-5478-jair.pdf
-Long paper, very thorough
-A lot of history, use the citations in this paper
-Learn more about "centering algorithms"
-Describes three different models, mention-pair,entity-mention and mention-ranking
-Outlines key features and deficiencies of each
-In particular the transitivity property is not addressed in the mention-pair model so clustering is used
-Mention-ranking outperforms mention-pair
-Describe a cluster-ranking approach combines both models
-Use lexicalization and knowledge of anaphoricity
-Used ACE for experiments
Interesting:
"Specifically, a classifier that is trained on
coreference-annotated data is used to determine whether a pair of mentions is co-referring
or not. However, the pairwise classifications produced by this classifier (which is now commonly
known as the mention-pair model) may not satisfy the transitivity property inherent
in the coreference relation, since it is possible for the model to classify (A,B) as coreferent,
(B,C) as coreferent, and (A,C) as not coreferent. As a result, a separate clustering mechanism
is needed to coordinate the possibly contradictory pairwise classification decisions and
construct a partition of the given mentions."
Read about Lappin and Leass’s algorithm
Read about centering algorithms
"the distinction between
classification and ranking applies to discriminative models but not generative models.
Generative models try to capture the true conditional probability of some event. In the context
of coreference resolution, this will be the probability of a mention having a particular
antecedent or of it referring to a particular entity (i.e., preceding cluster). Since these probabilities
have to normalize, this is similar to a ranking objective: the system is trying to raise
the probability that a mention refers to the correct antecedent or entity at the expense of
the probabilities that it refers to any other. Thus, the antecedent version of the generative
coreference model as proposed by Ge et al. (1998) resembles the mention-ranking model,
while the entity version as proposed by Haghighi and Klein (2010) is similar in spirit to the
cluster-ranking model."
Labels:
Clustering,
Coreference Resolution,
Machine Learning
Subscribe to:
Posts (Atom)