Sunday, April 25, 2010

Paper Summary - A Framework for Combining Ontology and Schema Matchers with Dempster-Shafer

"Paper Summary - A Framework for Combining Ontology and Schema Matchers with Dempster-Shafer", P. Besana


This is a short paper about using Dempster-Shafer for ontology mapping. They have a tool based on their work: PyontoMap. This paper is highly relevant to my work, so I will defer the full summary.

Paper Summary - BeliefOWL: An Evidential Representation in OWL Ontology

Amira Essaid and Boutheina Ben Yaghlane, BeliefOWL: An Evidential Representation in OWL Ontology, pages 77-80, International Semantic Web Conference, International Workshop on Uncertainty Reasoning for the Semantic Web, Washington DC, USA, 2009.


This paper is very short, only 4 pages. It starts with a discussion of how uncertainty is currently represented in ontologies using either probabilistic or fuzzy approaches. It then proposes the Dempster-Shafer approach as another option. They discuss the work by:
Ben Yaghlane, B.: Uncertainty Representation and Reasoning in Directed Evidential Networks, PhD thesis, Institut Supérieur de Gestion de Tunis, Tunisia, 2002.

which is work related to representing uncertainty using a DAG. I haven't read this thesis yet, but it certainly seems like a good read.

The paper then goes on with a presentation of BeliefOWL, their uncertainty extension to OWL.


They define two classes to represent prior evidence:


- enumerates different masses and has object property which specifies the relation between itself and .


- expresses prior evidence and has property



They define two classes which represent conditional evidence:


- has an object property of


- conditional evidence with property



They construct an evidential network by translating the OWL ontology into a DAG. They then assign masses to nodes in the DAG. Details of this work are mentioned, but only briefly.

Overall, I don't know if this helps my work in any way except to see some uses of DS with ontologies.

Paper Summary - Uncertainty in Ontologies: Dempster-Shafer Theory for Data Fusion Applications

"Uncertainty in Ontologies: Dempster-Shafer Theory for Data Fusion Applications", A.
Bellenger1 and S. Gatepaille, Defence and Security Information Processing, Control and Cognition department, France

This paper is relevant to my work because they use DS to represent uncertainty in ontologies. They do this by creating an upper ontology that contains the calculated DS measures, i.e., mass, belief, plausibility, etc.


The paper starts with a background in data fusion, gives some examples of how uncertainty is captured in ontologies, and explains why it is important to represent uncertainty. They define uncertainty as "incomplete knowledge, including incompleteness, vagueness, ambiguity, and others". In addition to the natural occurrence of uncertainty in data, it is also a product of fusing data that may be acquired from different sources.

This is an interesting statement: "If the user/application is not able to decide in favor of a single alternative (due to insufficient trust in the respective information sources), the aggregated statement resulting from the fusion of multiple statements is typically uncertain. The result needs to reflect and weight the different information inputs appropriately, which typically leads to uncertainty."

This is common in military applications and in general knowledge bases, because one attempts to acquire supporting data for entities in the knowledge base from various sources that can be unreliable.

They briefly discuss the shortcomings of current traditional methods for handling uncertainty in ontologies. They state that since ontologies are designed to contain only concepts and relations that describe asserted facts about the world, they are not designed to handle uncertainty. The facts asserted are assumed to be 'true'. Therefore, even information that is not certain to be 'true' is stored as fact, which leads to errors or inaccurate information. There is currently no standard way to handle uncertainty (I can read more on this).

They discuss how probability is used as a way to represent uncertainty in ontologies, including some existing work in this area such as BayesOWL. The problem with that approach in particular is that it does not account for OWL properties, instances of the ontologies, or the data types. There are also extensions to DL (Pronto is one of them); however, performance is a problem. Fuzzy approaches also exist.

They then discuss using DS. DS is presented as a generalized probability theory, although books on this topic are not exactly in agreement with that characterization. Masses are calculated, and the sum of these masses makes up the beliefs. It is also noted that DS supports combining evidence from different sources, which makes it especially useful for fusing data. They note work that actually uses DS to handle the inconsistencies produced by mapping ontologies. The paper also highlights a relevant paper that translates an OWL taxonomy into a directed evidential network.

The rest of the paper discusses their approach and how they use DS for modeling and reasoning. The point they make about uncertainty, and why probabilistic methods can't represent it accurately, is that probabilistic methods do not represent the absence of information very well. One needs to specify prior and conditional probabilities, and they argue this introduces error because of the symmetric prior probability assignment (0.5) that must be made when information is not available. With DS, missing information is not assigned a value unless it is obtained indirectly. DS allows one to specify a degree of ignorance (some define this as an upper and lower bound). They find this property appealing.
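To make the ignorance point concrete, here is a small worked example of my own (not from the paper). Suppose the frame of discernment is $\Theta = \{a, b\}$ and a source supports $a$ with strength 0.6 but says nothing else. DS lets us assign

$m(\{a\}) = 0.6, \qquad m(\Theta) = 0.4$

so that $\mathrm{Bel}(\{a\}) = 0.6$ and $\mathrm{Pl}(\{a\}) = 0.6 + 0.4 = 1.0$. The interval $[0.6, 1.0]$ explicitly captures the ignorance, whereas a Bayesian model would have to split the remaining 0.4 between $a$ and $b$ (e.g., symmetrically at 0.2 each), committing to information it does not have.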


Probabilistic approaches use singletons only, whereas DS allows one to use composites in addition to singletons. This is powerful. With probability theory there is a fixed relationship between an event and its negation ($P(\neg A) = 1 - P(A)$); DS does not imply such a relationship, it only models the beliefs associated with a class.

They mention an additional point that I find makes this approach appealing. DS provides a way to combine evidence from different sources. This makes it especially useful for fusion.

They state, "the evidence theory is much more flexible than the probability theory". This is a strong statement and I'm not sure if it is completely true based on other papers that show how both Bayesian and DS can produce similar results.

The paper ends with their approach. They propose an upper ontology representing the uncertainty. A DS_Concept, which is a subclass of owl:Thing, has a DS_Mass, DS_Belief, DS_Plausibility, and a DS_Source. The Uncertain_Concept represents a concept that is part of the uncertain set. There is an object property is_either whose range is owl:Thing, so that all instances can be used.
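A minimal sketch of how this upper ontology might be declared with the Jena ontology API (my reconstruction for illustration, not the authors' code; the names DS_Concept, DS_Mass, and is_either come from the paper, while the namespace and everything else are assumptions):

import com.hp.hpl.jena.ontology.DatatypeProperty;
import com.hp.hpl.jena.ontology.ObjectProperty;
import com.hp.hpl.jena.ontology.OntClass;
import com.hp.hpl.jena.ontology.OntModel;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import com.hp.hpl.jena.vocabulary.OWL;
import com.hp.hpl.jena.vocabulary.XSD;

public class DSUpperOntology {
    public static void main(String[] args) {
        String ns = "http://example.org/ds#"; // hypothetical namespace
        OntModel model = ModelFactory.createOntologyModel();

        // DS_Concept is a subclass of owl:Thing carrying the DS measures.
        OntClass dsConcept = model.createClass(ns + "DS_Concept");
        dsConcept.addSuperClass(OWL.Thing);

        DatatypeProperty mass = model.createDatatypeProperty(ns + "DS_Mass");
        mass.addDomain(dsConcept);
        mass.addRange(XSD.xdouble);
        // DS_Belief and DS_Plausibility would be declared the same way.

        // is_either ranges over owl:Thing so any instance can be a member
        // of the uncertain set.
        ObjectProperty isEither = model.createObjectProperty(ns + "is_either");
        isEither.addRange(OWL.Thing);

        model.write(System.out, "RDF/XML-ABBREV");
    }
}

Declaring DS_Concept explicitly as a subclass of owl:Thing is redundant in OWL, but it mirrors how the paper presents the model.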

This paper isn't cited by anyone else yet, but I think there are good ideas proposed here, and I am using it in my 601 work.

Saturday, April 24, 2010

Papers To Read

http://www.dsto.defence.gov.au/publications/2563/DSTO-TR-1436.pdf
http://www.isif.org/fusion/proceedings/fusion03CD/special/s31.pdf
http://uima.apache.org/downloads/releaseDocs/2.3.0-incubating/docs/pdf/tutorials_and_users_guides.pdf
http://www.autonlab.org/tutorials/bayesnet09.pdf
http://spiedl.aip.org/getabs/servlet/GetabsServlet?prog=normal&id=PSISDG004051000001000255000001&idtype=cvips&gifs=yes&ref=no
http://www2.research.att.com/~lunadong/publication/fusion_vldbTutorial.pdf
http://www.aaai.org/aitopics/pmwiki/pmwiki.php/AITopics/Uncertainty
http://www.cs.cmu.edu/afs/cs/academic/class/15381-s07/www/slides/032207probAndUncertainty.pdf
Cox's theorem
http://www.britannica.com/bps/additionalcontent/18/35136768/Data-Fusion-for-Traffic-Incident-Detection-Using-DS-Evidence-Theory-with-Probabilistic-SVMs
http://data.semanticweb.org/workshop/ursw/2009/paper/main/5/html
http://portal.acm.org/citation.cfm?id=1698790.1698821
http://volgenau.gmu.edu/~klaskey/papers/LaskeyCostaJanssen_POFusion.pdf
http://www.slideshare.net/rommelnc/ursw-2009-probabilistic-ontology-and-knowledge-fusion-for-procurement-fraud-detection-in-brazil
http://www.eurecom.fr/~troncy/Publications/Troncy_Straccia-eswc06.pdf
http://www.glennshafer.com/assets/downloads/articles/article48.pdf
http://www.fusion2008.org/tutorials/tutorial05.pdf
http://www.google.com/url?sa=t&source=web&ct=res&cd=5&ved=0CCcQFjAE&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.62.9835%26rep%3Drep1%26type%3Dpdf&rct=j&q=dempster+shafer+tutorial&ei=UprNS4m9K4P88Ab5v-GVAQ&usg=AFQjCNFUUt_xSeT2QOkrqYvsLySWOllqCw&sig2=RLqpJoODSs1kgLFK869Ikw
http://www.cs.cf.ac.uk/Dave/AI2/node87.html
http://sinbad2.ujaen.es/sinbad2/files/publicaciones/186.pdf
http://www.ensieta.fr/belief2010/papers/p133.pdf
http://www.gimac.uma.es/ipmu08/proceedings/papers/057-MerigoCasanovas.pdf
http://www.sas.upenn.edu/~baron/journal/jdm7803.pdf
http://classifier4j.sourceforge.net/usage.html
http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-527/paper1.pdf

Research Paper Summary - A General Data Fusion Architecture

H. Carvalho, W. Heinzelman, A. Murphy, and C. Coelho. A General Data Fusion Architecture. In Int. Conf. on Information Fusion, pages 1465–1472, 2003.


This is a short paper that describes an architecture for data fusion. What they propose is a taxonomy that defines 3 types of fusion: data oriented, variable oriented, and a mixture of the two. They make a clear distinction between data, as measurements of the environment, and variables, as determined by feature extraction.

They describe examples of sensor data and state that the data needs to be pre-processed before being fused. The pre-processing can involve signal conversion, filtering, or handling noise. After pre-processing, the data can be fused, and they propose a 3-level data fusion framework. They begin by classifying the data as defined by the taxonomy. Basically, when the fusion occurs defines what type of fusion we are dealing with (data, variable, or mixture).

They go into a few examples of using this architecture. In general, the paper is not detailed enough to understand if the approach is viable. It is high level and short. It does provide additional information about the formalities of data fusion which is useful.

Paper Summary - A New Technique for Combining Multiple Classifiers using The Dempster-Shafer Theory of Evidence

Al-Ani, A. & Deriche, M. (2002) A new technique for combining multiple classifiers using the dempster shafer theory of evidence. In Journal of Artificial Intelligence Research, 17, (pp. 333—361)

This paper describes a new technique based on Dempster-Shafer for combining classifiers. The basic premise is that different types of features may be used depending on the application, and with different features, the same classifier may not always be best. Based on the features, a different classifier may outperform the others. They propose that combining classifiers is an efficient way to achieve the best classification results.

There are two problems defined by others: how to determine which classifiers to use, and how to combine the classifier results to get the best performance. They address the second question in this paper.


They categorize the output of classification algorithms into 3 levels:

  • the abstract level - outputs a unique label

  • the rank level - ranks all labels, with the label at the top as the first choice

  • the measurement level - attributes to each class a value reflecting the degree of confidence that the input belongs to that class



They state the measurement level contains the 'highest amount of information' and they use this level for their work.

Two combination scenarios mentioned:

  • all classifiers use the same representation of the input pattern

  • each classifier uses its own representation



Relating to the second scenario, they found from another study that a joint probability distribution computed using the sum rule gave the best results. They also cite a study that used weighted sums, and another that used a cost function to minimize the MSE in conjunction with a neural network. In that same study, a number of NNs were used to produce a linear combination, and Dempster-Shafer theory was used to combine the NN results. They give a few other approaches, and then the rest of the paper discusses their own approach.

They combine classifier results using a number of different feature sets. Each feature set is used to train a classifier. For some input x, each classifier will produce a vector that conveys the degree of confidence that the classifier has for each class given the input.

They then discuss DS. DS is said to represent uncertainties better than probabilistic techniques such as Bayesian. For classifier combination, they stress this is important since there usually exists "a certain level of uncertainty associated with the performance of each classifier". They state that other classifier combination methods using DS theory do not accurately estimate the evidence of the classifiers, and they believe that their approach, which uses gradient-descent learning, minimizes the MSE between the combined output and the target output of the training set.

They then go into detail about the math behind DS and about their approach.

Note DS Belief and Plausibility formulas from Wikipedia:

Belief: $\mathrm{Bel}(A) = \sum_{B \subseteq A} m(B)$

Plausibility: $\mathrm{Pl}(A) = \sum_{B \cap A \neq \emptyset} m(B)$

Note DS rule of combination:

$m_{1,2}(A) = \dfrac{1}{1 - K} \sum_{B \cap C = A,\; A \neq \emptyset} m_1(B)\, m_2(C)$

where:

$K = \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C)$ is the measure of conflict between the two mass sets, and $m_{1,2}(\emptyset) = 0$.
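To see the combination rule in action, here is a minimal Java sketch of my own (an illustration, not the authors' method), assuming focal elements are represented as sets of class labels and a mass function as a map from focal element to mass:

import java.util.*;

public class DempsterCombination {

    /** Combine two mass functions (focal set -> mass) with Dempster's rule. */
    public static Map<Set<String>, Double> combine(
            Map<Set<String>, Double> m1, Map<Set<String>, Double> m2) {
        Map<Set<String>, Double> combined = new HashMap<Set<String>, Double>();
        double conflict = 0.0; // K: product mass falling on empty intersections

        for (Map.Entry<Set<String>, Double> e1 : m1.entrySet()) {
            for (Map.Entry<Set<String>, Double> e2 : m2.entrySet()) {
                Set<String> inter = new HashSet<String>(e1.getKey());
                inter.retainAll(e2.getKey());
                double product = e1.getValue() * e2.getValue();
                if (inter.isEmpty()) {
                    conflict += product;
                } else {
                    Double old = combined.get(inter);
                    combined.put(inter, (old == null ? 0.0 : old) + product);
                }
            }
        }
        // Normalize by 1 - K so the combined masses sum to one.
        for (Map.Entry<Set<String>, Double> e : combined.entrySet()) {
            e.setValue(e.getValue() / (1.0 - conflict));
        }
        return combined;
    }

    public static void main(String[] args) {
        // Two hypothetical classifiers assigning mass over the frame {cat, dog};
        // mass on the whole frame represents each classifier's ignorance.
        Map<Set<String>, Double> m1 = new HashMap<Set<String>, Double>();
        m1.put(new HashSet<String>(Arrays.asList("cat")), 0.7);
        m1.put(new HashSet<String>(Arrays.asList("cat", "dog")), 0.3);

        Map<Set<String>, Double> m2 = new HashMap<Set<String>, Double>();
        m2.put(new HashSet<String>(Arrays.asList("dog")), 0.4);
        m2.put(new HashSet<String>(Arrays.asList("cat", "dog")), 0.6);

        System.out.println(combine(m1, m2));
    }
}

Running main, 0.28 of the product mass falls on empty intersections and is removed; after renormalizing, the result is roughly {cat}: 0.58, {dog}: 0.17, {cat, dog}: 0.25.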
I need to return to this to describe their method. It is detailed and involves a lot of math.

Why am I reviewing this document? Well, it is a little off topic, but I thought any exposure to methods that use DS would help me understand it better.

Paper Summary - An Introduction to Multisensor Data Fusion

D. L. Hall and J. Llinas, editors. Handbook of Multisensor
Data Fusion. CRC Press, 2001.

This paper is based on the book and gives a general background in multisensor data fusion. It gives basic definitions, a bit of history, and highlights the types of applications that use multisensor data fusion techniques. It is most prevalent in military applications, but commercial applications are also making use of fusing data from multiple sources. There are advantages to using a 'multi-sensor' approach: accuracy improves, estimates are better, and in general there is a statistical advantage.

It goes on to provide some basic definitions and discusses examples of sensors (mostly from the military domain). What is interesting is the following:

"The most fundamental characterization of data fusion involves a hierarchical transformation between observed energy or parameters (provided by multiple sources as input) and a decision or inference (produced by fusion estimation and/or inference processes) regarding the location, characteristics, and identity of an entity, and an interpretation of the observed entity in the context of a
surrounding environment and relationships to other entities....
The transformation between observed energy or parameters and a decision or
inference proceeds from an observed signal to progressively more abstract concepts."

They go on to discuss methods used to make identity estimates, including Dempster-Shafer and Bayesian.

"Observational data may be combined, or fused, at a variety of levels from the raw data (or observation) level to a state vector level, or at the decision level."

It then talks in detail about examples in military and non-military applications, and then about the JDL (Joint Directors of Laboratories) Data Fusion Process Model, which was established in 1986.

The rest of the paper goes into detail about the architecture.

Why is this important to my work?

There are aspects of true multi-sensor data fusion that can be adapted and used in fusing Semantic Web data; very similar issues are involved. We get data about entities from different sources. This data can be complementary: certain sources can offer facts that other sources are not aware of, and fusing this information together presents a more comprehensive picture of an entity. This is applicable when smushing FOAF instances (part of earlier work), and when simply merging facts retrieved from different sources. One example in particular is news sources. Different facts can be exposed by different news sources, and when you bring these facts together you get a more complete story.

This brings us to another issue with data fusion, and that is conflict resolution. When we are combining sources, sometimes the information can be in conflict. This is an interesting problem.

This paper is a great way to get a good background in multi-sensor data fusion, and one can take its definitions, techniques, and architectures and apply them to fusing Semantic Web data.

Tuesday, April 20, 2010

Peers

"Travel only with thy equals or thy betters; if there are none, travel alone." --The Dhammapada

Sunday, April 18, 2010

Commitment

"Until one is committed, there is hesitancy, the chance to draw back, always ineffectiveness. Concerning all acts of initiative and creation, there is one elementary truth, the ignorance of which kills countless ideas and splendid plans: that the moment one definitely commits oneself, then providence moves too. All sorts of things occur to help one that would never otherwise have occurred. A whole stream of events issues from the decision, raising in one’s favor all manner of unforeseen incidents and meetings and material assistance, which no man could have dreamed would have come his way. Whatever you do, or dream you can, begin it. Boldness has genius, power and magic in it. Begin it now." --Johann Wolfgang von Goethe (1749- 1832)

Saturday, April 17, 2010

Paper Summary - An Introduction to Bayesian and Dempster-Shafer Data Fusion

D. Koks and S. Challa, DSTO Systems Sciences Laboratory, November 2005
http://www.dsto.defence.gov.au/publications/2563/DSTO-TR-1436.pdf

This paper is about data fusion and using techniques such as Bayesian and DS. It starts out with an introduction about data fusion and how it is defined in multiple domains. It then highlights work by others that implemented different methods to perform data fusion. It then gives nice detailed examples of using Bayesian and Dempster-Shafer to perform data fusion. It ends with a comparison summary of these two techniques.

The paper is very good. It shows all of the equations step by step and gives clear examples. It very clearly shows the shortcomings of both methods.

Notes on XMPP Prototype

I've recently been investigating XMPP for Cloud Computing. After reading the specifications and learning basically what it is all about, I've begun working on a prototype that would accomplish tasks similar to what I accomplish with mesh4x. Basically, the goal is offline/online synchronization. So far, I've played around with Jabber, Tigase, and a couple of other servers; I need both server and client support. So far, I like OpenFire and Smack. Right now I have code written in Java that creates a chat between two users. I created two accounts at jabber.iitsp.com and wrote some basic code. Now my two clients are communicating by chatting with one another.

My next steps:
  1. Use OpenFire locally as the server
  2. Write code that will do what I do with mesh4x
  3. Hook it into Google Maps



Basic code:

import org.jivesoftware.smack.Chat;
import org.jivesoftware.smack.MessageListener;
import org.jivesoftware.smack.XMPPConnection;
import org.jivesoftware.smack.packet.Message;

// Connect and log in to the public Jabber server (Smack 3.x API).
XMPPConnection connection = new XMPPConnection("jabber.iitsp.com");
connection.connect();
connection.login(userName, password);

// Open a chat with the target user; the listener prints incoming messages.
Chat chat = connection.getChatManager().createChat(person, new MessageListener() {
    public void processMessage(Chat chat, Message message) {
        System.out.println("Received message: " + message.getBody());
    }
});

chat.sendMessage("Hi there!");

I do a bit more in my code to keep the thread running, to recognize who the user is, and to respond with a customized message, but this is the code that actually sends and receives the message. It is taken from the Smack documentation as a basic test. If you want to just test it one way, create an account in Trillian using XMPP as the protocol with one of your user accounts: user1@jabber.iitsp.com. Then set up your Java code as user2, who wants to send to user1. Your Trillian client will pop up your message.

Monday, April 5, 2010

Paper Summary - Ontology matching: A machine learning approach

A. Doan, J. Madhavan, P. Domingos, and A. Halevy, "Ontology matching: A machine learning approach", Handbook on Ontologies in Information Systems, Springer-Verlag, 2004, pp. 397-416


This paper is about finding mappings between ontologies and discusses the GLUE system. Using learning techniques, it semi-automatically generates mappings. This work attempts to address the issue of matching concept nodes.

It begins by discussing the meaning of similarity, centered on using the joint probability distribution (JPD) of concepts.

It then discusses the complexities of computing the JPD for two different concepts.
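If I recall the GLUE work correctly, one similarity measure they build from the JPD is the Jaccard coefficient:

$$\mathrm{Jaccard\mbox{-}sim}(A, B) = \frac{P(A \cap B)}{P(A \cup B)} = \frac{P(A, B)}{P(A, B) + P(A, \bar{B}) + P(\bar{A}, B)}$$

where the joint probabilities $P(A, B)$, $P(A, \bar{B})$, and $P(\bar{A}, B)$ are exactly the JPD terms that must be estimated from instance data.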

It then discusses using machine learning to learn a classifier for one concept from its instances, and the same for the second concept.

Rather than a single algorithm, it uses multiple algorithms and then combines their predictions.

This paper is long. Their techniques are novel and interesting. This is a good paper to use for machine learning techniques for instance matching.

Saturday, April 3, 2010

Paper Summary - Probabilistic relational models

D. Koller. Probabilistic relational models. In S. Dzeroski and P. Flach, editors, Proceedings of the Ninth International Workshop on Inductive Logic Programming (ILP-1999). Springer, 1999.


This paper highlights deficiencies that exist with Bayesian networks, mainly that BNs cannot represent large, complex domains because the model must be specified in advance over a fixed set of variables. They present probabilistic relational models (PRMs) as a language for describing probabilistic models. Entities, their properties, and relations are represented with the language.

The key points in this paper are:


  • BNs are very useful and have been successful as a way to perform probabilistic reasoning
  • BNs are inadequate for representing large complex domains

  • BNs lack the concept of an object and therefore there is no concept of similarity among objects across contexts

  • Probabilistic relational models extend BNs by adding concepts of individuals, properties and relations




Objects are the 'basic entities' in a PRM and are partitioned into disjoint classes, each with a set of associated attributes. In addition to objects, relations also make up the vocabulary. The paper states that the goal of the PRM is to define a probability distribution over the set of instances of a schema.
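Roughly, in my notation (the paper's is more precise), that distribution factorizes over the attributes of all objects in an instance, in the same way a BN factorizes over its variables:

$$P(\mathcal{I} \mid \sigma) = \prod_{x \in \sigma} \prod_{A \in \mathcal{A}(x)} P\big(x.A \mid \mathrm{Pa}(x.A)\big)$$

where $\sigma$ is the relational skeleton (the objects and their relations), $\mathcal{A}(x)$ is the set of attributes of object $x$, and $\mathrm{Pa}(x.A)$ are the parents assigned by the class-level dependency model.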

The key distinctions between PRMs and Bayesian networks are that PRMs define the dependency model at the class level and that they use the relational structure of the model. They are more expressive than Bayesian networks.

Regarding inference, they show how this expressiveness helps rather than further complicating the processing.

The paper is dense but interesting.