Saturday, February 25, 2012
Mining Twitter Article (Not technical)
This article (http://isc.sans.edu/diary.html?storyid=5728) is interesting because it links to several resources on mining Twitter. It is a little outdated and a few of the links no longer work, but it is still somewhat useful. Keep in mind that some data sets that were previously available, for example http://snap.stanford.edu/data/twitter7.html, have been taken down at Twitter's request (read about it here). You can still use the Twitter API to get data, but you are limited in the number of tweets you can retrieve per day or per session (I can't remember which).
Also useful is this article.
Paper Summary - Short text classification in twitter to improve information filtering
Short text classification in twitter to improve information filtering, B. Sriram and D. Fuhry and E. Demir and H. Ferhatosmanoglu,2010
This paper describes research that classifies tweets using a reduced set of features. The authors classify text into the classes "News, Events, Opinions, Deals, and Private Messages". The problem they address is the curse of dimensionality that arises when trying to overcome the sparseness of Twitter messages. Other research typically uses external knowledge bases to support tweet classification; the authors argue this can be slow because the external knowledge base has to be queried so often. Important points about this paper:
1. They provide a very useful discussion of Twitter and tweets.
2. How they classify tweets is interesting and worth another review.
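To make the idea of a reduced feature set concrete, here is a minimal Python sketch. It maps a tweet to a handful of binary features instead of a large bag-of-words vector. The feature names and rules below are my own illustrative assumptions, not the paper's actual feature list, and `extract_features` is a hypothetical helper.

```python
import re
from typing import Dict


def extract_features(tweet: str) -> Dict[str, bool]:
    """Map a tweet to a few binary features (illustrative assumptions only)."""
    words = tweet.split()
    return {
        # Tweet is addressed to someone, e.g. "@bob thanks!"
        "starts_with_mention": tweet.startswith("@"),
        # Another user is mentioned somewhere after the first word
        "mention_inside": any(w.startswith("@") for w in words[1:]),
        # Currency or percentage signs, which may hint at a deal
        "has_currency_or_percent": bool(re.search(r"[$%]", tweet)),
        # An all-caps word of two or more letters, treated as emphasis
        "has_emphasis": any(
            len(w) > 1 and w.isalpha() and w.isupper() for w in words
        ),
    }
```

A feature dictionary like this could then be fed to any standard classifier. The point is only that the vector has a few dimensions rather than one per vocabulary word.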
Latex Tip - Controlling placement of figures and tables
This was a useful article for controlling placement of figures and tables.
http://robjhyndman.com/researchtips/latex-floats/
Paper Summary - Linking Social Networks on the Web with FOAF: A Semantic Web Case Study
Linking Social Networks on the Web with FOAF: A Semantic Web Case Study,J. Golbeck and M. Rothstein, 2008
Another paper related to FOAF and the Semantic Web. This paper examines linking FOAF identities across social networks. The authors propose merging a user's accounts across sites so that the user has a single social network. This may have been an interesting idea in 2008, but I'm not sure it is still relevant now that some social networks let you bring in contacts from other sites. In general, this paper can be excluded from further analysis.
Labels:
FOAF,
Research Paper Summaries,
Semantic Web,
Social Networks
Resource - Writing and Presenting Your Thesis or Dissertation
Writing and Presenting Your Thesis or Dissertation, S. Joseph Levine, Ph.D.
http://www.learnerassociates.net/dissthes/
Another useful resource for writing the dissertation. They have a section on writing the proposal which I found useful.
Paper Summary - Twitter Sentiment Classification using Distant Supervision
Twitter Sentiment Classification using Distant Supervision, A. Go and R. Bhayani and L. Huang, 2009, Technical report, Stanford Digital Library Technologies Project
This paper is about classifying sentiment found on Twitter. The authors use machine learning and construct their training data using the Twitter API and the emoticons present in tweets. The standard :) and :( are used to determine whether a tweet contains positive or negative sentiment. The key points are 1) they picked an efficient way to construct their training sets, 2) tweets are harder to classify because their length cannot exceed 140 characters, and 3) their results were promising for classifying the sentiment of the tweets.
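The emoticon-labeling idea can be sketched in a few lines of Python. This is a minimal sketch under my own assumptions, not the authors' code: `label_by_emoticon` is a hypothetical helper, and both skipping tweets that contain both emoticons and stripping the emoticon from the text are choices I made for illustration.

```python
from typing import Optional, Tuple

POSITIVE = ":)"
NEGATIVE = ":("


def label_by_emoticon(tweet: str) -> Optional[Tuple[str, str]]:
    """Return (text, label) using the emoticon as a noisy label, or None."""
    has_pos = POSITIVE in tweet
    has_neg = NEGATIVE in tweet
    # Skip tweets with no emoticon or with conflicting emoticons
    if has_pos == has_neg:
        return None
    label = "positive" if has_pos else "negative"
    # Remove the emoticon so a classifier cannot simply memorize it
    text = tweet.replace(POSITIVE, "").replace(NEGATIVE, "")
    # Collapse the whitespace left behind by the removal
    return " ".join(text.split()), label
```

Running this over tweets collected from the API would give labeled training pairs without any manual annotation.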
Friday, February 17, 2012
Data Mining Resources
This thread has a lot of useful links to resources for anyone studying data mining. It is mainly helpful for building the mathematical background.
Friday, February 10, 2012
Data Mining and Machine Learning
I wished to understand the distinction between data mining and machine learning. This presentation (Machine Learning and Data Mining: 01 Data Mining) is useful.
Tuesday, February 7, 2012
Khan Academy
If you need to review concepts from Calculus, Linear Algebra, or Probability, this is a good resource:
Khan Academy
I used this to review Linear Algebra concepts for my Data Mining course. They have quite a few topics.
Cheers.