Monday, December 17, 2012

Excellent List of Machine Learning Resources

A List of Data Science and Machine Learning Resources

Monday, September 3, 2012

PCA, SVD, LSA

Great links to learn and understand these concepts:
PCA Tutorial
SVD and PCA discussion
LSA Tutorial

Saturday, February 25, 2012

Mining Twitter Article (Not technical)

This article, http://isc.sans.edu/diary.html?storyid=5728, is interesting because it collects links related to mining Twitter. It is a little outdated and a few of the links no longer work, but it is still somewhat useful. Keep in mind that data sets that were previously available, for example http://snap.stanford.edu/data/twitter7.html, are no longer available due to a Twitter request (read about it here). However, you can still use the Twitter API to get data; you are just limited in the number of tweets you can get per day or per session (I can't remember which). Also useful is this article.

Paper Summary - Short text classification in twitter to improve information filtering

Short text classification in twitter to improve information filtering, B. Sriram and D. Fuhry and E. Demir and H. Ferhatosmanoglu, 2010
This paper describes research that classifies tweets using a reduced set of features. In this approach they try to classify text into the following set of classes: "News, Events, Opinions, Deals, and Private Messages". The problem they address is the curse of dimensionality that results from trying to overcome the sparseness issue in classifying Twitter messages. Other research typically uses external knowledge bases to support tweet classification. They argue this can be slow because the external knowledge base has to be queried so often. Important points about this paper:
1. They provide a very useful discussion of Twitter and tweets.
2. How they classify tweets is interesting and worth another review.

LaTeX Tip - Controlling placement of figures and tables

This was a useful article for controlling placement of figures and tables. http://robjhyndman.com/researchtips/latex-floats/

Paper Summary - Linking Social Networks on the Web with FOAF: A Semantic Web Case Study

Linking Social Networks on the Web with FOAF: A Semantic Web Case Study, J. Golbeck and M. Rothstein, 2008
Another paper related to FOAF and the Semantic Web. This paper examines linking FOAF identities across social networks. They propose that it would be better to merge these accounts across sites so that a user has one social network. This may have been an interesting idea in 2008, but I'm not sure it is relevant now, since some social networks let you bring in contacts from other sites. In general, this paper can be excluded from further analysis.

Resource - Writing and Presenting Your Thesis or Dissertation

Writing and Presenting Your Thesis or Dissertation, S. Joseph Levine, Ph.D. http://www.learnerassociates.net/dissthes/ Another useful resource for writing the dissertation. They have a section on writing the proposal which I found useful.

Paper Summary - Twitter Sentiment Classification using Distant Supervision

Twitter Sentiment Classification using Distant Supervision, A. Go and R. Bhayani and L. Huang, 2009, Technical report, Stanford Digital Library Technologies Project
This paper is about classifying sentiment found on Twitter. They use machine learning and construct training data by using the Twitter API and the emoticons present in tweets. The standard :) and :( are used to determine whether a tweet contains positive or negative sentiment. The key points are: 1) they picked an efficient way to construct their training sets, 2) tweets are harder to classify because their length cannot exceed 140 characters, and 3) their results were promising for classifying the sentiment of the tweets.
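The emoticon-labeling idea can be sketched in a few lines of Python. This is only a rough illustration based on my summary above, not the authors' code: the function names, the label strings, the choice to skip tweets with both or neither emoticon, and the choice to strip the emoticon from the text are all my own assumptions.

```python
# Hypothetical sketch of emoticon-based distant supervision.
# The emoticon sets and label names are assumptions for illustration.
POSITIVE = (":)",)
NEGATIVE = (":(",)


def distant_label(tweet):
    """Return 'positive', 'negative', or None if the tweet is unusable."""
    has_pos = any(e in tweet for e in POSITIVE)
    has_neg = any(e in tweet for e in NEGATIVE)
    if has_pos == has_neg:
        # Either no emoticon at all or a mixed signal: skip (assumption).
        return None
    return "positive" if has_pos else "negative"


def build_training_set(tweets):
    """Turn raw tweets into (text, label) pairs without manual labeling."""
    examples = []
    for tweet in tweets:
        label = distant_label(tweet)
        if label is None:
            continue
        text = tweet
        # Remove the emoticons so a classifier cannot simply memorize them
        # (an assumption of this sketch).
        for emoticon in POSITIVE + NEGATIVE:
            text = text.replace(emoticon, "")
        examples.append((" ".join(text.split()), label))
    return examples


if __name__ == "__main__":
    sample = ["great day :)", "missed the bus :(", "no emoticon", "mixed :) :("]
    for text, label in build_training_set(sample):
        print(label, "|", text)
```

The resulting pairs could then be fed to any standard text classifier. The point is that the labels come for free from the emoticons, which is why the training set is cheap to build.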

Friday, February 17, 2012

Data Mining Resources

This thread has a lot of useful links to resources for anyone studying data mining. It is mainly aimed at building the mathematical background.

Friday, February 10, 2012

Data Mining and Machine Learning

I wished to understand the distinction between data mining and machine learning. This presentation (Machine Learning and Data Mining: 01 Data Mining) is useful.

Tuesday, February 7, 2012

Khan Academy

If you need to review concepts from calculus, linear algebra, or probability, this is a good resource:
Khan Academy
I used this to review Linear Algebra concepts for my Data Mining course. They have quite a few topics.

Cheers.

Wednesday, January 4, 2012

Advice Collection

I came across a useful collection of links for Ph.D. students. It offers dissertation advice, presentation advice, and more. I had read some of these articles in the past, but it is nice to have a central location for reference.