Thursday, December 8, 2016

Our new paper

On Monday I presented the work we have been doing for over a year. We are using dynamic topic modeling and cross-domain correlations to understand how climate change research is influencing the Intergovernmental Panel on Climate Change (IPCC) assessment reports.


Dynamic Topic Modeling to Infer the Influence of Research Citations on IPCC Assessment Reports


This work is just getting started. It took us a year to parse and process the IPCC documents and citations, build the climate change glossaries, tune the preprocessing steps to get the best out of the topic modeling, and tweak the parameters. Now we are going to start large-scale experimentation.


Thanks to the best advisors ever, Dr. Finin and Dr. Halem, and thanks to Dr. Cane, who is a great scientist to collaborate with. It is an honor to work with such great people.

Tuesday, April 12, 2016

Nice GitHub Deep Learning Summary and Notes

I can't say enough about the list of resources here.

Friday, April 1, 2016

Image Thresholding in Python

I found this article to be very useful.

http://opencv-python-tutroals.readthedocs.org/en/latest/py_tutorials/py_imgproc/py_thresholding/py_thresholding.html
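To make the idea concrete, here is a minimal pure-Python sketch of global binary thresholding and Otsu's method on a flat list of 8-bit grayscale values. It is my own illustration, not OpenCV's implementation; the function names `binary_threshold` and `otsu_threshold` are hypothetical, and with OpenCV installed you would normally call `cv2.threshold` instead.

```python
def binary_threshold(pixels, thresh, maxval=255):
    """Map each pixel to maxval if it is above thresh, else 0."""
    return [maxval if p > thresh else 0 for p in pixels]


def otsu_threshold(pixels):
    """Pick the threshold t in 0..255 that maximizes between-class variance.

    Pixels <= t form one class and pixels > t the other. The variance is left
    unnormalized (not divided by N**2), which does not change the argmax.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * count for i, count in enumerate(hist))

    best_t, best_score = 0, -1.0
    weight_low, sum_low = 0, 0
    for t in range(256):
        weight_low += hist[t]          # number of pixels with intensity <= t
        sum_low += t * hist[t]
        weight_high = total - weight_low
        if weight_low == 0 or weight_high == 0:
            continue                   # one class is empty; no valid split here
        mean_low = sum_low / weight_low
        mean_high = (sum_all - sum_low) / weight_high
        score = weight_low * weight_high * (mean_low - mean_high) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


if __name__ == "__main__":
    # Two well-separated intensity clusters: dark background, bright object.
    pixels = [10] * 50 + [200] * 50
    t = otsu_threshold(pixels)
    print(t, binary_threshold(pixels, t)[:3], binary_threshold(pixels, t)[-3:])
```

For a real image you would flatten the 2-D array first. Adaptive thresholding, which the tutorial also covers, computes a separate threshold for each pixel neighbourhood instead of one global value.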

Distilling the Knowledge in a Neural Network

http://arxiv.org/abs/1503.02531 

A very simple way to improve the performance of almost any machine learning algorithm is to train many different models on the same data and then to average their predictions. Unfortunately, making predictions using a whole ensemble of models is cumbersome and may be too computationally expensive to allow deployment to a large number of users, especially if the individual models are large neural nets. Caruana and his collaborators have shown that it is possible to compress the knowledge in an ensemble into a single model which is much easier to deploy and we develop this approach further using a different compression technique. We achieve some surprising results on MNIST and we show that we can significantly improve the acoustic model of a heavily used commercial system by distilling the knowledge in an ensemble of models into a single model. We also introduce a new type of ensemble composed of one or more full models and many specialist models which learn to distinguish fine-grained classes that the full models confuse. Unlike a mixture of experts, these specialist models can be trained rapidly and in parallel.
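The core mechanism in the paper is a softmax with a temperature T: the teacher's logits are divided by T before the softmax to produce "soft targets", and the student is trained to match them at the same temperature. Below is a minimal pure-Python sketch of that loss for a single example. The blending weight `alpha`, the default values, and the function names are my own illustrative choices; the T² scaling of the soft term follows the paper's note that soft-target gradients scale as 1/T².

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax of logits / temperature; a higher temperature gives softer probabilities."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)                      # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(targets, probs):
    """Cross-entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(targets, probs) if t > 0)


def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=4.0, alpha=0.9):
    """Weighted sum of a soft-target loss and an ordinary hard-label loss.

    temperature and alpha are illustrative defaults, not values from the paper.
    """
    soft_targets = softmax(teacher_logits, temperature)
    soft_preds = softmax(student_logits, temperature)
    # Multiply by T**2 so the soft term's gradients keep a comparable magnitude.
    soft_loss = cross_entropy(soft_targets, soft_preds) * temperature ** 2

    hard_preds = softmax(student_logits, 1.0)
    hard_loss = -math.log(hard_preds[true_label])
    return alpha * soft_loss + (1 - alpha) * hard_loss


if __name__ == "__main__":
    teacher = [5.0, 2.0, -1.0]
    print(softmax(teacher, 1.0))   # peaked distribution
    print(softmax(teacher, 4.0))   # softer distribution, reveals class similarities
    print(distillation_loss([4.0, 1.5, -0.5], teacher, true_label=0))
```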

Data Exploration with Kaggle Scripts, Data Science, and Exploratory Data Analysis Courses

This might be interesting at a surface level. I haven't evaluated it yet.

Data Exploration with Kaggle Scripts course.


Again, more surface-level stuff.


Intermediate Python for Data Science course.


This one might actually have more substance; it is taught by a JHU professor.

Coursera course on Exploratory Data Analysis


Thursday, March 31, 2016

A Survey of Graph Theory and Applications in Neo4J - Talk

This is a link to a talk given at a recent meet-up in Arlington, VA.

The talk starts with fairly introductory material, but it gets more interesting as it progresses. Definitely worth a read during a treadmill session.

Here is another relevant link.

After using Neo4J for a year for experimental purposes, my opinion is that it is a decent application, but I seriously doubt it scales to big data. I never tested this; it is a hunch based on my own use.

Also, if you are thinking of using Neo4J to store triples: don't. It is way too much work. Just use a triple store.
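To show why a dedicated store fits triples so naturally, here is a toy in-memory sketch (my own illustration, not any product's API): every fact is a (subject, predicate, object) tuple, and a query is just a pattern in which `None` acts as a wildcard. Real triple stores build persistent indexes on all three positions and put a query language such as SPARQL on top of this idea.

```python
from collections import defaultdict


class TripleStore:
    """Toy in-memory store of (subject, predicate, object) triples."""

    def __init__(self):
        self._triples = set()
        self._by_subject = defaultdict(set)    # index: subject -> triples
        self._by_predicate = defaultdict(set)  # index: predicate -> triples

    def add(self, subject, predicate, obj):
        triple = (subject, predicate, obj)
        self._triples.add(triple)
        self._by_subject[subject].add(triple)
        self._by_predicate[predicate].add(triple)

    def match(self, subject=None, predicate=None, obj=None):
        """Return triples matching the pattern, sorted; None matches anything."""
        # Start from the narrowest index available, then filter the rest.
        if subject is not None:
            candidates = self._by_subject.get(subject, set())
        elif predicate is not None:
            candidates = self._by_predicate.get(predicate, set())
        else:
            candidates = self._triples
        return sorted(
            t for t in candidates
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


if __name__ == "__main__":
    store = TripleStore()
    store.add("alice", "knows", "bob")
    store.add("alice", "worksAt", "UMBC")
    store.add("bob", "knows", "carol")
    print(store.match(predicate="knows"))
```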

Monday, March 28, 2016

CS231n: Convolutional Neural Networks for Visual Recognition Winter Course Project Report

There are lots of interesting reads on this page. And this is a great course to take if you are researching deep learning for image processing.

Tuesday, March 22, 2016

Sunday, March 20, 2016

3 Minute Thesis Competition - 3MT

Can you explain your dissertation in 3 minutes?

UMBC (Baltimore, MD) has a 3MT competition this Wednesday.


If you are preparing for a 3MT, this is a good resource.



Other good 3MT videos:

2010 Trans-Tasman 3MT Winner - Balarka Banerjee from Three Minute Thesis (3MT®) on Vimeo.

Thursday, March 17, 2016

Markdown

I am starting to use Markdown more, and I wanted to know why I should care about it. This article gives a good overview of why to use it and links to a tutorial.

Read it here.

Tuesday, March 15, 2016

Flask

Flask...

"Flask is a microframework for Python based on Werkzeug, Jinja 2 and good intentions."

I toyed around with the tutorial and got a few simple apps running. If you are interested in building websites, it might be worth a try. I haven't yet determined whether it is useful for anything else.

http://flask.pocoo.org/
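For reference, a minimal Flask app in the spirit of the official quickstart looks roughly like this. The routes and return values are hypothetical examples; Flask's `test_client()` lets you exercise the routes without starting a server, and `app.run(debug=True)` starts the development server when you want to try it in a browser.

```python
from flask import Flask

app = Flask(__name__)


@app.route("/")
def index():
    # A plain string return value becomes the response body.
    return "Hello from Flask!"


@app.route("/square/<int:n>")
def square(n):
    # The <int:n> converter parses the URL segment and passes it in as an int;
    # non-integer segments do not match this route and produce a 404.
    return str(n * n)
```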

Image Processing

Scipy Lecture Notes, scikit-image section:

http://www.scipy-lectures.org/packages/scikit-image/

Monday, March 14, 2016

Spring Break.....

I love working on campus during spring break. Front-row parking, an empty lab, no line for coffee... ahhh, nirvana...


No coffee! Bah!

Thursday, March 10, 2016

Ugh, dissertation

In those moments when you are frustrated with your dissertation, breathe, and know there are others feeling the same pain.....

The valley....



Ride the wave to finish this thing...


Tuesday, March 8, 2016

DL4J

I have been using Java for a long time, but I find DL4J a bit cumbersome to use. I prefer Torch/Lua or Theano for deep learning.

However, because Java has been such a significant part of my life for so long, I will not give up on DL4J.

More to come once I get this working.

In the meantime, here are a few links so I can close those tabs :-)

word2vec in DL4J
deep autoencoders in DL4J
nd4j

Whooo, Ahhh, ....


Nonlinear PCA for Matlab, fun stuff here!

I like popcorn and I like bag of words

I think I am going to like this too.

It is aimed at beginners, so it may not be very useful, but they said popcorn, so they have my attention...
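The bag-of-words idea itself is small enough to sketch without any libraries: build a vocabulary over the corpus, then represent each document as a vector of word counts, ignoring word order. This is my own plain-Python illustration (the tokenizer regex and the function names are assumptions), not the tutorial's code.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphabetic tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(documents):
    """Map each distinct token in the corpus to a column index, in sorted order."""
    words = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {word: i for i, word in enumerate(words)}


def bag_of_words(document, vocabulary):
    """Count vector for one document; tokens outside the vocabulary are dropped."""
    vector = [0] * len(vocabulary)
    for token, count in Counter(tokenize(document)).items():
        if token in vocabulary:
            vector[vocabulary[token]] = count
    return vector


if __name__ == "__main__":
    corpus = ["I like popcorn", "I like bag of words"]
    vocab = build_vocabulary(corpus)
    print(vocab)
    print(bag_of_words("Popcorn, popcorn, and more words!", vocab))
```

Because word order is thrown away, "not good" and "good, not" get the same vector; that simplicity is exactly why bag of words makes a good first baseline for text classification.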

Data sets for Presidential Debates


I want to play with these data sets, I really do. Why haven't I done this yet?

Get them here....

swirl (no, not that one, Semantic Webbers)

swirl is a way to learn R and data science interactively.

Try it out here.

Overlapping histograms and Histogram Tutorial in Python

A thread that discusses this topic. It is much easier to do in R.
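On the Python side, the usual matplotlib approach is to draw both histograms on the same axes with shared bin edges and partial transparency (`alpha`), so both distributions stay visible where they overlap. A minimal sketch, assuming numpy and matplotlib are installed; the two normal samples are synthetic stand-ins for real data, and the output filename is arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=0)
sample_a = rng.normal(loc=0.0, scale=1.0, size=1000)  # synthetic data
sample_b = rng.normal(loc=1.5, scale=1.0, size=1000)  # synthetic data

# One shared set of bin edges keeps the two sets of bars aligned.
low = min(sample_a.min(), sample_b.min())
high = max(sample_a.max(), sample_b.max())
bins = np.linspace(low, high, 41)  # 41 edges -> 40 bins

fig, ax = plt.subplots()
counts_a, _, _ = ax.hist(sample_a, bins=bins, alpha=0.5, label="sample a")
counts_b, _, _ = ax.hist(sample_b, bins=bins, alpha=0.5, label="sample b")
ax.legend(loc="upper right")
fig.savefig("overlapping_histograms.png")
plt.close(fig)
```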


More on histograms

matplotlib



Examples using matplotlib.


Tutorial for matplotlib.

A little bit on density plots.

And an introduction to plotting in Python.
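A density plot is usually a kernel density estimate (KDE): a smoothed histogram built by placing a small Gaussian bump on every sample and averaging them. A minimal numpy sketch follows; the default bandwidth uses Scott's rule of thumb for one dimension (sample standard deviation times n^(-1/5)), which is one common choice among several. In practice `scipy.stats.gaussian_kde` does this for you.

```python
import numpy as np


def gaussian_kde(samples, grid, bandwidth=None):
    """Evaluate a 1-D Gaussian kernel density estimate at each point of grid."""
    samples = np.asarray(samples, dtype=float)
    if bandwidth is None:
        # Scott's rule of thumb for 1-D data.
        bandwidth = samples.std(ddof=1) * len(samples) ** (-1 / 5)
    # Standardized distances between every sample and every grid point,
    # shape (n_samples, n_grid).
    z = (np.asarray(grid, dtype=float)[None, :] - samples[:, None]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / (bandwidth * np.sqrt(2 * np.pi))
    return kernels.mean(axis=0)  # average of one Gaussian bump per sample


if __name__ == "__main__":
    rng = np.random.default_rng(seed=0)
    data = rng.normal(size=500)  # synthetic data
    xs = np.linspace(-5, 5, 201)
    density = gaussian_kde(data, xs)
    # To plot it: import matplotlib.pyplot as plt; plt.plot(xs, density)
    print(xs[density.argmax()])
```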

knitr



For dynamic report generation in R.


Link for knitr

Monday, January 25, 2016

How to read a research paper

http://www.sciencemag.org/careers/2016/01/how-read-scientific-paper