Welcome to the Freund Lab Wiki!
For help using this wiki, consult Help.
Resources for machine learning beginners
- A series of IPython notebooks that demonstrate scikit-learn
- An online course on machine learning by Andrew Ng
- A course on statistical learning by Trevor Hastie and Rob Tibshirani
- Machine Learning in Action
Resources for the theory behind big data computation
Books from Foundations and Trends in Theoretical Computer Science (TCS)
- Data Streams / S. Muthukrishnan, 2005
- Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches / Graham Cormode, Minos Garofalakis, Peter J. Haas, Chris Jermaine, 2011
- Algorithms and Data Structures for External Memory / Jeffrey Scott Vitter, 2008
A textbook used in a course given at Stanford:
- See the Tensors page for a reading list.
Linear systems and lossy compression
- A course on dynamic linear systems / Stephen Boyd, Stanford
- Time Series Analysis and Its Applications (with R Examples) / Robert H. Shumway, David S. Stoffer
- ANOVA, regression, and logistic regression
- Sigma-Delta Modulation
- DPCM: Lossy Predictive Coding
- A demonstration of Socher's recursive neural network for sentiment analysis
- etcML: machine learning for classifying tweets.
- Noah Smith: work on NLP for Twitter (I think this is where we got the Twitter parser, though maybe not the latest version). Look under "Twitter Word Clusters" for an interesting clustering of words into roughly 1000 clusters.
- SemEval: a yearly competition on semantic analysis. This year there is a track for sentiment analysis of tweets.
Diaries of Past Students
Mouse Brain Atlas
- Mouse Brain Atlas Building System: For discussions related to the overall registration and atlas generation system.
- Texture Analysis: For discussions related to characterizing cellular texture using different methods (deep neural networks, dictionary methods, cell-based methods, etc.).
Dimensionality Estimation project
- GitHub repository:
- Dimensionality Estimation Diary
Analysis of bee dances
Remote browsing of Large Images and Large Data (LILD)
Statistical models for network communication
Projects for master's students
- Collaborative Tweet Filtering
- Analysis of energy feeds
- CAIDA internet analysis
- U.S. Census: currently down, possibly because of the government shutdown. I have a recent snapshot of the data on Gordon.
- Weather history
- Data.gov: currently down
- World Bank data
- California Data
- AWS Public Dataset from Amazon
- Pointers from K. Claffy at CAIDA:  and  both provide public BGP data at a variety of granularities, including raw updates.
- The Yelp Dataset Challenge: a dataset from the Phoenix area.
IPython notebook collections
- Translate WeBWorK problems to edX.