An opinion word lexicon and a training dataset for Russian sentiment analysis of social media
Click on "An opinion word lexicon and a training dataset for Russian sentiment analysis of social media" to open the resource.