An opinion word lexicon and a training dataset for Russian sentiment analysis of social media
To open the resource, click the link "An opinion word lexicon and a training dataset for Russian sentiment analysis of social media".