5 sessions of about 2-3 hours each.

Session 1: working with text
  Part 1 (2.5 hours)
    01A Vector Space Model - pre-processing, text similarity, tf.idf weighting (26 slides)
    01B Text Indexing and Search - inverted indices, encodings, precision/recall (27 slides)
  Part 2 (2 hours)
    02A Information Extraction - rule-based, wrapper induction, HMM for info extraction (34 slides)
    02B Text Summarization - multi-document summarization, update summarization (34 slides)

Session 2: text classification
  Part 1 (1.5 hours)
    03A Supervised Learning - basics, KNN, decision trees (29 slides)
    03B Text Classification - problems, practical aspects, evaluation (13 slides)
    03C Sentiment Analysis - polarity, lexical resources, practical aspects (17 slides)
  Part 2 (1.5 hours, because students were familiar with the subject)
    03D Graph models - models, preferential attachment, evolving graphs, assortativity (46 slides)
    03E Link-based ranking - HITS, PageRank, other centrality metrics (48 slides)

Session 3: working with graphs
  Part 1 (1.5 hours)
    04A Dense sub-graphs - sub-graphs, cuts, max-flow, shingling (69 slides)
  Part 2 (1 hour)
    04B Graph partitioning - projections, spectral methods (44 slides)
  Part 3 (1 hour)
    04C Social influence models - model, linear threshold, independent cascade (45 slides)

Session 4: social influence
  Part 1 (1.5 hours)
    04D Influence maximization - problem, greedy approximation (23 slides)
    05A Social media mining examples - economics, politics, health, crises (52 slides)
  Part 2 (1.5 hours)
    05B Natural experiments - randomized controlled experiments, Neyman's model (45 slides)
    05C Matching studies - matching and propensity matching design (51 slides)

Optional sessions (if time allows)
    extra_A Clustering - partitional algorithms, 1D clustering (22 slides)
    extra_B K-means - including k-means++ and k-means-- (31 slides)
    extra_C Hierarchical Clustering - algorithm, variants (33 slides)
    extra_D Link prediction
    extra_E Link prediction and contents
    extra_F Ethics of handling social data - privacy, discrimination, experimentation (46 slides)