Course title (Chinese): 數據分析與AI應用 (7627)
(Eng.) Application in Data Analysis and Artificial Intelligence
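Offering unit: Artificial Intelligence and Data Science Program
Course type: Elective; Credits: 3; Instructor: 吳宏達
Enrollment unit: Artificial Intelligence and Data Science Program / in-service master's program; Language of instruction: Chinese, English/EMI; Semester: 1112
Course description: Data analysis is a topic encountered in every field of science. When data analysis is accompanied by statistical and probability models, the results can be interpreted more objectively and credibly. Learning how to analyze a data set of any size with simple statistical (including probability) models, and how to build a reasonable explanatory model, is basic knowledge and ability that modern analysts must have as they move into the AI era.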
Week 1 [W1] (Saturday, 2/25)
Introduction to data analysis and AI applications.
Several examples for simple (but useful) analysis with data.
Statistical and mathematical modeling.
Introduction to linear regression, logistic regression and unsupervised machine learning.
Week 2 [W2] (Saturday, 3/4)
DATA: Spam email data [Project 1]
DATA: breast cancer and biomarker (case control study) [Project 1]
Statistical hypothesis testing; T test and F test (ANOVA).
Sample size calculation.
Linear regression with examples.
Pearson correlation and Spearman’s correlation; Kendall’s tau.
Principal component analysis (PCA) and linear discriminant analysis.
Week 3 [W3] Probability and distributions. Bayes' theorem.
Parametric and nonparametric methods.
Rank-based Mann-Whitney and Wilcoxon procedures.
ROC curve analysis with examples.
Week 4 [W4] ROC curve versus logistic regression.
Unsupervised learning: K-means method; hierarchical cluster analysis and DIANA. Minimum entropy clustering.
Week 5 [W5] Presentation of Project 1.
Small topic: questionnaire data, internal reliability (Cronbach’s alpha) and inter-rater reliability (Cohen’s kappa).
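The two reliability measures named in Week 5 can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the course's own material; the function names `cronbach_alpha` and `cohen_kappa` and the toy inputs are assumptions made for the example.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    item_columns = list(zip(*scores))  # one tuple of scores per item
    sum_item_var = sum(variance(col) for col in item_columns)
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum_item_var / total_var)


def cohen_kappa(rater1, rater2):
    """Cohen's kappa for two raters labelling the same subjects.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from the marginal rates.
    """
    n = len(rater1)
    labels = set(rater1) | set(rater2)
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n
    p_e = sum((rater1.count(lab) / n) * (rater2.count(lab) / n) for lab in labels)
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical toy data: 3 respondents x 3 items, and 4 subjects rated twice.
    print(cronbach_alpha([[1, 1, 1], [2, 2, 2], [3, 3, 3]]))
    print(cohen_kappa(["y", "y", "n", "n"], ["y", "n", "n", "n"]))
```

Sample variances (denominator n - 1) are used for both the items and the total score, which is the usual convention for alpha.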
Week 6 [W6] DATA: Taiwan PM2.5 data [Project 2]
DATA: currencies of 19 countries over 6 months [Project 2]
Supervised learning: random forest method and gradient boosting.
Functional data (regularly sampled) analysis.
Week 7 [W7] DATA: alpine insect fauna; cluster analysis after data transformation.
Special topic: Response surface with 2nd-order model and experimental design
Week 8 [W8] Generalized linear model (GLM): an introduction.
Longitudinal data analysis, Poisson regression.
Generalized estimating equation (GEE) model, robust inference and generalized linear mixed model (GLMM).
Week 9 [W9] Presentation of Project 2.
Probability distributions and stochastic processes: a review
(including Markov chains, Brownian motion, and the Brownian bridge process, with applications in data analysis)
Assignment for online review: Introduction to the fundamentals of speech recognition.

Week 10 [W10] DATA: Taiwan chickenpox and herpes zoster. [Project 3]
DATA: Lead-exposure workers. [Project 3]
Two by two contingency tables. Epidemiology study design; odds ratio and relative risk. Estimating common effect in multilevel/multicenter studies, conditional logistic regression and risk-set sampling.
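As a small illustration of the two-by-two table measures listed for Week 10, the sketch below computes the odds ratio and relative risk from the four cell counts, with a Woolf (log-scale) confidence interval for the odds ratio. The function name `two_by_two_measures`, the cell layout, and the counts in the usage line are assumptions for the example, not data from the course.

```python
import math


def two_by_two_measures(a, b, c, d, z=1.96):
    """Odds ratio, relative risk, and an approximate CI for the odds ratio.

    Assumed cell layout:
                    case   non-case
        exposed      a        b
        unexposed    c        d

    All four cells must be positive; zero cells need a continuity
    correction, which this sketch does not apply.
    """
    odds_ratio = (a * d) / (b * c)
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    relative_risk = risk_exposed / risk_unexposed
    # Woolf's method: standard error of log(OR) from the cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci_low = math.exp(math.log(odds_ratio) - z * se_log_or)
    ci_high = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, relative_risk, (ci_low, ci_high)


if __name__ == "__main__":
    # Hypothetical counts: 20/100 exposed and 10/100 unexposed subjects are cases.
    print(two_by_two_measures(20, 80, 10, 90))
```

The relative risk is meaningful only when the rows are sampled as cohorts; in a case-control design the odds ratio is the estimable quantity.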
Week 11 [W11] Matching, matched-pair design, McNemar procedure. Propensity score matching (PSM). Counter-matching. Stratified analysis and interaction.
Week 12 [W12] DATA: Lung cancer data analysis [Project 3]
DATA: LOS (length of hospital stay) data analysis [Project 3]
Introduction to clinical (medical) data analysis and survival analysis.
Kaplan-Meier estimate, survival models, log-rank and weighted log-rank tests.
Weibull regression.
Cox proportional-hazards regression with applications.
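The Kaplan-Meier estimate listed for Week 12 is the product, over the distinct event times, of (1 - deaths / number at risk). The following is a minimal standard-library sketch under that definition; the function name `kaplan_meier` and the toy follow-up times are assumptions for illustration only.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times  : follow-up time for each subject
    events : 1 if the event was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Events and censored subjects both leave the risk set after time t.
        n_at_risk -= leaving
    return curve


if __name__ == "__main__":
    # Hypothetical data: the subject followed to time 2 is censored.
    for t, s in kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]):
        print(t, s)
```

Censored subjects contribute to the risk set up to their censoring time but never reduce the survival estimate directly, which is what distinguishes this estimator from a naive proportion of survivors.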

Week 13 [W13] Presentation of Project 3.
Panel discussion.
Generalized additive model (GAM)
Special topic: seeking maximal association for Y and Xs.
DATA: Taiwan air pollutant data versus health-insurance data bank (several diseases)
Week 14 [W14] Project 4: proposals and discussion.
General considerations for model-building strategy.
Large-P-small-N question.

Week 15 [W15] Time-dependent clustering.
Spatial temporal data analysis and detection of spatial clustering.
DATA: eBird data and avian influenza outbreaks in poultry farms
DATA: Dengue outbreaks
Week 16 [W16] Functional clustering; noncentral chi-square, noncentral t, and noncentral F distributions; regression trees, CART, random forest (revisited).
DATA: Taiwan’s PM2.5 data
DATA: Ginseng (人蔘) 1H-NMR data
Week 17 [W17] Bayesian inference, empirical Bayes method, Bayesian multilevel modeling.
DATA: metabolomic (NMR spectroscopic) data of salmon smolts with integrated ANOVAs.
Week 18 [W18] Project 4: final presentation.
1. The Elements of Statistical Learning, 2nd ed. By T. Hastie, R. Tibshirani, and J. Friedman. (2009) Springer.
2. Computer Age Statistical Inference. By B. Efron and T. Hastie. (2016) Cambridge University Press.
