Week |
Course content |
Week 1 |
[W1] (2/25, Saturday)
Introduction to data analysis and AI applications.
Several examples for simple (but useful) analysis with data.
Statistical and mathematical modeling.
Introduction to linear regression, logistic regression and unsupervised machine learning. |
Week 2 |
[W2] (3/4, Saturday)
DATA: Spam email data [Project 1]
DATA: Breast cancer and biomarker (case-control study) [Project 1]
Statistical hypothesis testing; t-test and F-test (ANOVA).
Sample size calculation.
Linear regression with examples.
Pearson correlation and Spearman’s correlation; Kendall’s tau.
Principal component analysis (PCA) and linear discriminant analysis. |
Week 3 |
[W3] Probability and distributions. Bayes' theorem.
Parametric and nonparametric methods.
Rank-based Mann-Whitney and Wilcoxon procedures.
ROC curve analysis with examples. |
Week 4 |
[W4] ROC curve versus logistic regression.
Unsupervised learning: K-means method; hierarchical cluster analysis and DIANA. Minimum entropy clustering. |
Week 5 |
[W5] Presentation of Project 1.
Small topic: questionnaire data; internal reliability (Cronbach’s alpha) and inter-rater reliability (Cohen’s kappa). |
Week 6 |
[W6] DATA: Taiwan PM2.5 data [Project 2]
DATA: Currencies of 19 countries over 6 months [Project 2]
Bootstrapping.
Supervised learning: random forest method and gradient boosting.
Functional data (regularly sampled) analysis. |
Week 7 |
[W7] DATA: alpine insect fauna; cluster analysis after data transformation
Special topic: Response surface with a second-order model, and experimental design. |
Week 8 |
[W8] Generalized linear model (GLM): an introduction.
Longitudinal data analysis, Poisson regression.
Generalized estimating equation (GEE) model, robust inference and generalized linear mixed model (GLMM).
|
Week 9 |
[W9] Presentation of Project 2.
Probability distributions and stochastic processes: a review
(including Markov chains, Brownian motion, and the Brownian bridge process, with applications in data analysis).
Assignment for online review: Introduction to the fundamentals of speech recognition.
|
Week 10 |
[W10] DATA: Taiwan chickenpox and herpes zoster. [Project 3]
DATA: Lead-exposure workers. [Project 3]
Two-by-two contingency tables. Epidemiological study design; odds ratio and relative risk. Estimating a common effect in multilevel/multicenter studies, conditional logistic regression, and risk-set sampling. |
Week 11 |
[W11] Matching, matched-pair design, McNemar procedure. Propensity score matching (PSM). Counter-matching. Stratified analysis and interaction. |
Week 12 |
[W12] DATA: Lung cancer data analysis [Project 3]
DATA: LOS (length of hospital stay) data analysis [Project 3]
Introduction to clinical (medical) data analysis and survival analysis.
Kaplan-Meier estimate, survival models, log-rank and weighted log-rank tests.
Weibull regression.
Cox proportional-hazards regression with applications.
|
Week 13 |
[W13] Presentation of Project 3.
Panel discussion.
Generalized additive model (GAM)
Special topic: seeking maximal association between Y and the Xs.
DATA: Taiwan air pollutant data versus health-insurance data bank (several diseases) |
Week 14 |
[W14] Project 4: proposals and discussion.
General considerations for model-building strategy.
The large-p, small-n problem.
|
Week 15 |
[W15] Time-dependent clustering.
Spatio-temporal data analysis and detection of spatial clustering.
DATA: eBird data and avian influenza outbreaks in poultry farms
DATA: Dengue outbreaks |
Week 16 |
[W16] Functional clustering; noncentral chi-square, noncentral t, and noncentral F distributions; regression trees, CART, random forest (revisited)
DATA: Taiwan’s PM2.5 data
DATA: Ginseng 1H-NMR data |
Week 17 |
[W17] Bayesian inference, empirical Bayes methods, Bayesian multilevel modeling.
DATA: metabolomic (NMR spectroscopic) data of salmon smolts with integrated ANOVAs. |
Week 18 |
[W18] Project 4: final presentation. |