NCHU Course Outline
Course Name (中) 數據分析與AI應用(7769)
(Eng.) Application in Data Analysis and Artificial Intelligence
Offering Dept Executive Master Program in Big Data
Course Type Elective Credits 3 Teacher WU, HONG-DAR
Department Executive Master Program in Big Data / (W)Graduate Language Chinese Semester 2026-SPRING
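Course Description Data analysis is a topic that every scientific field encounters. When data analysis is accompanied by statistical and probability models, the interpretation of results becomes more objective and credible. Learning how to analyze a data set of any size with simple statistical (including probabilistic) models, and how to build a reasonable explanatory model, is basic knowledge and ability that modern analysts must have as they move into the AI century.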
Prerequisites
Relevance of Course Objectives and Core Learning Outcomes(%) Teaching and Assessment Methods for Course Objectives
Course Objectives Competency Indicators Ratio(%) Teaching Methods Assessment Methods
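Through complete analyses of several data sets, each followed by an in-class presentation, we start from the simplest statistical methods and introduce statistical models one by one through data analysis. All of these models ultimately carry a certain degree of importance in AI applications. We place particular emphasis on AI-related statistical methods, data-analytic thinking, and hands-on examples.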
Competency Indicators (Ratio %): 3.Professional Knowledge in Data Science (50); 4.Mathematical and Statistical Software Skills (50)
Teaching Methods: Exercises, Discussion, Practicum, Lecturing
Assessment Methods: Attendance, Oral Presentation, Assignment, Study Outcome
Course Content and Homework/Schedule/Tests Schedule
Week Course Content
Week 1 [W1]
Introduction to data analysis and AI applications.
Several examples of simple (but useful) data analyses.
Statistical and mathematical modeling.
Introduction to linear regression, logistic regression and unsupervised machine learning.
Week 2 [W2]
DATA: Spam email data [Project 1]
DATA: breast cancer and biomarker (case control study) [Project 1]
Statistical sample size calculation.
Linear regression with examples.
Logistic regression and unsupervised machine learning.
Pearson correlation and Spearman’s correlation; Kendall’s tau.
Principal component analysis (PCA) and linear discriminant analysis.
Week 3 [W3] Probability and distributions. Bayes theorem.
Parametric and nonparametric methods.
Rank-based Mann-Whitney and Wilcoxon procedures.
ROC curve analysis with examples.
Week 4 [W4] ROC curve versus logistic regression.
Unsupervised learning: K-means method; hierarchical cluster analysis and DIANA.
Minimum entropy clustering.
Bootstrapping and Random Forest (RF) clustering.
Week 5 [W5] Presentation of Project 1.
Small topic, questionnaire data: internal reliability (Cronbach’s alpha) and inter-rater reliability (Cohen’s kappa).
Week 6 [W6] DATA: Taiwan PM2.5 data [Project 2]
DATA: currencies of 19 countries over 6 months [Project 2]
Gradient boosting and AdaBoost.
Functional data (regularly sampled) analysis.
Week 7 [W7] DATA: alpine insect fauna; cluster analysis after data transformation
Special topic: Response surface with 2nd-order model and experimental design
Week 8 [W8] Generalized linear model (GLM): an introduction.
Longitudinal data analysis, Poisson regression.
Generalized estimating equation (GEE) model, robust inference and generalized linear mixed model (GLMM).
Week 9 [W9] Presentation of Project 2.
Probability distributions and stochastic processes: a review
(including Markov chains, Brownian motion, and the Brownian bridge process, with applications in data analysis)
Assignment for online review: Introduction to the fundamentals of speech recognition.
Week 10 [W10] DATA: Taiwan chickenpox and herpes zoster. [Project 3]
DATA: Lead-exposure workers. [Project 3]
Two-by-two contingency tables. Epidemiological study design; odds ratio and relative risk. Estimating a common effect in multilevel/multicenter studies, conditional logistic regression and risk-set sampling.
Week 11 [W11] Matching, matched-pair design, McNemar procedure. Propensity score matching (PSM). Counter-matching. Stratified analysis and interaction. Basic ideas of modeling.
Week 12 [W12] DATA: Lung cancer data analysis [Project 3]
DATA: LOS (length of hospital stay) data analysis [Project 3]
Introduction to clinical (medical) data analysis and survival analysis.
Kaplan-Meier estimate, survival models, log-rank and weighted log-rank tests.
Weibull regression.
Cox proportional-hazards regression with applications.

Week 13 [W13] Generalized additive model (GAM)
Special topic: seeking maximal association between Y and the Xs.
DATA: Taiwan air pollutant data versus health-insurance data bank (several diseases)
Week 14 [W14] Presentation of Project 3.
Panel discussion.
Project 4: proposals and discussion.
General considerations for model-building strategy.
Large-p, small-n problem.

Week 15 [W15] Time-dependent clustering.
Spatio-temporal data analysis and detection of spatial clustering.
DATA: eBird data and avian influenza outbreaks in poultry farms
DATA: Dengue outbreaks
Self-directed learning: homework and exercises
Week 16 [W16] Functional clustering; noncentral chi-square, noncentral t, and noncentral F distributions; regression trees, CART, random forest (revisited)
DATA: Taiwan’s PM2.5 data
Self-directed learning: homework and exercises
DATA: Ginseng (人蔘) 1H-NMR data
DATA: metabolomic (NMR spectroscopic) data of salmon smolts with integrated ANOVAs.
Project 4: final presentation.
Self-directed learning: 03.Preparing presentations or reports related to industry and academia.

Evaluation
100% exercises and reports
Textbook & other References
1. T. Hastie, R. Tibshirani, and J. Friedman (2009). The Elements of Statistical Learning, 2nd ed. Springer.
2. B. Efron and T. Hastie (2016). Computer Age Statistical Inference. Cambridge.
Teaching Aids & Teacher's Website
Lecture handouts
Office Hours
Saturday 17:00~18:00
Sustainable Development Goals, SDGs(Link URL)
include experience courses:N
Please respect intellectual property rights and use the materials legally. Please respect gender equality.
Update Date, year/month/day:2026/02/11 12:24:46 Printed Date, year/month/day:2026 / 3 / 18