National Chung Hsing University Course Syllabus
Course Title (Chinese): 數據分析與AI應用 (7627)
Course Title (English): Application in Data Analysis and Artificial Intelligence
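Offering Unit: Program in Artificial Intelligence and Data Science (人工智慧資科學程)
Course Type: Elective   Credits: 3   Instructor: 吳宏達
Enrolling Units: Program in Artificial Intelligence and Data Science / In-Service Master's Program   Language of Instruction: Chinese, English/EMI   Semester: 1122 (academic year 112, second semester)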
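Course Description: Data analysis is a topic encountered in every branch of science. When it is accompanied by statistical and probability models, its results can be interpreted more objectively and credibly. Learning how to analyze a data set of any size with simple statistical (including probability) models, and how to build a reasonable explanatory model for it, is basic knowledge and a basic skill that a modern analyst needs on the way into the AI era.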
Prerequisite Courses:
Includes Self-Directed Learning: Yes
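Course Objectives: Alignment with Core Competencies (%), Teaching Methods, and Assessment Methods
Course Objective: Through complete analyses of several data sets, each presented in class, we begin with the simplest statistical methods and introduce statistical models one by one through data analysis. Each of these models ultimately plays a role of some importance in AI applications. We place particular emphasis on AI-related statistical methods, data-analytic thinking, and hands-on work with real examples.
Teaching Methods: exercises, discussion, practicum, lecture
Assessment Methods: attendance, oral presentation, assignments, projects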
Course Content (unit titles and content, exercises/weekly lectures, and exam schedule; 18 weeks in total)
Week  Content
Week 1 [W1]
Introduction to data analysis and AI applications.
Several examples of simple (but useful) data analysis.
Statistical and mathematical modeling.
Introduction to linear regression, logistic regression and unsupervised machine learning.
Week 2 [W2]
DATA: Spam email data [Project 1]
DATA: Breast cancer and biomarkers (case-control study) [Project 1]
Statistical hypothesis testing; t-test and F-test (ANOVA).
Sample size calculation.
Linear regression with examples.
Pearson correlation and Spearman’s correlation; Kendall’s tau.
Principal component analysis (PCA) and linear discriminant analysis (LDA).
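Illustrative sketch for the Week 2 correlation topics (Python, standard library only; the function names and data are hypothetical). Spearman's rho is Pearson's r applied to ranks, with tied values given their average rank, so a monotone but nonlinear relationship gives rho = 1 while r stays below 1:

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(v):
    """1-based ranks of v; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    ranks = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        # extend j to the end of the run of values tied with v[order[i]]
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the (average) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 100]  # monotone increasing but far from linear
print(round(pearson(x, y), 3), spearman(x, y))  # prints: 0.795 1.0
```

Because it works on ranks, Spearman's rho is also less sensitive than Pearson's r to the single extreme value 100.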
Week 3 [W3] Probability and distributions. Bayes' theorem.
Parametric and nonparametric methods.
Rank-based Mann-Whitney and Wilcoxon procedures.
ROC curve analysis with examples.
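A sketch linking the Week 3 rank-based procedures to ROC analysis, assuming higher scores indicate the positive (diseased) class; the function names and scores are made up for illustration. The empirical area under the ROC curve equals the Mann-Whitney U statistic divided by the number of case-control pairs, with ties counted as one half, so the trapezoid area under the empirical ROC curve and the pairwise count agree:

```python
def auc_mann_whitney(pos, neg):
    """Empirical AUC = P(pos score > neg score) + 0.5 * P(tie),
    i.e. the Mann-Whitney U statistic divided by len(pos) * len(neg)."""
    u = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                u += 1.0
            elif p == q:
                u += 0.5
    return u / (len(pos) * len(neg))


def roc_points(pos, neg):
    """(FPR, TPR) points from sweeping the cut-off 'score >= t' down through all observed scores."""
    points = [(0.0, 0.0)]
    for t in sorted(set(pos) | set(neg), reverse=True):
        tpr = sum(s >= t for s in pos) / len(pos)
        fpr = sum(s >= t for s in neg) / len(neg)
        points.append((fpr, tpr))
    return points


def trapezoid_area(points):
    """Area under a piecewise-linear curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


cases = [0.9, 0.8, 0.7, 0.55]     # scores of diseased subjects (made up)
controls = [0.6, 0.5, 0.4, 0.7]   # scores of healthy subjects (made up)
print(auc_mann_whitney(cases, controls))            # 0.84375
print(trapezoid_area(roc_points(cases, controls)))  # 0.84375
```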
Week 4 [W4] ROC curve versus logistic regression.
Unsupervised learning: K-means method; hierarchical cluster analysis and DIANA. Minimum entropy clustering.
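A minimal sketch of the K-means (Lloyd's) algorithm from Week 4 on 2-D points, assuming the initial centers are supplied by the user (in practice they are often chosen at random or by k-means++). The hypothetical function alternates an assignment step and a mean-update step until the centers stop moving:

```python
def kmeans(points, centers, n_iter=100):
    """Lloyd's algorithm for K-means on 2-D points, starting from the given initial centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(n_iter):
        # assignment step: each point joins its nearest center (squared Euclidean distance)
        labels = [min(range(len(centers)),
                      key=lambda k: (p[0] - centers[k][0]) ** 2 + (p[1] - centers[k][1]) ** 2)
                  for p in points]
        # update step: each center moves to the mean of the points assigned to it
        new_centers = []
        for k, c in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append((sum(m[0] for m in members) / len(members),
                                    sum(m[1] for m in members) / len(members)))
            else:
                new_centers.append(c)  # an empty cluster keeps its previous center
        if new_centers == centers:
            break  # assignments no longer change
        centers = new_centers
    return labels, centers


pts = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]  # two obvious groups (toy data)
labels, centers = kmeans(pts, [(1, 1), (8, 8)])
print(labels, centers)
```

K-means only finds a local optimum of the within-cluster sum of squares, which is why the choice of starting centers matters.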
Week 5 [W5] Presentation of Project 1.
Small topic: questionnaire data; internal reliability (Cronbach’s alpha) and inter-rater reliability (Cohen’s kappa).
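A hedged sketch of the two Week 5 reliability coefficients (hypothetical helper names, toy data). Cronbach's alpha is k/(k-1) * (1 - sum of item variances / variance of the total score); Cohen's kappa is (p_o - p_e)/(1 - p_e), where p_o is the observed agreement and p_e the agreement expected from each rater's marginal frequencies:

```python
from collections import Counter
from statistics import variance


def cronbach_alpha(items):
    """items: k lists (one per questionnaire item), each holding one score per respondent."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score of each respondent
    sum_item_var = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - sum_item_var / variance(totals))


def cohen_kappa(rater1, rater2):
    """Chance-corrected agreement between two raters labelling the same subjects."""
    n = len(rater1)
    p_obs = sum(a == b for a, b in zip(rater1, rater2)) / n
    c1, c2 = Counter(rater1), Counter(rater2)
    # agreement expected if the raters labelled independently at their own marginal rates
    p_exp = sum(c1[c] * c2[c] for c in set(rater1) | set(rater2)) / n ** 2
    return (p_obs - p_exp) / (1 - p_exp)


items = [
    [4, 5, 3, 4, 2],  # item 1, five respondents (toy data)
    [4, 4, 3, 5, 2],  # item 2
    [5, 5, 2, 4, 3],  # item 3
]
print(round(cronbach_alpha(items), 3))
print(cohen_kappa(["yes", "yes", "no", "no"], ["yes", "no", "no", "no"]))  # 0.5
```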
Week 6 [W6] DATA: Taiwan PM2.5 data [Project 2]
DATA: Currencies of 19 countries over 6 months [Project 2]
Bootstrapping.
Supervised learning: random forest method and gradient boosting.
Functional data analysis (regularly sampled data).
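A sketch of the nonparametric bootstrap from Week 6, here a percentile confidence interval for a mean (standard library only; the function name, the default of 2000 resamples, and the data are illustrative assumptions). Each bootstrap replicate resamples n observations with replacement from the original sample and recomputes the statistic:

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.mean, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for stat(data)."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(data)
    # each replicate: draw n points with replacement, then recompute the statistic
    reps = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    k = int(n_boot * (1 - level) / 2)  # number of replicates cut from each tail
    return reps[k], reps[n_boot - 1 - k]


sample = [2.1, 3.4, 1.9, 5.6, 4.4, 3.3, 2.8, 4.9, 3.0, 3.7]  # made-up measurements
print(statistics.mean(sample), bootstrap_ci(sample))
```

Passing a different `stat` (for example `statistics.median`) reuses the same resampling loop, which is the main appeal of the bootstrap.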
Week 7 Self-directed learning. Content:
[W7] DATA: Alpine insect fauna; cluster analysis after data transformation
Special topic: Response surfaces with second-order models and experimental design
Week 8 [W8] Generalized linear models (GLM): an introduction.
Longitudinal data analysis, Poisson regression.
Generalized estimating equation (GEE) model, robust inference and generalized linear mixed model (GLMM).
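A sketch of the Week 8 Poisson regression as a GLM with the canonical log link, fitted by Newton-Raphson (for the canonical link this coincides with Fisher scoring, i.e. IRLS). It assumes a single covariate plus an intercept; the function name and toy counts are hypothetical, and GEE or GLMM fitting needs more machinery than shown here:

```python
import math


def poisson_regression(x, y, n_iter=25, tol=1e-10):
    """Fit log E[y] = b0 + b1 * x by Newton-Raphson on the Poisson log-likelihood."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the intercept-only fit
    for _ in range(n_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # score vector U = X^T (y - mu)
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Fisher information I = X^T diag(mu) X, a symmetric 2x2 matrix
        i00 = sum(mu)
        i01 = sum(xi * mi for xi, mi in zip(x, mu))
        i11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        # Newton step: solve I * delta = U using the explicit 2x2 inverse
        d0 = (i11 * u0 - i01 * u1) / det
        d1 = (i00 * u1 - i01 * u0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# two groups (x = 0 / x = 1) of made-up counts with means 3 and 8
x = [0, 0, 0, 1, 1, 1]
y = [2, 3, 4, 6, 9, 9]
b0, b1 = poisson_regression(x, y)
print(math.exp(b0), math.exp(b1))  # baseline mean and rate ratio
```

With a binary covariate the maximum-likelihood fit reproduces the group means, so exp(b0) is the x = 0 mean (3) and exp(b1) the rate ratio 8/3, which makes this toy case easy to check.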
Week 9 [W9] Presentation of Project 2.
Probability distributions and stochastic processes: a review
(including Markov chains, Brownian motion, and the Brownian bridge process, with applications in data analysis)
Assignment for online review: Introduction to the fundamentals of speech recognition.
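Two small illustrations for the Week 9 review (hypothetical function names, toy inputs). The first finds the stationary distribution pi = pi P of a finite Markov chain by power iteration, assuming the chain is irreducible and aperiodic so that the iteration converges. The second builds a Brownian bridge from simulated Brownian motion W on [0, 1] via B(t) = W(t) - t W(1), which pins the path to 0 at both ends:

```python
import math
import random


def stationary_distribution(P, n_iter=1000, tol=1e-12):
    """Stationary distribution of a finite Markov chain with transition matrix P (rows sum to 1)."""
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(n_iter):
        new = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]  # one step: pi P
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


def brownian_bridge(n_steps, seed=0):
    """Brownian bridge on a grid of [0, 1]: simulate W, then return W(t) - t * W(1)."""
    rng = random.Random(seed)
    dt = 1.0 / n_steps
    w = [0.0]
    for _ in range(n_steps):
        w.append(w[-1] + rng.gauss(0.0, math.sqrt(dt)))  # Gaussian increment with variance dt
    return [w_k - (k * dt) * w[-1] for k, w_k in enumerate(w)]


P = [[0.9, 0.1],
     [0.5, 0.5]]
print(stationary_distribution(P))  # approximately [5/6, 1/6]
bridge = brownian_bridge(1000)
print(bridge[0], bridge[-1])       # both (numerically) zero
```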

Week 10 [W10] DATA: Taiwan chickenpox and herpes zoster [Project 3]
DATA: Lead-exposed workers [Project 3]
Two-by-two contingency tables. Epidemiological study design; odds ratio and relative risk. Estimating a common effect in multilevel/multicenter studies; conditional logistic regression and risk-set sampling.
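A sketch of the Week 10 2x2-table measures (hypothetical function names; the cell layout a, b, c, d is an assumption stated in the docstrings). The odds ratio ad/bc gets a Wald interval on the log scale, the relative risk compares the two risks directly, and the Mantel-Haenszel estimator pools the odds ratio across strata, one standard way to estimate a common effect in multicenter data:

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """OR = ad/bc with a Wald confidence interval on the log scale.
    Assumed layout: a = exposed cases, b = exposed non-cases,
    c = unexposed cases, d = unexposed non-cases (all cells > 0)."""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    return or_, (or_ * math.exp(-z * se), or_ * math.exp(z * se))


def relative_risk(a, b, c, d):
    """RR = risk in exposed / risk in unexposed (estimable from cohort data, not case-control)."""
    return (a / (a + b)) / (c / (c + d))


def mantel_haenszel_or(tables):
    """Common odds ratio across strata; tables is a list of (a, b, c, d) tuples."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


print(odds_ratio(20, 80, 10, 90))        # OR 2.25 and its 95% CI
print(relative_risk(20, 80, 10, 90))     # RR 2.0
print(mantel_haenszel_or([(20, 80, 10, 90), (10, 40, 5, 45)]))  # 2.25
```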
Week 11 [W11] Matching, matched-pair design, McNemar procedure. Propensity score matching (PSM). Counter-matching. Stratified analysis and interaction.
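A sketch of the Week 11 McNemar procedure for matched pairs (the function name is hypothetical). Only the discordant pairs b and c enter the statistic (|b - c| - 1)^2/(b + c), which is compared with a chi-square distribution on 1 degree of freedom; the "- 1" continuity correction is optional:

```python
import math


def mcnemar(b, c, correction=True):
    """McNemar test for paired binary outcomes.
    b, c: counts of the two kinds of discordant pairs (assumes b + c > 0);
    concordant pairs carry no information and are not used.
    Returns the chi-square statistic (1 df) and its p-value."""
    diff = abs(b - c) - (1 if correction else 0)
    stat = max(diff, 0) ** 2 / (b + c)
    p = math.erfc(math.sqrt(stat / 2))  # P(chi-square_1 > stat) = P(|Z| > sqrt(stat))
    return stat, p


print(mcnemar(25, 10))                    # with continuity correction
print(mcnemar(25, 10, correction=False))  # without
```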
Week 12 [W12] DATA: Lung cancer data analysis [Project 3]
DATA: LOS (length of hospital stay) data analysis [Project 3]
Introduction to clinical (medical) data analysis and survival analysis.
Kaplan-Meier estimate, survival models, log-rank and weighted log-rank tests.
Weibull regression.
Cox proportional-hazards regression with applications.
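A sketch of the Week 12 Kaplan-Meier estimator, S(t) = product over event times t_j <= t of (1 - d_j/n_j), where d_j is the number of events at t_j and n_j the number still at risk. It follows the usual convention that subjects censored at t_j are counted as at risk at t_j; the function name and data are illustrative:

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.
    times: follow-up times; events: 1 = event observed, 0 = right-censored.
    Returns (event time, S(t)) at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # gather every subject (event or censored) whose time equals t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # subjects leave the risk set only after time t
    return curve


times = [1, 2, 2, 3, 4, 5]
events = [1, 1, 0, 1, 0, 1]  # made-up follow-up data
print(kaplan_meier(times, events))
```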

Week 13 Self-directed learning. Content:
[W13] Generalized additive model (GAM)
Special topic: seeking maximal association between Y and the Xs.
DATA: Taiwan air pollutant data versus health-insurance data bank (several diseases)
Week 14 [W14] Presentation of Project 3.
Panel discussion.
Project 4: proposals and discussion
General considerations for model-building strategy.
The large-p, small-n problem.

Week 15 [W15] Time-dependent clustering.
Spatiotemporal data analysis and detection of spatial clustering.
DATA: eBird data and avian influenza outbreaks in poultry farms
DATA: Dengue outbreaks
Week 16 [W16] Functional clustering; noncentral chi-square, noncentral t, and noncentral F distributions; regression trees, CART, random forest (revisited)
DATA: Taiwan’s PM2.5 data
DATA: Ginseng 1H-NMR data
Week 17 [W17] Bayesian inference; empirical Bayes methods.
DATA: metabolomic (NMR spectroscopic) data of salmon smolts with integrated ANOVAs.
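Two hedged illustrations for Week 17 (hypothetical names, toy data). The first is a conjugate Bayesian update, where a Beta(a, b) prior combined with s successes in n binomial trials gives a Beta(a + s, b + n - s) posterior. The second is a normal-normal empirical Bayes estimator: the prior mean and variance are estimated from the data themselves (method of moments), and each observation is then shrunk toward the overall mean:

```python
from statistics import mean, pvariance


def beta_binomial_posterior(successes, trials, a=1.0, b=1.0):
    """Posterior Beta parameters after observing binomial data under a Beta(a, b) prior."""
    return a + successes, b + trials - successes


def normal_eb_shrink(x, sigma2):
    """Empirical Bayes for x_i ~ N(theta_i, sigma2), theta_i ~ N(mu, tau2), with sigma2 > 0 known.
    mu and tau2 are estimated from x; each x_i is replaced by the estimated posterior mean."""
    mu = mean(x)
    tau2 = max(pvariance(x) - sigma2, 0.0)  # marginal variance of x_i is tau2 + sigma2
    shrink = tau2 / (tau2 + sigma2)         # weight on the observation vs. the prior mean
    return [mu + shrink * (xi - mu) for xi in x]


a_post, b_post = beta_binomial_posterior(7, 10)  # uniform prior, 7 successes in 10 trials
print(a_post, b_post, a_post / (a_post + b_post))  # posterior parameters and mean
print(normal_eb_shrink([1, 2, 3, 4, 5], sigma2=1.0))
```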
Week 18 [W18] Project 4: final presentation.
Assessment
100% exercises and presentations
Textbooks & References (title, author, publisher, distributor, notes)
1. T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning, 2nd ed. Springer, 2009.
2. B. Efron and T. Hastie. Computer Age Statistical Inference. Cambridge University Press, 2016.
Course Materials (instructors' personal web pages must be hosted on the university's site)
Lecture notes
Office Hours
Saturday 17:00–18:00
United Nations Sustainable Development Goals
01. No Poverty   04. Quality Education
Experiential Course Offered: N
Please respect intellectual property rights and gender equality; do not illegally photocopy the works of others.
Last Updated (YYYY/MM/DD): 2024/01/04 20:43:51   Printed (YYYY/MM/DD): 2024/04/29