National Chung Hsing University Course Syllabus
Course Title (Chinese): 資訊檢索導論 (4116)
(English): Introduction to Information Retrieval
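Offering Department: Department of Computer Science and Engineering
Course Type: Elective    Credits: 3    Instructor: 范耀中
Enrolling Unit: Department of Computer Science and Engineering / Bachelor's Program    Language of Instruction: Chinese; English/EMI    Semester: 1122 (academic year 112, second semester)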
Course Description: As the amount of text data, such as web pages and blogs, grows explosively, it is increasingly important to develop tools that help us manage huge amounts of information. Web search engines are good examples of such tools. In this course, you will learn the technologies underlying web search engines and information retrieval, covering both traditional material and recent advances in the field. You will learn the basic principles and algorithms for managing, indexing, querying, and classifying text data. In addition, the course introduces MapReduce, an important tool for processing large-scale text data.
Students taking this course will work in teams of two on a course project, using Hadoop, HBase, Pig, and Hive to build a personal web search engine. The project will be graded on the clarity of the project presentation, the functionality of the system built, and the challenges overcome in implementing it.
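In this course, we introduce the most fundamental principle behind search engines: Information Retrieval. Using the Google search engine as an example, we present the background knowledge of information retrieval, including data indexing, link analysis, and query expansion. Moreover, with the growth of the Internet, large amounts of data accumulate rapidly, and parallel processing has become a basic element of information retrieval. The course therefore introduces the basic concepts of parallel processing and the MapReduce parallel programming framework (MapReduce is Google's tool for processing large volumes of data). Through hands-on assignments, students use existing, easy-to-use tools to quickly build a personal search engine and gain a first look at current big data processing technologies.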
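The map, shuffle, and reduce steps can be illustrated without a cluster. The block below is a minimal sketch, assuming a single-machine, in-memory simulation in plain Python rather than Hadoop itself; `map_phase`, `shuffle`, `reduce_phase`, and `build_inverted_index` are hypothetical names, not Hadoop APIs. The mapper emits `(term, doc_id)` pairs, the shuffle groups them by term, and the reducer turns each group into a sorted posting list.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(doc_id: int, text: str) -> Iterable[Tuple[str, int]]:
    """Mapper: emit one (term, doc_id) pair per token.

    Assumption: naive lowercase whitespace tokenization, no stemming.
    """
    for term in text.lower().split():
        yield term, doc_id


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for term, doc_id in pairs:
        groups[term].append(doc_id)
    return groups


def reduce_phase(term: str, doc_ids: List[int]) -> Tuple[str, List[int]]:
    """Reducer: deduplicate and sort the doc IDs into a posting list."""
    return term, sorted(set(doc_ids))


def build_inverted_index(docs: Dict[int, str]) -> Dict[str, List[int]]:
    """Run map, shuffle, and reduce in sequence on a single machine."""
    pairs = (pair for doc_id, text in docs.items() for pair in map_phase(doc_id, text))
    return dict(reduce_phase(term, ids) for term, ids in shuffle(pairs).items())


if __name__ == "__main__":
    docs = {1: "new home sales", 2: "home sales rise", 3: "new home"}
    index = build_inverted_index(docs)
    print(index["home"])  # [1, 2, 3]
    print(index["new"])   # [1, 3]
```

In an actual Hadoop job the framework performs the shuffle and sort between the map and reduce tasks, so the programmer writes only the mapper and the reducer.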


Prerequisites:
Includes Self-Directed Learning: Yes
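Course Objectives and Core Competencies (weighting %); Teaching and Assessment Methods for Course Objectives
Course Objective    Core Competency    Weight (%)    Teaching Method    Assessment Method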
To acquire foundational knowledge of information retrieval.
To gain the ability to write parallel programs using Hadoop.
Course Overview
1. Boolean retrieval
2. Index construction and compression
3. Scoring and the vector space model (1/2)
4. Scoring and the vector space model (2/2)
5. Evaluation in information retrieval (1/2)
6. Evaluation in information retrieval (2/2)
7. Midterm
8. Web link analysis
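For topic 1 (Boolean retrieval), a conjunctive query such as `brutus AND caesar` is typically answered by merging sorted posting lists. The sketch below assumes postings are stored as sorted lists of integer document IDs in a plain dict; `intersect` and `and_query` are illustrative names, and the toy postings are in the spirit of the Brutus/Caesar/Calpurnia example used with the Manning textbook.

```python
from typing import Dict, List


def intersect(p1: List[int], p2: List[int]) -> List[int]:
    """Merge two sorted posting lists in O(len(p1) + len(p2)) time (Boolean AND)."""
    answer: List[int] = []
    i = j = 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return answer


def and_query(index: Dict[str, List[int]], terms: List[str]) -> List[int]:
    """Intersect the postings of all terms, starting from the shortest list."""
    postings = sorted((index.get(term, []) for term in terms), key=len)
    if not postings:
        return []
    result = postings[0]
    for plist in postings[1:]:
        if not result:
            break  # an empty intermediate result can never grow again
        result = intersect(result, plist)
    return result


if __name__ == "__main__":
    index = {
        "brutus": [1, 2, 4, 11, 31, 45, 173, 174],
        "caesar": [1, 2, 4, 5, 6, 16, 57, 132],
        "calpurnia": [2, 31, 54, 101],
    }
    print(and_query(index, ["brutus", "caesar"]))               # [1, 2, 4]
    print(and_query(index, ["brutus", "caesar", "calpurnia"]))  # [2]
```

Processing the shortest list first keeps every intermediate result no longer than that list, which bounds the work of the later merges.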
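For topic 2 (index construction and compression), one classic scheme stores each posting list as gaps between consecutive document IDs and encodes each gap with variable byte (VB) encoding: every byte carries 7 payload bits, and the high bit is set only on the last byte of a number. The sketch below assumes byte streams represented as Python lists of integers in 0..255; all helper names are illustrative.

```python
from itertools import accumulate
from typing import List


def to_gaps(postings: List[int]) -> List[int]:
    """Replace sorted doc IDs by the first ID followed by successive differences."""
    if not postings:
        return []
    return [postings[0]] + [b - a for a, b in zip(postings, postings[1:])]


def from_gaps(gaps: List[int]) -> List[int]:
    """Invert to_gaps with a running sum."""
    return list(accumulate(gaps))


def vb_encode_number(n: int) -> List[int]:
    """Split n into 7-bit groups, most significant first; mark the last byte with +128."""
    out: List[int] = []
    while True:
        out.insert(0, n % 128)
        if n < 128:
            break
        n //= 128
    out[-1] += 128
    return out


def vb_encode(numbers: List[int]) -> List[int]:
    return [byte for n in numbers for byte in vb_encode_number(n)]


def vb_decode(stream: List[int]) -> List[int]:
    numbers: List[int] = []
    n = 0
    for byte in stream:
        if byte < 128:      # high bit clear: more 7-bit groups follow
            n = 128 * n + byte
        else:               # high bit set: this byte ends the current number
            numbers.append(128 * n + (byte - 128))
            n = 0
    return numbers
```

For example, the postings `[824, 829, 215406]` become the gaps `[824, 5, 214577]`, which encode to the six bytes `[6, 184, 133, 13, 12, 177]`; small gaps between frequent terms' postings need only one byte each.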
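For topics 3 and 4 (scoring and the vector space model), the following minimal sketch ranks documents by cosine similarity between TF-IDF vectors. It assumes whitespace tokenization with no stemming or stop-word removal, and uses one common weighting variant, (1 + log10 tf) × log10(N / df); other tf and idf variants are equally valid. The names `tfidf_vector`, `cosine`, and `rank` are illustrative.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def tfidf_vector(tokens: List[str], df: Dict[str, int], n_docs: int) -> Dict[str, float]:
    """Weight each term by (1 + log10 tf) * log10(N / df); terms absent from df are skipped."""
    vec: Dict[str, float] = {}
    for term, tf in Counter(tokens).items():
        if term in df:
            vec[term] = (1 + math.log10(tf)) * math.log10(n_docs / df[term])
    return vec


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query: str, docs: List[str]) -> List[Tuple[int, float]]:
    """Return (doc_index, score) pairs sorted by decreasing cosine score."""
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter(term for tokens in tokenized for term in set(tokens))
    doc_vecs = [tfidf_vector(tokens, df, len(docs)) for tokens in tokenized]
    q_vec = tfidf_vector(query.lower().split(), df, len(docs))
    scores = [(i, cosine(q_vec, vec)) for i, vec in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

With the query `"fast car"` over `["the car is fast", "fast car fast car", "the cat sat on the mat"]`, document 1 ranks first (its vector is proportional to the query, so its cosine is about 1.0), document 0 second, and document 2, which shares no query term, scores 0.0.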
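For topics 5 and 6 (evaluation), set-based precision, recall, and F-measure follow directly from the retrieved and relevant document sets. A minimal sketch with an illustrative function name; `beta` weights recall relative to precision, and `beta=1` gives the balanced F1 score, the harmonic mean of precision and recall.

```python
from typing import Iterable, Tuple


def precision_recall_f(
    retrieved: Iterable[int], relevant: Iterable[int], beta: float = 1.0
) -> Tuple[float, float, float]:
    """Set-based precision, recall, and F-beta for a single query."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    true_pos = len(retrieved_set & relevant_set)
    precision = true_pos / len(retrieved_set) if retrieved_set else 0.0
    recall = true_pos / len(relevant_set) if relevant_set else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    # F_beta = (1 + beta^2) * P * R / (beta^2 * P + R)
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f
```

For instance, retrieving documents `[1, 2, 3, 4]` when `[2, 4, 5, 6, 7]` are relevant gives precision 0.5, recall 0.4, and F1 = 2(0.5)(0.4) / (0.5 + 0.4) ≈ 0.444.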
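For topic 8 (web link analysis), PageRank can be computed by power iteration over the link graph. The sketch below assumes the graph fits in a dict of adjacency lists, a damping factor of 0.85 (a common choice, not one fixed by this syllabus), and that pages without out-links spread their score uniformly over all pages; `pagerank` is an illustrative name.

```python
from typing import Dict, List


def pagerank(
    links: Dict[str, List[str]],
    damping: float = 0.85,
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> Dict[str, float]:
    """Power iteration with uniform teleportation; the scores sum to 1."""
    nodes = sorted(set(links) | {v for outs in links.values() for v in outs})
    if not nodes:
        return {}
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        # Teleportation share plus the mass of dangling pages, spread uniformly.
        dangling = sum(rank[u] for u in nodes if not links.get(u))
        new = {u: (1 - damping) / n + damping * dangling / n for u in nodes}
        # Each page splits its damped score evenly over its out-links.
        for u, outs in links.items():
            for v in outs:
                new[v] += damping * rank[u] / len(outs)
        delta = sum(abs(new[u] - rank[u]) for u in nodes)
        rank = new
        if delta < tol:
            break
    return rank
```

On the three-page graph `{"A": ["B", "C"], "B": ["C"], "C": ["A"]}`, page C, which receives links from both other pages, ends with the highest score, followed by A and then B.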

Course Schedule (unit titles and content, exercises, weekly topics, and exam schedule; 18 weeks in total)
Week    Content
Week 1 - Introduction: Goals and history of IR. The impact of the web on IR.
Week 2 - Basic IR Models: Boolean and vector-space retrieval models; ranked retrieval; text-similarity metrics; TF-IDF (term frequency/inverse document frequency) weighting; cosine similarity.
Week 3 - Basic Tokenizing, Indexing, and Implementation of Vector-Space Retrieval: Simple tokenizing, stop-word removal, and stemming; inverted indices; efficient processing with sparse vectors; Java implementation.
Week 4 - Experimental Evaluation of IR: Performance metrics: recall, precision, and F-measure; evaluation on benchmark text collections.
Week 5 - Query Operations and Languages: Relevance feedback; query expansion; query languages.
Week 6 - Text Representation: Word statistics; Zipf’s law; Porter stemmer; morphology; index term selection; using thesauri. Metadata and markup languages (SGML, HTML, XML).
Week 7 - Text Categorization: Categorization algorithms: Rocchio, nearest neighbor, and naive Bayes. Applications to information filtering and organization.
Week 8 - Link Analysis
Week 9 - HITS Algorithm
Week 10 - Midterm Exam
Week 11 - Text Clustering: Clustering algorithms: agglomerative clustering; k-means; expectation maximization (EM). Applications to web search and information organization.
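Week 12 - Hadoop/Spark Introduction (1/4): hands-on lab in Cloud Classroom 1002
Week 13 - Hadoop/Spark Programming (2/4): hands-on lab in Cloud Classroom 1002
Week 14 - Hadoop/Spark Algorithms (3/4): hands-on lab in Cloud Classroom 1002
Week 15 - Hadoop/Spark Algorithms (4/4): hands-on lab in Cloud Classroom 1002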
Week 16 - Self-Directed Learning (Open Domain Question Answering Problem)
Week 17 - Self-Directed Learning (Open Domain Question Answering Problem)
Week 18 - Final Exam + Final Exam for your QA-AI
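The Rocchio relevance-feedback method listed for Week 5 moves the query vector toward the centroid of documents judged relevant and away from the centroid of non-relevant ones: q_m = alpha * q_0 + beta * (mean of relevant vectors) - gamma * (mean of non-relevant vectors). The sketch below assumes sparse term-weight dicts and the weights alpha = 1.0, beta = 0.75, gamma = 0.15 (commonly cited defaults, treated here as assumptions); negative weights are dropped, as is usual in practice. `rocchio` is an illustrative name.

```python
from collections import defaultdict
from typing import Dict, List

Vector = Dict[str, float]


def rocchio(
    query: Vector,
    relevant: List[Vector],
    nonrelevant: List[Vector],
    alpha: float = 1.0,
    beta: float = 0.75,
    gamma: float = 0.15,
) -> Vector:
    """Return alpha*q0 + beta*centroid(relevant) - gamma*centroid(nonrelevant)."""
    new_q: Dict[str, float] = defaultdict(float)
    for term, w in query.items():
        new_q[term] += alpha * w
    for doc in relevant:
        for term, w in doc.items():
            new_q[term] += beta * w / len(relevant)
    for doc in nonrelevant:
        for term, w in doc.items():
            new_q[term] -= gamma * w / len(nonrelevant)
    # Negative weights are clipped (dropped), a common practical convention.
    return {term: w for term, w in new_q.items() if w > 0}
```

For an ambiguous query such as `{"jaguar": 1.0}`, marking car-related documents as relevant and an animal-related one as non-relevant adds terms like "car" and "engine" to the query and suppresses "cat".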
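Among the Week 7 categorization algorithms, multinomial naive Bayes is compact enough to sketch. The version below assumes pre-tokenized documents, add-one (Laplace) smoothing, and log probabilities to avoid numeric underflow; tokens never seen in training are ignored at classification time. The toy training set is modeled on the China/Japan example from the Manning textbook, and `train_nb` and `classify_nb` are illustrative names.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

Model = Tuple[Dict[str, float], Dict[str, Dict[str, float]], Set[str]]


def train_nb(docs: List[Tuple[List[str], str]]) -> Model:
    """Estimate log priors and add-one smoothed log likelihoods for each class."""
    class_counts = Counter(label for _, label in docs)
    term_counts: Dict[str, Counter] = defaultdict(Counter)
    vocab: Set[str] = set()
    for tokens, label in docs:
        term_counts[label].update(tokens)
        vocab.update(tokens)
    priors = {c: math.log(k / len(docs)) for c, k in class_counts.items()}
    likelihoods: Dict[str, Dict[str, float]] = {}
    for c in class_counts:
        total = sum(term_counts[c].values())
        likelihoods[c] = {
            t: math.log((term_counts[c][t] + 1) / (total + len(vocab))) for t in vocab
        }
    return priors, likelihoods, vocab


def classify_nb(model: Model, tokens: List[str]) -> str:
    """Pick the class with the highest log posterior; unknown tokens are skipped."""
    priors, likelihoods, vocab = model
    scores = {
        c: priors[c] + sum(likelihoods[c][t] for t in tokens if t in vocab)
        for c in priors
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    train = [
        ("Chinese Beijing Chinese".split(), "china"),
        ("Chinese Chinese Shanghai".split(), "china"),
        ("Chinese Macao".split(), "china"),
        ("Tokyo Japan Chinese".split(), "not_china"),
    ]
    model = train_nb(train)
    print(classify_nb(model, "Chinese Chinese Chinese Tokyo Japan".split()))  # china
```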
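The HITS algorithm from Week 9 gives each page a hub score and an authority score that reinforce each other: a page's authority is the sum of the hub scores of pages linking to it, and its hub score is the sum of the authority scores of the pages it links to, with both vectors normalized after every step. The sketch below assumes a dict of adjacency lists, L2 normalization, and a fixed number of iterations instead of a convergence test; `hits` and `_normalize` are illustrative names.

```python
import math
from typing import Dict, List, Tuple


def _normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale a score vector to unit L2 length (left unchanged if all zero)."""
    norm = math.sqrt(sum(x * x for x in scores.values())) or 1.0
    return {k: x / norm for k, x in scores.items()}


def hits(
    links: Dict[str, List[str]], iterations: int = 50
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return (hub, authority) scores after a fixed number of iterations."""
    nodes = sorted(set(links) | {v for outs in links.values() for v in outs})
    hub = {u: 1.0 for u in nodes}
    auth = {u: 1.0 for u in nodes}
    for _ in range(iterations):
        # Authority update: sum of hub scores of the pages that link in.
        auth = {u: 0.0 for u in nodes}
        for u, outs in links.items():
            for v in outs:
                auth[v] += hub[u]
        auth = _normalize(auth)
        # Hub update: sum of authority scores of the pages linked to.
        hub = _normalize({u: sum(auth[v] for v in links.get(u, [])) for u in nodes})
    return hub, auth
```

With `{"h1": ["a1", "a2"], "h2": ["a1", "a2"], "h3": ["a1"]}`, page a1 (linked by all three hubs) becomes the strongest authority, and h1 and h2, which point to both authorities, become the strongest hubs.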
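Of the Week 11 clustering algorithms, k-means is the simplest to sketch. The version below is plain Lloyd-style iteration (assign each point to its nearest centroid, then move each centroid to the mean of its members) with Euclidean distance. It assumes seeding with the first k points for reproducibility, whereas practical implementations usually seed randomly or with k-means++; `kmeans` is an illustrative name, and `math.dist` requires Python 3.8 or later.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd-style k-means; assumption: the first k points are the initial centroids."""
    centroids: List[Point] = [tuple(p) for p in points[:k]]
    assignment: List[int] = [-1] * len(points)

    def nearest(p: Point) -> int:
        # Index of the closest current centroid by Euclidean distance.
        return min(range(k), key=lambda c: math.dist(p, centroids[c]))

    for _ in range(max_iter):
        new_assignment = [nearest(p) for p in points]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: each non-empty cluster's centroid moves to its mean.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centroids, assignment
```

On the six points `(1, 1), (1.5, 2), (8, 8), (9, 8.5), (1, 0.5), (8.5, 9)` with k = 2, the loop settles after three passes into one cluster near (1.17, 1.17) and another at (8.5, 8.5).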
Assessment Methods
1. Programming assignments: 20%
2. Midterm exam: 40%
3. Final exam: 40%

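Textbooks and References (title, author, publisher, distributor, notes)
1. Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press.
2. Tom White, Hadoop: The Definitive Guide, Third Edition, O'Reilly.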

Course Materials (instructors' personal web pages must be hosted on the university's website)
Handouts and slides will be distributed in class.

Office Hours
By appointment

United Nations Sustainable Development Goals (link)
09. Industry, Innovation and Infrastructure    Experiential learning course offered: N
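Please respect intellectual property rights and gender equality. Illegal photocopying of others' works is prohibited.
Last updated (YYYY/MM/DD): 2024/01/31 14:51:03    Printed (YYYY/MM/DD): 2025/08/17
MyTB textbook ordering platform: http://www.mytb.com.tw/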