Introduction to data engineering issues in data science. Data management technology objectives. Structured data management: relational database technology, database workloads (OLTP vs OLAP). Big data issues: dealing with volume (geo-distributed, cluster parallel, and cloud-native data management), dealing with variety (data type-native systems, NoSQL database systems), dealing with velocity (streaming data management), and big data processing platforms (MapReduce, Spark). Data preparation pipeline: data acquisition, data integration (data warehouses, data lakes, data lakehouses), dataset selection, data quality and cleaning, data provenance management. Introduction to several current topics in database research, such as large language models and vector databases.
Open to Master of Data Science and Artificial Intelligence students and others without an undergraduate course on database systems (instructor approval required).
TBD
| Week | Lecture | Topic | Speaker |
|---|---|---|---|
| 1 (Jan 5) | 1 | Course introduction; Structured Data Management: Introduction to Database Systems | Tamer Özsu |
| | 2 | Relational model of data, relational calculus & algebra | Tamer Özsu |
| 2 (Jan 12) | 1 | Relational algebra, SQL | Tamer Özsu |
| | 2 | Database Workloads (OLTP & OLAP: HTAP systems) | |
| 3 (Jan 19) | 1 | Big data: Dealing with volume | Tamer Özsu |
| | 2 | Big data: Dealing with volume | Tamer Özsu |
| 4 (Jan 26) | 1 | Big data: Dealing with variety | Tamer Özsu |
| | 2 | Big data: Dealing with variety | Tamer Özsu |
| 5 (Feb 2) | 1 | Big data: Dealing with velocity | Tamer Özsu |
| | 2 | Big data: Dealing with velocity | Tamer Özsu |
| 6 (Feb 9) | 1 | Cloud computing & cloud-native data management | |
| | 2 | Introduction to data preparation pipeline | Tamer Özsu |
| 7 (Feb 16) | | Reading week - no classes | |
| 8 (Feb 23) | 1 | Midterm exam | |
| | 2 | Data acquisition | Tamer Özsu |
| 9 (Mar 2) | 1 | Data integration: Data warehouses | Tamer Özsu |
| | 2 | Data integration: Data lakes | |
| 10 (Mar 9) | 1 | Data integration: Data lakehouses | |
| | 2 | Data profiling | |
| 11 (Mar 16) | 1 | Data quality & data cleaning | |
| | 2 | Data quality & data cleaning | |
| 12 (Mar 23) | 1 | Data provenance | |
| | 2 | LLMs and Data Management | |
| 13 (Mar 30) | 1 | LLMs and Data Management | |
| | 2 | Vector databases | |