Introduction to data engineering issues in data science. Data management technology objectives. Relational database technology, relational algebra, SQL, transactions, data modelling methodology, entity-relationship models. NoSQL databases including key-value stores, document databases, wide-column stores, graph databases. Overview of big data processing platforms. Data integration including data warehousing, data lakes, ETL and ELT approaches. Data preparation for analysis, data quality, data cleaning. Introduction to several current topics in database research, such as data mining, managing data streams, distributed/parallel databases, HTAP architectures.
Open to Master of Data Science and Artificial Intelligence students and others without an undergraduate course on database systems (instructor approval required).
TBD
| Week | Lecture | Topic | Speaker |
|---|---|---|---|
| 1 (Jan 6) | 1 | Course introduction; Introduction to Database Systems | Tamer Özsu |
| | 2 | Relational model of data, relational calculus & algebra | Tamer Özsu |
| 2 (Jan 13) | 1 | Relational algebra, SQL | Tamer Özsu |
| | 2 | SQL | Tamer Özsu |
| 3 (Jan 20) | 1 | Data modeling | Tamer Özsu |
| | 2 | Catch-up day or in-class discussion | |
| 4 (Jan 27) | 1 | Big data: Dealing with volume | Tamer Özsu |
| | 2 | Big data: Dealing with volume | Tamer Özsu |
| 5 (Feb 3) | 1 | Big data: Dealing with variety | Tamer Özsu |
| | 2 | Big data: Dealing with variety | Tamer Özsu |
| 6 (Feb 10) | 1 | Big data: Dealing with velocity | Tamer Özsu |
| | 2 | Big data: Dealing with velocity | Tamer Özsu |
| 7 (Feb 17) | | Reading week (no classes) | |
| 8 (Feb 24) | 1 | Midterm exam | |
| | 2 | Catch-up day or in-class discussion | |
| 9 (Mar 3) | 1 | Data integration: Data warehouses | Tamer Özsu |
| | 2 | Data integration: Data lakes | Renée Miller |
| 10 (Mar 10) | 1 | Cloud computing & cloud-native data management | |
| | 2 | OLAP & OLTP: HTAP systems | Anil Goel |
| 11 (Mar 17) | 1 | Data preparation: the pipeline | |
| | 2 | Data quality & data cleaning | |
| 12 (Mar 24) | 1 | Data quality & data cleaning | |
| | 2 | Data provenance | |
| 13 (Mar 31) | 1 | Vector databases | Jianguo Wang |
| | 2 | Data management issues in LLMs | Theo Rekatsinas |