Introduction to data engineering issues in data science. Data management technology objectives. Relational database technology, relational algebra, SQL, transactions, data modelling methodology, entity-relationship models. NoSQL databases including key-value stores, document databases, wide-column stores, graph databases. Overview of big data processing platforms. Data integration including data warehousing, data lakes, ETL and ELT approaches. Data preparation for analysis, data quality, data cleaning. Introduction to several current topics in database research, such as data mining, managing data streams, distributed/parallel databases, HTAP architectures.
Open to Master of Data Science and Artificial Intelligence students and others without an undergraduate course on database systems (instructor approval required).
TBD
Week | Lecture | Topic | Slides |
---|---|---|---|
1 | 1 | Course introduction and scoping | |
1 | 2 | Relational model of data | |
2 | 1 | SQL | |
2 | 2 | Advanced SQL, Relational algebra & relational calculus | |
3 | 1 | Data modeling | |
3 | 2 | Relational DBMS internals (query processing) | |
4 | 1 | Relational DBMS internals (transaction processing) | |
4 | 2 | Big data and NoSQL | |
5 | 1 | Big data and text processing | |
5 | 2 | Big data and data streams | |
6 | 1 | Big data and graph processing | |
6 | 2 | Big data and scaling: Classical relational distributed DBMS | |
7 | | Reading week - no classes | |
8 | 1 | Big data processing platforms: MapReduce | |
8 | 2 | Big data processing platforms: MapReduce/Spark | |
9 | 1 | Cloud computing & cloud-native data management | |
9 | 2 | Privacy in big data | |
10 | 1 | Data integration: Data warehouses | |
10 | 2 | Data integration: Data lakes | |
11 | 1 | OLAP & OLTP: HTAP systems | |
11 | 2 | Data mining | |
12 | 1 | Data preparation: the pipeline | |
12 | 2 | Data quality & data cleaning | |
13 | 1 | Data quality & data cleaning | |
13 | 2 | Data provenance | |