Schema Extraction
Thursday, 22 October 2015, 11:00-12:30
Meeting Room, 1st floor
Department of Engineering, Roma Tre University
Via della Vasca Navale, 79, Roma
Divesh Srivastava is the head of Database Research at AT&T Labs-Research. He is an ACM Fellow, serves on the board of trustees of the VLDB Endowment, and is the managing editor of the Proceedings of the VLDB Endowment (PVLDB) and an associate editor of the ACM Transactions on Database Systems (TODS). His research interests and publications span a variety of topics in data management.
Increasingly complex databases need ever more sophisticated tools to help users understand their schemas and interact with the data. This is challenging because complex databases often have thousands of tables and inadequate schemas, with little indication of which tables are important or what the main concepts are. We address these challenges and describe techniques to extract an understandable schema from a complex database. We first present a robust algorithm to discover foreign/primary key relationships between tables, based on a general rule termed Randomness. We then describe an information-theoretic approach that takes a set of tables linked by foreign/primary keys, identifies the important tables, and clusters the tables into the main concepts of the schema. Finally, we propose summary graphs that meet specified size constraints and preserve the most informative join paths between tables of user interest.