Polyglot Persistence for Big Data Management
Teacher
Roberto De Virgilio
Abstract
The term polyglot, ordinarily defined as “someone who speaks or writes several languages,” is borrowed and redefined for big data to describe a set of applications that use several core database technologies. This is the most likely outcome of your implementation planning: it is difficult to settle on a single persistence style, no matter how narrow your approach to big data might be.
This course describes how to design a polyglot persistence solution, which is needed when a complex big data problem is best solved by breaking it into segments and applying a different database model to each. The results are then aggregated into a hybrid data storage and analysis solution.
With an application that uses many types of data, a web service can be created to send each data request to the appropriate database. This comes at a cost in complexity, as each data storage solution means learning a new technology. The benefits are worth it, however: when relational databases are used inappropriately, they cause a significant slowdown in application development and performance. Another benefit is that many NoSQL databases are designed to operate over clusters and can handle large volumes of data, which gives you horizontal scaling (scale-out), as opposed to the vertical scaling (scale-up) that limits most relational databases.
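The routing idea described above can be sketched in a few lines of Java. This is only a minimal illustration under stated assumptions, not a design prescribed by the course: the class `PolyglotRouter`, the `DataKind` categories, the `Store` interface and the stand-in stores are all hypothetical names, and each store simply labels the request instead of talking to a real database.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical sketch: a single entry point that forwards each request
// to the storage backend registered for its kind of data.
public class PolyglotRouter {

    // Assumed categories of data an application might handle.
    public enum DataKind { TRANSACTIONAL, DOCUMENT, KEY_VALUE, GRAPH }

    // Assumed abstraction over one storage backend.
    public interface Store {
        String handle(String request);
    }

    private final Map<DataKind, Store> stores = new EnumMap<>(DataKind.class);

    // Associates a kind of data with the store that should serve it.
    public void register(DataKind kind, Store store) {
        stores.put(kind, store);
    }

    // Sends the request to the store registered for the given kind.
    public String route(DataKind kind, String request) {
        Store store = stores.get(kind);
        if (store == null) {
            throw new IllegalArgumentException("No store registered for " + kind);
        }
        return store.handle(request);
    }

    // Builds a router wired with stand-in stores; a real system would
    // wrap database clients here instead of string-labelling lambdas.
    public static PolyglotRouter demoRouter() {
        PolyglotRouter router = new PolyglotRouter();
        router.register(DataKind.TRANSACTIONAL, req -> "relational:" + req);
        router.register(DataKind.DOCUMENT, req -> "document:" + req);
        router.register(DataKind.KEY_VALUE, req -> "key-value:" + req);
        router.register(DataKind.GRAPH, req -> "graph:" + req);
        return router;
    }

    public static void main(String[] args) {
        PolyglotRouter router = demoRouter();
        System.out.println(router.route(DataKind.TRANSACTIONAL, "order-42"));
        System.out.println(router.route(DataKind.GRAPH, "friends-of-alice"));
    }
}
```

The caller only names the kind of data it needs, so a backend can be swapped or added by changing one registration. The complexity cost mentioned above shows up behind the `Store` interface, where each real implementation would require knowledge of a different database technology.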
Program
Students will produce their own Big Polyglot Data System. Datasets and open source platforms will be provided to discuss popular real-world case studies and to select the most suitable storage and application solutions.
1. Basics of Big Data Infrastructure
2. Identify the Data You Need for Your Big Data
3. Operational Databases for Big Polyglot Data Management (i.e. NoSQL and NewSQL solutions)
4. Characteristics of a Big Polyglot Data Analysis Framework
5. Case studies with ad hoc software libraries (e.g. Presto and BigDAWG)
Primary Material
Primary source material will be readings in the form of multimedia presentations, research papers, official web sites and other material provided by the instructor.
Software resources
This course includes design and programming assignments, which will be in Java using open source frameworks.