Methods and techniques for the management of Big Data
Teacher
Riccardo Torlone
Abstract
The course illustrates tools and methods for the management of big data, i.e. massive amounts of unstructured data whose size exceeds the capacity of conventional database management systems to capture, store, manage and analyse.
The lectures focus on: (i) the problems of storing and processing big data, (ii) the hardware and software solutions that have been proposed in recent years to solve these problems, and (iii) open issues that have yet to be solved satisfactorily in this context. The course includes practical exercises with real systems and the assignment of individual projects.
All the subjects addressed during the course are investigated from both practical and methodological perspectives. Possible directions of research in the field are also suggested.
Program
1. Brief introduction to the Big Data phenomenon;
2. Hadoop & MapReduce;
3. Big data tools such as Spark, Hive, Giraph, Storm, MLlib, and Open R;
4. NoSQL database systems;
5. Data analytics.
Primary Material
The primary source material consists of readings, in the form of research papers, and material provided by the instructor.
Software resources
The course includes programming assignments, to be completed in Java, Python or Scala using open-source frameworks.