University of Amsterdam, The Netherlands
Defining curriculum for data science
In recent years we have seen a rapid increase in the job market for data scientists. As a result,
a number of training courses and university educational programmes, at both graduate and post-graduate levels, have been
labelled with the words “Big Data” or “Data Science” and aim to train people with the right competences and skills to fill the
need for data scientists in the job market. The Horizon 2020 EDISON project, a Coordination & Support Action, has the
ambition to create a synergy between educational institutions and the job market, which will help to establish the data scientist
as a profession. This will be achieved by aligning industry needs with available career paths, and by supporting academia in
reviewing their curricula with respect to expected profiles, required expertise and professional certification. This talk will present
the EDISON approach to defining the Data Science body of knowledge and model curricula, taking into consideration
existing professional profiles.
Ludwig-Maximilians-Universität München, Germany
Environmental (friendly) supercomputing on SuperMUC
The high-performance computing system SuperMUC at the Leibniz
Supercomputing Centre (LRZ) in Garching near Munich hosts a wide
variety of applications, including many from the environmental domain,
such as hydrometeorology and seismology. This domain requires
dedicated support, not only for porting, scaling and running
applications, but also for building distributed infrastructures for
accessing, storing and archiving sensor data. The LRZ partnership
initiative piCS addresses the needs of the user communities for
dedicated support, simplifying the use of high-performance and
high-throughput computers in the daily work of environmental
scientists. This talk motivates dedicated support for areas such as
the environmental sciences and presents success stories of
supercomputing in this domain. It demonstrates that the partnership
initiative is a viable approach to providing HPC support for user
communities.
ITMO University, Russia and University of Amsterdam, The Netherlands
Making sense of the “Big Nonsense”: data-driven modelling and anomaly detection by machine learning methods. Examples from flood early warning systems and road traffic simulations.
The world is obsessed with Big Data. Hundreds of papers, books and educational programmes pop up every year. While some scientists provocatively call it the “Big Nonsense” leading to the “end of scientific thinking”, others happily use a plethora of emerging methods to derive mathematical models from real-life observations and to detect anomalies that may predict a system collapse.
In this talk I will describe a few examples from our recent work, where the synergy of "data science" and traditional "computational science" proved to be beneficial:
Within flood early warning systems, a combination of finite element modelling and advanced data analysis of sensor measurements successfully predicted levee instability and failure a few days before the collapse – early enough for the maintenance services to reinforce the embankment slope.
In our investigations of the flow-induced vibrations of flood barrier gates, we were able to derive second-order differential equations from the time signal using the Differential Evolution method. Two novel approaches to a constrained-optimization treatment of this inverse problem were proposed, enabling accurate identification of the dynamical system.
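The inverse-problem step can be illustrated with a minimal sketch: Differential Evolution searches for the coefficients of a hypothetical damped oscillator x'' + c·x' + k·x = 0 so that its simulated response matches an observed time signal. The equation, coefficient values and bounds below are illustrative stand-ins, not the actual flood-barrier model.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import differential_evolution

# Hypothetical dynamical system: x'' + c*x' + k*x = 0
def simulate(c, k, t):
    rhs = lambda t, y: [y[1], -c * y[1] - k * y[0]]
    sol = solve_ivp(rhs, (t[0], t[-1]), [1.0, 0.0], t_eval=t)
    return sol.y[0]

t = np.linspace(0.0, 10.0, 200)
observed = simulate(0.4, 4.0, t)  # synthetic "measured" signal

# Cost: mismatch between the candidate model and the observed signal
def cost(params):
    c, k = params
    return float(np.mean((simulate(c, k, t) - observed) ** 2))

# Global, derivative-free search over the coefficient space
result = differential_evolution(cost, bounds=[(0.0, 2.0), (0.1, 10.0)], seed=1)
c_hat, k_hat = result.x
```

Because each candidate requires a full re-simulation, the cost landscape can be multimodal; the population-based, derivative-free nature of Differential Evolution is what makes it attractive for this kind of identification problem.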
In the levee and dam health monitoring by non-intrusive sensors, we successfully detected cracks, erosion and piping events in passive seismic data by machine learning methods: unsupervised clustering and support vector machines (SVM). A two-class SVM (labelled anomalies) achieved over 94% accuracy. A one-class SVM (no labelled data for anomalies) first achieved 83% accuracy, and with a new automatic feature selection procedure the result was improved to over 91% accuracy. This is a remarkable achievement for unlabelled data.
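As an illustration of the one-class approach, the following minimal sketch trains scikit-learn's OneClassSVM on "normal" feature vectors only and then flags outliers. The synthetic features, dimensionality and ν value are stand-ins, not the actual passive-seismic pipeline.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Synthetic stand-ins for feature vectors extracted from sensor windows
# (e.g. band energies); the real pipeline derives these from seismic data.
normal = rng.normal(loc=0.0, scale=1.0, size=(500, 4))
anomalies = rng.normal(loc=5.0, scale=1.0, size=(20, 4))

scaler = StandardScaler().fit(normal)

# One-class SVM: trained on normal data only, no labelled anomalies needed
model = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale")
model.fit(scaler.transform(normal))

# predict() returns +1 for normal, -1 for anomaly
pred = model.predict(scaler.transform(anomalies))
detection_rate = float(np.mean(pred == -1))
```

The ν parameter bounds the fraction of training points treated as outliers; tightening it trades false alarms against missed anomalies, which is the central tuning decision in unlabelled monitoring.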
A completely different example comes from transportation systems. We used traffic data from 25,000 sensors installed along the roads of the Netherlands and analysed the consequences of the major power outage in North Holland for road traffic congestion dynamics. Data-driven travel demand modelling and agent-based traffic simulation allowed us to develop a detailed, realistic model that reproduced both normal and critical traffic situations in the Amsterdam urban area.
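Agent-based traffic simulation can be illustrated at toy scale with the classic Nagel–Schreckenberg cellular-automaton model, in which each car is an agent that accelerates, brakes to keep a safe gap, and randomly slows down. This is a generic textbook illustration, not the Amsterdam model used in the study.

```python
import numpy as np

rng = np.random.default_rng(42)
ROAD_LEN, N_CARS, V_MAX, P_SLOW, STEPS = 100, 30, 5, 0.3, 200

# Place cars on distinct cells of a circular single-lane road
pos = np.sort(rng.choice(ROAD_LEN, size=N_CARS, replace=False))
vel = np.zeros(N_CARS, dtype=int)

total_distance = 0
for _ in range(STEPS):
    gaps = (np.roll(pos, -1) - pos - 1) % ROAD_LEN  # empty cells ahead
    vel = np.minimum(vel + 1, V_MAX)                # 1. accelerate
    vel = np.minimum(vel, gaps)                     # 2. brake to avoid collision
    dawdle = rng.random(N_CARS) < P_SLOW            # 3. random slowdown
    vel[dawdle] = np.maximum(vel[dawdle] - 1, 0)
    pos = (pos + vel) % ROAD_LEN                    # 4. move
    total_distance += int(vel.sum())

mean_speed = total_distance / (STEPS * N_CARS)
```

Even this minimal model reproduces spontaneous "phantom" jams at higher densities, which is why agent-based approaches scale up naturally to realistic congestion dynamics once calibrated against sensor data.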
Needless to say, all these studies were conducted in close collaboration with computer scientists, because we needed the most advanced computational frameworks orchestrating the workflows and distributed computing on grids and clouds. Exemplary work has been done by the Department of Computer Science of the AGH University; one of their masterpieces is the UrbanFlood Common Information Space.
University of Amsterdam and TNO, The Netherlands
Distributing the systems, dynamic network architectures, and applications
Proofs of concept for the distribution of distributed systems are described, illustrating their run-time interworking and adaptation to changed circumstances. Among others, self-optimizing, globe-spanning networks that use cloud data centers as routing hubs are shown. Practical matters such as the performance of these systems and the use of GPUs are touched upon. Topological patterns that are equilibrium properties of the adaptive distribution and scaling of heterogeneous workflow systems are pointed out, followed by a brief speculation about the information content of these patterns. Preliminary results of ongoing research on ‘Security Adaptive Response Networks’ (SARNET) are presented. Conceptually interesting is the role of virtualized networks in the organization of ‘structure’ in distributed systems. A case will be made for a practical, yet very secure Internet, enabled by a combination of real and virtualized ICT infrastructures. Then the software ‘control loops’ that adapt the distribution and scaling of the systems are analyzed. It is explained that control loops capture the essence of a dynamic, evolving architecture and hence should be termed Dynamic Networked Architectures; indeed, they share some resemblance with their biological counterpart. The keynote ends with pure speculation: the consequences of the distribution of DNA as a software system itself are illustrated by discussing ICT that evolves faster than one can reverse-engineer it.
EMEA Sales Director HPC and POD, Hewlett Packard Enterprise
HPC update from HPE: trends, strategy and futures
HPC today is undergoing a profound change of scope, objectives and delivery models. With the merging of HPC and Big Data/DBA/Deep Learning, a growing willingness to consume HPC as a service (HPC-aaS), and the coming revolution in architectures supporting HPC workloads, our entire ecosystem is changing dramatically. This talk will identify the ways HPE addresses those changes, and will also provide some views on the HPE exascale initiative, with project “The Machine” as the implementation vehicle.
Aston University, UK
Multi-dimensional summarization in cyber-physical society
Summarization is one of the key features of human intelligence. It plays an important role in understanding and representation. With the rapid and continual expansion of texts, pictures and videos in cyberspace, automatic summarization becomes more and more desirable. Text summarization has been studied for over half a century, but it is still hard to automatically generate a satisfactory summary. Traditional methods process texts empirically and neglect the fundamental characteristics and principles of language use and understanding. This keynote summarizes previous text summarization approaches in a multi-dimensional classification space, introduces a multi-dimensional methodology for research and development, unveils the basic characteristics and principles of language use and understanding, investigates some fundamental mechanisms of summarization, studies the dimensions and forms of representations, and proposes a multi-dimensional evaluation mechanism. The investigation extends to the incorporation of pictures into summaries and to the summarization of videos, graphs and pictures, and then reaches a general summarization framework. This lecture is based on the following new book: H. Zhuge, Multi-Dimensional Summarization in Cyber-Physical Society, Morgan Kaufmann, 2016.
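For contrast with the multi-dimensional methodology, the "traditional, empirical" approach mentioned above can be sketched in a few lines: a Luhn-style extractive summarizer that scores sentences by the frequency of their content words and keeps the top-ranked ones. The stopword list and scoring below are simplified stand-ins.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (real systems use much larger ones)
STOPWORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in", "it"}

def summarize(text, n_sentences=2):
    """Frequency-based extractive summary: pick the n highest-scoring sentences."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]
    freq = Counter(words)

    def score(sentence):
        toks = [w for w in re.findall(r"[a-z']+", sentence.lower())
                if w not in STOPWORDS]
        return sum(freq[w] for w in toks) / (len(toks) or 1)

    ranked = sorted(sentences, key=score, reverse=True)[:n_sentences]
    # Emit the selected sentences in their original order
    return " ".join(s for s in sentences if s in ranked)
```

Such purely statistical selection is exactly what the keynote criticizes: it ignores the principles of language use and understanding that the multi-dimensional methodology tries to capture.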