University of Amsterdam, The Netherlands
Defining curriculum for data science
In recent years we have seen a rapid increase in the job market for data scientists. As a result,
a number of training courses and university educational programmes, at both graduate and post-graduate levels, have been
labelled with the words “Big Data” or “Data Science” and aim to train people with the right competences and skills to fill the
need for data scientists in the job market. The Horizon 2020 EDISON project, a Coordination & Support Action, has the
ambition to create a synergy between educational institutions and the job market, which will help to establish the data scientist
as a profession. This will be achieved by aligning industry needs with available career paths, and by supporting academia in
reviewing their curricula with respect to expected profiles, required expertise and professional certification. This talk will present
the EDISON approach to defining the Data Science body of knowledge and model curricula, taking into consideration
existing professional profiles.
Ludwig-Maximilians-Universität München, Germany
Environmental (friendly) supercomputing on SuperMUC
The high-performance computing system SuperMUC at the Leibniz
Supercomputing Centre (LRZ) in Garching near Munich hosts a wide
variety of applications, including many from the environmental domain,
such as hydrometeorology and seismology. This domain requires
dedicated support, not only for porting, scaling and running
applications, but also for building distributed infrastructures for
accessing, storing and archiving sensor data. The LRZ partnership
initiative piCS addresses the needs of the user communities for
dedicated support, simplifying the use of high-performance and
high-throughput computers in the daily work of environmental
scientists. This talk motivates dedicated support for areas such as
the environmental sciences and presents success stories of
supercomputing in this domain. It demonstrates that the partnership
initiative is a viable approach to providing HPC support for user
communities.
ITMO University, Russia and University of Amsterdam, The Netherlands
Making sense of the “Big Nonsense”: data-driven modelling and anomaly detection by machine learning methods. Examples from flood early warning systems and road traffic simulations.
The world is obsessed with Big Data. Hundreds of papers, books and educational programmes pop up every year. While some scientists provocatively call it the “Big Nonsense” leading to the “end of scientific thinking”, others happily use a plethora of emerging methods to derive mathematical models from real-life observations and to detect anomalies that may predict a system collapse.
In this talk I will describe a few examples from our recent work, where the synergy of "data science" and traditional "computational science" proved to be beneficial:
Within flood early warning systems, a combination of finite element modelling and advanced data analysis of sensor measurements successfully predicted levee instability and failure a few days before the collapse – early enough for the maintenance services to reinforce the embankment slope.
In our investigations of the flow-induced vibrations of flood barrier gates, we were able to derive second-order differential equations from the time signal using the Differential Evolution method. Two novel approaches to a constrained-optimization treatment of this inverse problem were proposed, enabling accurate identification of the dynamical system.
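The inverse-problem step can be illustrated with a minimal sketch: Differential Evolution searches for the coefficients of a hypothetical damped oscillator x'' + c·x' + k·x = 0 so that its simulated response matches an observed time signal. The equation, coefficient values and bounds below are illustrative stand-ins, not the actual flood-barrier model.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import differential_evolution

# Hypothetical dynamical system: x'' + c*x' + k*x = 0
def simulate(c, k, t):
    rhs = lambda t, y: [y[1], -c * y[1] - k * y[0]]
    sol = solve_ivp(rhs, (t[0], t[-1]), [1.0, 0.0], t_eval=t)
    return sol.y[0]

t = np.linspace(0.0, 10.0, 200)
observed = simulate(0.4, 4.0, t)  # synthetic "measured" signal

# Cost: mismatch between the candidate model and the observed signal
def cost(params):
    c, k = params
    return float(np.mean((simulate(c, k, t) - observed) ** 2))

# Global, derivative-free search over the coefficient space
result = differential_evolution(cost, bounds=[(0.0, 2.0), (0.1, 10.0)], seed=1)
c_hat, k_hat = result.x
```

Because each candidate requires a full re-simulation, the cost landscape can be multimodal; the population-based, derivative-free nature of Differential Evolution is what makes it attractive for this kind of identification problem.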
In the levee and dam health monitoring by non-intrusive sensors, we successfully detected cracks, erosion and piping events in passive seismic data by machine learning methods: unsupervised clustering and support vector machines (SVM). A two-class SVM (labelled anomalies) achieved over 94% accuracy. A one-class SVM (no labelled data for anomalies) first achieved 83% accuracy, and with a new automatic feature selection procedure the result was improved to over 91% accuracy. This is a remarkable achievement for unlabelled data.
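As an illustration of the one-class approach, the following minimal sketch trains scikit-learn's OneClassSVM on "normal" feature vectors only and then flags outliers. The synthetic features, dimensionality and ν value are stand-ins, not the actual passive-seismic pipeline.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Synthetic stand-ins for feature vectors extracted from sensor windows
# (e.g. band energies); the real pipeline derives these from seismic data.
normal = rng.normal(loc=0.0, scale=1.0, size=(500, 4))
anomalies = rng.normal(loc=5.0, scale=1.0, size=(20, 4))

scaler = StandardScaler().fit(normal)

# One-class SVM: trained on normal data only, no labelled anomalies needed
model = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale")
model.fit(scaler.transform(normal))

# predict() returns +1 for normal, -1 for anomaly
pred = model.predict(scaler.transform(anomalies))
detection_rate = float(np.mean(pred == -1))
```

The ν parameter bounds the fraction of training points treated as outliers; tightening it trades false alarms against missed anomalies, which is the central tuning decision in unlabelled monitoring.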
A completely different example comes from transportation systems. We used traffic data from 25,000 sensors installed along the roads of the Netherlands and analysed the consequences of the major power outage in North Holland for road traffic congestion dynamics. Data-driven travel demand modelling and agent-based traffic simulation allowed us to develop a detailed, realistic model that reproduced both normal and critical traffic situations in the Amsterdam urban area.
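Agent-based traffic simulation can be illustrated at toy scale with the classic Nagel–Schreckenberg cellular-automaton model, in which each car is an agent that accelerates, brakes to keep a safe gap, and randomly slows down. This is a generic textbook illustration, not the Amsterdam model used in the study.

```python
import numpy as np

rng = np.random.default_rng(42)
ROAD_LEN, N_CARS, V_MAX, P_SLOW, STEPS = 100, 30, 5, 0.3, 200

# Place cars on distinct cells of a circular single-lane road
pos = np.sort(rng.choice(ROAD_LEN, size=N_CARS, replace=False))
vel = np.zeros(N_CARS, dtype=int)

total_distance = 0
for _ in range(STEPS):
    gaps = (np.roll(pos, -1) - pos - 1) % ROAD_LEN  # empty cells ahead
    vel = np.minimum(vel + 1, V_MAX)                # 1. accelerate
    vel = np.minimum(vel, gaps)                     # 2. brake to avoid collision
    dawdle = rng.random(N_CARS) < P_SLOW            # 3. random slowdown
    vel[dawdle] = np.maximum(vel[dawdle] - 1, 0)
    pos = (pos + vel) % ROAD_LEN                    # 4. move
    total_distance += int(vel.sum())

mean_speed = total_distance / (STEPS * N_CARS)
```

Even this minimal model reproduces spontaneous "phantom" jams at higher densities, which is why agent-based approaches scale up naturally to realistic congestion dynamics once calibrated against sensor data.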
Needless to say, all these studies were conducted in close collaboration with computer scientists, because we needed the most advanced computational frameworks orchestrating the workflows and distributed computing on grids and clouds. Exemplary work has been done by the Department of Computer Science of the AGH University; one of their masterpieces is the UrbanFlood Common Information Space.
University of Amsterdam and TNO, The Netherlands
Distributing the systems, dynamic network architectures, and applications
Proofs of concept for the distribution of distributed systems are described, illustrating their run-time interworking and adaptation to changed circumstances. Among others, self-optimizing, globe-spanning networks that use cloud data centers as routing hubs are shown. Practical matters such as the performance of these systems and the use of GPUs are touched upon. Topological patterns that are equilibrium properties of the adaptive distribution and scaling of heterogeneous workflow systems are pointed out, followed by a brief speculation about the information content of these patterns. Preliminary results of ongoing research on ‘Security Adaptive Response Networks’ (SARNET) are presented. Conceptually interesting is the role of virtualized networks in the organization of ‘structure’ in distributed systems. A case will be made for a practical, yet very secure Internet, enabled by a combination of real and virtualized ICT infrastructures. Then the software ‘control loops’ that adapt the distribution and scaling of the systems are analyzed. It is explained that control loops capture the essence of a dynamic, evolving architecture and hence should be termed Dynamic Networked Architectures; indeed, they share some resemblance with their biological counterpart. The keynote ends with pure speculation: the consequences of the distribution of DNA as a software system itself are illustrated by discussing ICT that evolves faster than one can reverse-engineer it.
EMEA Sales Director HPC and POD, Hewlett Packard Enterprise
HPC update from HPE: trends, strategy and futures
HPC today is undergoing a profound change of scope, objectives and delivery models. With the merging of HPC and Big Data/DBA/Deep Learning, a growing willingness to consume HPC as a service (HPC-aaS), and the coming revolution in architectures supporting HPC workloads, our entire ecosystem is changing dramatically. This talk will identify the ways HPE addresses those changes, and will also provide some views on the HPE exascale initiative, with project “The Machine” as the implementation vehicle.
Aston University, UK
Multi-dimensional summarization in cyber-physical society
Summarization is one of the key features of human intelligence. It plays an important role in understanding and representation. With the rapid and continual expansion of texts, pictures and videos in cyberspace, automatic summarization becomes more and more desirable. Text summarization has been studied for over half a century, but it is still hard to automatically generate a satisfactory summary. Traditional methods process texts empirically and neglect the fundamental characteristics and principles of language use and understanding. This keynote summarizes previous text summarization approaches in a multi-dimensional classification space, introduces a multi-dimensional methodology for research and development, unveils the basic characteristics and principles of language use and understanding, investigates some fundamental mechanisms of summarization, studies the dimensions and forms of representations, and proposes a multi-dimensional evaluation mechanism. The investigation extends to the incorporation of pictures into summaries and to the summarization of videos, graphs and pictures, and then reaches a general summarization framework. This lecture is based on the following new book: H. Zhuge, Multi-Dimensional Summarization in Cyber-Physical Society, Morgan Kaufmann, 2016.
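For contrast with the multi-dimensional methodology, the "traditional, empirical" approach mentioned above can be sketched in a few lines: a Luhn-style extractive summarizer that scores sentences by the frequency of their content words and keeps the top-ranked ones. The stopword list and scoring below are simplified stand-ins.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (real systems use much larger ones)
STOPWORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in", "it"}

def summarize(text, n_sentences=2):
    """Frequency-based extractive summary: pick the n highest-scoring sentences."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]
    freq = Counter(words)

    def score(sentence):
        toks = [w for w in re.findall(r"[a-z']+", sentence.lower())
                if w not in STOPWORDS]
        return sum(freq[w] for w in toks) / (len(toks) or 1)

    ranked = sorted(sentences, key=score, reverse=True)[:n_sentences]
    # Emit the selected sentences in their original order
    return " ".join(s for s in sentences if s in ranked)
```

Such purely statistical selection is exactly what the keynote criticizes: it ignores the principles of language use and understanding that the multi-dimensional methodology tries to capture.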