Tutorials

 

Multi-Agent Reinforcement Learning

Slides

Daan Bloembergen
Maastricht University
daan.bloembergen@maastrichtuniversity.nl
and
Daniel Hennes
Maastricht University
daniel.hennes@maastrichtuniversity.nl
and
Michael Kaisers
Technical University Berlin
michael.kaisers@dai-labor.de
and
Karl Tuyls
Maastricht University
k.tuyls@maastrichtuniversity.nl
and
Peter Vrancx
Vrije Universiteit Brussel
pvrancx@vub.ac.be

Multi-agent reinforcement learning (MARL) is an important and fundamental topic within agent-based research. After giving successful tutorials on this topic at EASSS 2004 (the European Agent Systems Summer School), ECML 2005, ICML 2006, EWRL 2008 and AAMAS 2009-2012, with different collaborators, we now propose a revised and updated tutorial, covering both the theoretical and the practical aspects of MARL.

Participants will be taught the basics of single-agent reinforcement learning and the associated theoretical convergence guarantees related to Markov Decision Processes. We will then outline why these convergence guarantees no longer hold in a setting where multiple agents learn. We will explain practical approaches to scaling single-agent reinforcement learning to settings in which multiple agents influence each other, and introduce a framework, based on game theory and evolutionary game theory, that allows a thorough analysis of the dynamics of multi-agent learning. Finally, we will discuss multi-agent learning in continuous state and action spaces.
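
For illustration, here is a minimal sketch of the effect described above: two independent Cross-learning agents (whose expected dynamics are the replicator dynamics from evolutionary game theory) play repeated Matching Pennies, and their strategies keep oscillating rather than converging. The game, learning rule and parameters are our illustrative choices, not the tutorial's materials.

    import numpy as np

    # Matching Pennies with rewards scaled to [0, 1], as Cross's rule requires:
    # the row player wants matching actions, the column player mismatching ones.
    R1 = np.array([[1.0, 0.0],
                   [0.0, 1.0]])
    R2 = 1.0 - R1

    def cross_update(p, action, reward, alpha):
        # Cross learning; its mean-field limit is the replicator dynamic.
        p = p - alpha * reward * p
        p[action] += alpha * reward
        return p

    rng = np.random.default_rng(0)
    p1, p2 = np.array([0.9, 0.1]), np.array([0.2, 0.8])  # initial mixed strategies
    alpha = 0.01

    for t in range(20001):
        a1 = rng.choice(2, p=p1)
        a2 = rng.choice(2, p=p2)
        p1 = cross_update(p1, a1, R1[a1, a2], alpha)
        p2 = cross_update(p2, a2, R2[a1, a2], alpha)
        if t % 5000 == 0:
            print(t, p1.round(2), p2.round(2))  # strategies oscillate, no convergence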
 

Second order learning

Slides

Koby Crammer
The Technion
koby@ee.technion.ac.il

Second order algorithms appear both in optimization (e.g. Newton's method) and in online learning, where they are motivated both geometrically (e.g. the second order perceptron) and by statistical properties of natural language (e.g. confidence weighted learning). These methods converge fast and achieve state-of-the-art results in many natural language processing tasks. Indeed, some of these methods were motivated by natural language processing (NLP), where feature-occurrence statistics follow a heavy-tailed distribution; yet neither the design of first-order algorithms nor their analysis relates directly to these NLP properties. For example, in some sentiment classification tasks a few predictive features are very common while most are relatively rare, indicating that modeling even infrequent features may be useful for learning.

In this tutorial, we will focus on older and recent techniques for learning that exploit such statistics, and on their analysis. We will systematically introduce second order learning, covering both basic principles and specific algorithms and tools, such as the second order perceptron, adaptive gradient methods, and confidence-weighted learning, as well as their use in various tasks and settings: active learning, domain adaptation, confidence estimation, bandit prediction and selective sampling.
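
As a minimal sketch of the per-feature adaptivity these methods share, the following implements diagonal AdaGrad on the logistic loss: each coordinate receives its own step size, so rarely-seen features take larger effective steps than frequent ones. The toy data and hyperparameters are illustrative assumptions.

    import numpy as np

    def adagrad_logistic(X, y, eta=0.5, eps=1e-8, epochs=5):
        """Diagonal AdaGrad for logistic regression: each feature gets its own
        step size, so rare features are not drowned out by frequent ones."""
        n, d = X.shape
        w = np.zeros(d)
        G = np.zeros(d)                     # running sum of squared gradients
        for _ in range(epochs):
            for i in range(n):
                margin = y[i] * X[i].dot(w)
                g = -y[i] * X[i] / (1.0 + np.exp(margin))   # gradient of the log-loss
                G += g * g
                w -= eta * g / (np.sqrt(G) + eps)           # per-coordinate step
        return w

    # toy data with labels in {-1, +1}
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 10))
    y = np.sign(X[:, 0] + 0.1 * rng.normal(size=200))
    w = adagrad_logistic(X, y)
    print((np.sign(X.dot(w)) == y).mean())  # training accuracy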
 

Algorithmic techniques for modeling and mining large graphs

Slides

Alan Frieze
Carnegie Mellon University
alan@random.math.cmu.edu
and
Aristides Gionis
Aalto University
aristides.gionis@aalto.fi
and
Charalampos Tsourakakis
Carnegie Mellon University
tsourolampis@gmail.com

Network science has emerged over the last years as an interdisciplinary area spanning traditional domains including mathematics, computer science, sociology, biology and economics. Since complexity in social, biological and economic systems, and more generally in complex systems, arises through pairwise interactions, there is a surging interest in understanding networks.

In this tutorial, we will provide an in-depth presentation of the most popular random-graph models used for modeling real-world networks. We will then discuss efficient algorithmic techniques for mining large graphs, with emphasis on the problems of extracting graph sparsifiers, partitioning graphs into densely connected components, and finding dense subgraphs. We will motivate the problems we will discuss and the algorithms we will present with real-world applications.
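
As one concrete example of the dense-subgraph algorithms in scope, the following sketch runs Charikar's greedy peeling (a standard 1/2-approximation to the densest subgraph) on a preferential-attachment random graph; the graph model, sizes and library are our illustrative choices.

    import networkx as nx

    def densest_subgraph(G):
        """Greedy peeling: repeatedly delete a minimum-degree node, remembering
        the intermediate subgraph with the highest density |E|/|V|."""
        H = G.copy()
        best_nodes = list(H.nodes)
        best_density = H.number_of_edges() / H.number_of_nodes()
        while H.number_of_nodes() > 1:
            v = min(H.nodes, key=H.degree)   # cheapest node to peel off
            H.remove_node(v)
            density = H.number_of_edges() / H.number_of_nodes()
            if density > best_density:
                best_density, best_nodes = density, list(H.nodes)
        return best_nodes, best_density

    G = nx.barabasi_albert_graph(500, 3, seed=42)   # preferential-attachment model
    nodes, density = densest_subgraph(G)
    print(len(nodes), round(density, 2))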

Our aim is to survey important results in the areas of modeling and mining large graphs, to uncover the intuition behind the key ideas, and to present future research directions.
 

Web Scale Information Extraction

Slides

Ziqi Zhang
University of Sheffield
z.zhang@dcs.shef.ac.uk
and
Anna Lisa Gentile
University of Sheffield
a.l.gentile@dcs.shef.ac.uk

This tutorial analyses open challenges for Web-scale Information Extraction (IE) and introduces the use of Linked Data as a ground-breaking solution for the field. Training data is an essential resource for machine learning, but it is expensive to create. The limited availability of such data has so far prevented the generalised use of large-scale resources to adapt Information Extraction to specific user information needs.

Over the last few years Linked Data has grown into a gigantic knowledge base, which, as of 2013, comprised 31 billion triples in 295 data sets (http://lod-cloud.net/state/). Such resources can become invaluable training data for Web-scale Information Extraction and natural language tasks because they are: (i) very large in scale, (ii) constantly growing, (iii) covering multiple domains, and (iv) used to annotate a growing number of pages that can be exploited for training.
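
As a sketch of how such a resource can be tapped, the following assumes the public DBpedia SPARQL endpoint and the SPARQLWrapper library, and retrieves (entity, label) pairs that could serve as distant-supervision annotations; the dbo:City class is an illustrative choice.

    from SPARQLWrapper import SPARQLWrapper, JSON  # pip install sparqlwrapper

    # Fetch (entity, label) pairs from DBpedia; such pairs can act as
    # distant-supervision annotations when training an extractor.
    sparql = SPARQLWrapper("https://dbpedia.org/sparql")
    sparql.setQuery("""
        SELECT ?city ?name WHERE {
            ?city a dbo:City ;
                  rdfs:label ?name .
            FILTER (lang(?name) = "en")
        } LIMIT 20
    """)
    sparql.setReturnFormat(JSON)
    for row in sparql.query().convert()["results"]["bindings"]:
        print(row["city"]["value"], "->", row["name"]["value"])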

This tutorial will show how to exploit Linked Data for IE and will explore Information Extraction techniques able to scale to the Web and adapt to user information needs. We will focus in particular on the tasks of Wrapper Induction and Table Interpretation. As an example of Linked Data driven IE, we will present and discuss a multi-strategy learning method and framework designed to train Web-scale IE using Linked Data while coping with noise in the training data. The approach combines multiple strategies: (i) wrapping very regular web sites generated by backing databases; (ii) extracting from regular structures such as tables and lists; and (iii) learning lexical-syntactic patterns for information extraction from natural language.
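
To make strategy (i) concrete, here is a toy sketch in the spirit of classic LR wrapper induction: from a few (page, target) examples it learns the left and right contexts that delimit the target, then applies them to a new page. Real wrappers must additionally cope with noise and multiple fields; the HTML snippets below are invented.

    def induce_wrapper(examples):
        """Learn the longest left and right contexts that delimit the
        target value on every example page."""
        lefts = [page[:page.index(t)] for page, t in examples]
        rights = [page[page.index(t) + len(t):] for page, t in examples]
        left = lefts[0]                      # longest common suffix of left contexts
        for s in lefts[1:]:
            while not s.endswith(left):
                left = left[1:]
        right = rights[0]                    # longest common prefix of right contexts
        for s in rights[1:]:
            while not s.startswith(right):
                right = right[:-1]
        return left, right

    def apply_wrapper(page, left, right):
        start = page.index(left) + len(left)
        return page[start:page.index(right, start)]

    examples = [("<tr><td>Price:</td><td>12.99</td></tr>", "12.99"),
                ("<tr><td>Price:</td><td>8.50</td></tr>", "8.50")]
    left, right = induce_wrapper(examples)
    print(apply_wrapper("<tr><td>Price:</td><td>42.00</td></tr>", left, right))  # 42.00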
 

Mining and learning with network-structured data

Slides

Jan Ramon
KULeuven
Jan.Ramon@cs.kuleuven.be

In recent years, several relevant relations have been revealed between the mining of large data networks and different branches of computer science and mathematics, e.g. graph theory, statistical physics, complex systems, algorithmics and spectral graph theory. Exploiting these relations and combining aspects of several of these fields is an exciting direction of research, often leading to surprising results. This tutorial aims at introducing and surveying the several fields related to mining networks and their basic principles, and at providing examples of results linking these fields to data mining tasks. The overall goal is to give a basic introduction, to point to interesting links, and to trigger discussions and new ideas for future research.

The tutorial will consist of three parts. The first part will present an overview of several related fields, their essential concepts and questions, and the relations between them; these include algorithmics, graph theory, pattern matching, pattern mining, statistical physics, complex systems, link prediction, statistical relational learning and spectral graph theory. The second part will consider patterns in data networks: the emergence of (local) patterns, 0-1 laws, (graph) algorithmic and complexity-theoretic aspects of pattern matching and pattern mining, sampling approaches, criteria for measuring statistical significance or other qualities of patterns, and relations to random graph models. The final part will concern learning, and will survey machine learning settings in graphs, basic algorithmic ideas, relations to the pattern mining approaches of the previous part, and relations to statistical-physics-based generative graph models.
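
As one concrete instance of measuring a pattern's statistical significance against a random-graph null model (second part), the following sketch compares the triangle count of an observed graph with counts from G(n, m) samples of the same size; the example graph and the choice of null model are our illustrative assumptions.

    import networkx as nx

    def triangle_count(G):
        # nx.triangles counts, per node, the triangles it participates in;
        # each triangle is therefore counted three times.
        return sum(nx.triangles(G).values()) // 3

    # Observed graph: a small social network.
    G = nx.karate_club_graph()
    n, m = G.number_of_nodes(), G.number_of_edges()
    observed = triangle_count(G)

    # Null model: G(n, m) random graphs of the same size; the empirical
    # p-value estimates how surprising the observed triangle count is.
    null = [triangle_count(nx.gnm_random_graph(n, m, seed=s)) for s in range(200)]
    p_value = sum(t >= observed for t in null) / len(null)
    print(observed, p_value)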
 

Performance Evaluation of Machine Learning Algorithms

Slides

Mohak Shah
GE Global Research
mohak@mohakshah.com
and
Nathalie Japkowicz
University of Ottawa
nat@site.uottawa.ca

Machine learning and data mining are increasingly being applied to a wide range of domains, and are quickly reaching a level of maturity at which various instances of the technology are being commoditized. Due to their inherently interdisciplinary nature, the fields draw researchers from varying backgrounds; the data scientists and other experts who practice these techniques on a regular basis come from even broader backgrounds. It is therefore of critical importance that these researchers and practitioners are aware of the proper methodologies, and of the issues that arise when evaluating novel learning approaches. This tutorial aims at educating the broad machine-learning and data-science community on these critical issues in performance evaluation, and at getting it to discuss them. The tutorial will cover various aspects of the evaluation process, with a focus on classification algorithms. We will discuss the choices available, and the issues, assumptions and constraints involved, at each step of this process. We will also present R and WEKA tools that can be used to apply these methods. The tutorial will span four areas of classifier evaluation (a brief illustrative sketch follows the list):

  • Performance Measures (evaluation metrics and graphical methods)
  • Error Estimation/Re-sampling Techniques
  • Statistical Significance Testing
  • Issues in data set selection and evaluation benchmark design
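
The tutorial itself demonstrates R and WEKA; as a rough Python analogue combining the second and third areas, the following estimates fold-wise accuracies by 10-fold cross-validation and compares two classifiers with a paired t-test (whose applicability to cross-validated folds is itself one of the issues such a tutorial covers). The data set and classifiers are illustrative choices.

    from scipy.stats import ttest_rel
    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)

    # Error estimation by 10-fold cross-validation: one accuracy per fold.
    a = cross_val_score(LogisticRegression(max_iter=5000), X, y, cv=10)
    b = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=10)

    # Statistical significance testing over the matched folds.
    t, p = ttest_rel(a, b)
    print(f"mean acc: {a.mean():.3f} vs {b.mean():.3f}, t={t:.2f}, p={p:.3f}")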

The tutorial will, in part, be based on the book by the presenters on the subject: Japkowicz, Nathalie and Shah, Mohak, Evaluating Learning Algorithms: A Classification Perspective, Cambridge University Press (2011).

The purpose of the tutorial is to promote an appreciation of the need for rigorous and objective evaluation, and an understanding of the available alternatives along with their assumptions, constraints and contexts of application. Machine learning researchers and practitioners alike will benefit from its contents, which discuss the need for sound evaluation strategies and present practical approaches and tools for evaluation, going well beyond those described in existing machine learning and data mining textbooks.
 

Discovering Roles and Anomalies in Graphs: Theory and Applications

Slides

Tina Eliassi-Rad*
Rutgers University
tina@eliassi.org
and
Christos Faloutsos
Carnegie Mellon University
christos@cs.cmu.edu

*Professor Eliassi-Rad will be presenting the tutorial.

Given a graph, how can we find suspicious nodes? How can we automatically discover roles for nodes? Roles are compact summaries of a node’s behavior that generalize across networks. For example, one role could be ‘star’: a star node is both influential and has low neighborhood overlap. Are there good features that we can extract for nodes that indicate role membership? In which applications can these discovered roles be used effectively? The objective of this tutorial is to provide a concise and intuitive overview of the most important concepts and tools for detecting roles (or functions) of nodes in both static and dynamic graphs. We review the state of the art in three related fields: (a) community discovery, (b) equivalences (from sociology), and (c) propositionalisation (from multi-relational data mining). The emphasis of this tutorial is on the intuition behind these powerful mathematical concepts and tools, which is usually lost in the technical literature, and on case studies that illustrate their practical use.
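
As a small sketch of such role-indicating features, the following flags star-like nodes by combining high degree with low average neighborhood overlap; the graph, the two features and the thresholds are our illustrative choices, not the (much richer) methods surveyed in the tutorial.

    import networkx as nx

    def neighborhood_overlap(G, v):
        """Average Jaccard overlap between v's neighborhood and those of its neighbors."""
        nv = set(G[v])
        if not nv:
            return 0.0
        scores = [len(nv & set(G[u])) / len(nv | set(G[u])) for u in nv]
        return sum(scores) / len(scores)

    G = nx.barabasi_albert_graph(300, 2, seed=7)
    deg = dict(G.degree())
    # Star-like nodes: high degree combined with low neighborhood overlap.
    stars = [v for v in G if deg[v] > 15 and neighborhood_overlap(G, v) < 0.1]
    print(stars[:10])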
 

Statistically sound pattern discovery

Web

Wilhelmiina Hämäläinen
University of Eastern Finland
whamalai@cs.uef.fi
and
Geoff Webb
Monash University
geoff.webb@monash.edu

Pattern discovery is a core data mining activity. Initial approaches were dominated by the frequent pattern discovery paradigm: only frequent patterns were explored. That paradigm has now been thoroughly explored, its limitations are well understood, and it is giving way to two emerging alternatives: the information-theoretic minimum message length paradigm and the statistically sound paradigm. This tutorial presents the foundations of the latter. In this paradigm, patterns are required to pass statistical tests with respect to user-defined null hypotheses, providing great flexibility in the properties that are sought, and strict control over the risk of false discoveries and overfitting.
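
As a minimal sketch of the core mechanism, the following tests a candidate rule's contingency table with Fisher's exact test against a null hypothesis of independence, and applies a Bonferroni correction to control the family-wise risk of false discoveries across many candidate patterns; the counts and the number of candidates are invented.

    from scipy.stats import fisher_exact

    # Contingency table for a candidate rule X -> Y over n transactions:
    #                Y     not Y
    #   X         [[ 40,     10 ],
    #   not X      [ 60,    190 ]]
    table = [[40, 10], [60, 190]]
    _, p = fisher_exact(table, alternative="greater")

    # Bonferroni adjustment: with k candidate patterns tested, each must pass
    # at level alpha / k to keep the family-wise error rate below alpha.
    k, alpha = 1000, 0.05
    print(p, "significant" if p <= alpha / k else "not significant")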
