CSDAI

Department of Institut Polytechnique de Paris

Foundations of Computer Science


Presentation

Today’s society is a digitalized society. From smart grids to nuclear plants, commercial airplanes to the long-awaited autonomous cars, the web and communication networks, imaging, data analysis and AI applications in all industries (finance, health, security, etc.), there is no single aspect of our everyday life that does not rely on advances in computer science. The fundamentals of computer science and information theory are what gathers the computer science, data science and artificial intelligence community at IP-Paris, whatever the intended applications are: computer scientists use and produce theories and models that better explain systems, improve performance, security and safety, and allow data to be processed at scale, with privacy certificates, etc. Indeed, computer science is a very peculiar science in which “theory” and “practice” are intertwined, with rapid paths to and from theory.

Our goal is to shed light on these fundamentals. This includes “core” fundamental computer science activities on “models of computation”, i.e. logic, semantics, algorithmics, information theory, machine models (sequential, analog, biological, quantum, concurrent, distributed and communication networks, data-driven, etc.), formal languages, programming languages, concurrency and distributed systems, formal methods, etc. These core fundamentals underpin all research themes of the Computer Science, Data Science and Artificial Intelligence Department of IP-Paris. The teams of IP-Paris occupy a rather unique position in France, with a large community (about 230 permanent faculty, 650 in total) covering a wide spectrum of subjects: Computer Science in Biotechnologies and Health; Digital Trust; Next Generation Digital Infrastructure; Robotics, Visual and Audio Computing, Interaction; Data Science and Artificial Intelligence.

Our objective is to strengthen collaborations on fundamental research across all these themes, both to irrigate the themes with new, high-risk/high-gain solutions and to feed the core fundamentals with new motivating scientific challenges. This is therefore an integrative project in essence. The dual objective of this project, its “core” aspects and its “transverse” ones, is described below.

Core: “Models, Computations and Languages”:

The theory of computation long preceded its actual implementation, not to mention its practical use. From the major advances in logic of the 19th century, to early 20th century work on algebraic computation (Thue systems and Dehn’s algorithm, ancestors of modern rewriting systems) and the first “theoretical machines” (Turing machines, Post machines, the lambda-calculus, etc.), models of computation have predated actual computers, a phenomenon we are still witnessing with the development of non-von Neumann types of machines, in quantum, biological, concurrent, distributed and even data-driven computing and communication networks.

Transverse:

Fundamental research arising from these application themes includes the following:

Computer Science in Biotechnologies and Health:

Until about two decades ago, Bioinformatics developed mainly around algorithmic tools to reason on the sequence and structure branches of molecular cell biology, ranging from DNA word problems, phylogenetic tree reconstruction and spatial structure abstractions of RNA and proteins, to the prediction and optimisation of molecular interactions. Since then, Bioinformatics has also been tackling the great challenge of Systems Biology: gaining a system-level understanding of the molecular interaction networks responsible for the high-level functions of the cell, with the development of Boolean abstractions and model checkers for molecular reaction networks, a theory of analog computability and complexity, the verification and synthesis of cyber-physical systems, up to the view of chemical reaction networks as a programming language for Synthetic Biology. This effort, tightly coupled with research on data analytics, multimodal data fusion, stability and robustness of classifiers, geometric representations, model-based image analysis, augmented reality and medical robotics, is contributing to a scientific revolution in Health, Medicine and the Pharma industry.
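The Boolean abstractions mentioned above can be made concrete with a minimal sketch. The toy three-gene network below is purely hypothetical (it is not a model from any IP-Paris project); it is updated synchronously and its entire state space is explored exhaustively, in the spirit of a model checker searching for steady states.

```python
from itertools import product

# Hypothetical three-gene Boolean network: each rule gives the next value of a
# gene as a function of the current state of the whole network.
rules = {
    "A": lambda s: s["C"],             # C activates A
    "B": lambda s: s["A"],             # A activates B
    "C": lambda s: s["A"] and s["B"],  # A and B jointly activate C
}

def step(state):
    """Synchronous update: all genes are updated simultaneously."""
    return {gene: rule(state) for gene, rule in rules.items()}

# Exhaustive exploration of the 2^3 states to find fixed points (steady states),
# which for this toy network are the "all off" and "all on" configurations.
fixed_points = []
for values in product([False, True], repeat=len(rules)):
    state = dict(zip(rules, values))
    if step(state) == state:
        fixed_points.append(state)

print("Steady states:", fixed_points)
```

Real molecular networks have far too many states for such enumeration, which is precisely where symbolic model checking and the other abstractions cited above come into play.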

Digital Trust:

Digital trust is about dependability, i.e. the conjunction of reliability, safety, availability, maintainability and (cyber)security properties. New hardware, based on complex multi-core architectures, forces us to study novel fundamental modelling techniques, testing algorithms and formal methods to cope with the more involved non-deterministic behaviour of hardware and of the corresponding algorithms, and to allow the practical use of this new computational power in critical applications such as embedded control systems and CPS. The fundamental semantic considerations we develop, on numerical and control systems in particular, lead to new efficient reachability techniques, as well as set-based and abstract-interpretation, algebraic and geometric methods (based on “qualitative dynamics” and algebraic topological invariants) for deriving invariants of discrete, continuous or hybrid systems, in view of their verification. In CPS, more and more controllers are data-driven, i.e. implemented using neural networks, instead of being (only) model-based (with more classical PID, flatness-based or optimal controllers of some sort). New fundamentals in functional approximation theory and methods are among the current research themes we are exploring in this domain. Ultimately, new research is being developed for modelling and verifying software and systems in their environment, physical as for CPS, but also taking into account the distributed aspects and the properties of the underlying communication networks (as for autonomous and connected cars, swarms of drones, etc.).

(Cyber)security is based on the enforcement of properties by running systems, classically derived from the Confidentiality-Integrity-Availability triad. On confidentiality and integrity properties, our work stems from fundamental work on cryptographic encodings, based e.g. on algebraic curves. Still, many of the processes and methods used to define and enforce these properties are relatively ad hoc and defined in a “negative” manner. By ad hoc, we mean that the methodological process depends strongly on humans to understand the risks and ensure that the proper controls are in place, from design to operations. By negative, we mean that these processes are often in place to ensure that undesired events do not occur, but they do little to ensure comprehensive coverage and to quantify the benefits of these cybersecurity mechanisms. There is a need to leverage the vast body of knowledge about software and operations vulnerabilities in order to propose methods for software development and service operations that ensure immunity from errors. In addition, the increasing influence of digital systems in critical infrastructures brings security and safety together, requiring new theoretical models that address both at the same time and in a compatible manner. There is both a scientific interest in linking formal methods (as used for validating systems and software) with cybersecurity and a practical one: some classical cyber-attacks take advantage of programming defects (e.g. dereferences out of array bounds) or CPS controller defects (e.g. Stuxnet for attacking plants). Linked to this is the research needed for verifying network protocols, which are prone to cyber-attacks or bad behaviours. A connection to data science fundamentals comes through fundamental methods ensuring privacy, such as differential privacy, when exchanging data over a network, even in a secure (unbreachable) way.
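As a concrete illustration of the differential privacy mentioned at the end of this paragraph, here is a minimal sketch of the classical Laplace mechanism for a counting query; the dataset, the predicate and the value of epsilon are purely illustrative.

```python
import numpy as np

def dp_count(records, predicate, epsilon):
    """Differentially private count: the exact count is perturbed with Laplace
    noise whose scale is the query sensitivity (1 for a count) divided by the
    privacy budget epsilon."""
    true_count = sum(1 for r in records if predicate(r))
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Illustrative data: ages of participants (hypothetical values).
ages = [23, 35, 41, 29, 52, 64, 38, 47]

# Publishing the noisy count instead of the exact one bounds how much any single
# individual's presence or absence can change the released value.
print(dp_count(ages, lambda age: age >= 40, epsilon=0.5))
```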

Next Generation Digital Infrastructure:

Our digital infrastructures are large and geo-replicated, and they span many computing units ranging from data centers to user devices and sensors. At IP Paris, our goal is to make these systems more performant, more energy-efficient, more autonomous, safer and more resilient to faults. We study both the theoretical and practical aspects of these systems in a virtuous loop in which each feeds the other. We use and advance, for example, research in combinatorial topology to model and prove distributed systems, in operations research to model the energy consumption of a digital infrastructure, or in data analysis to predict the best data and task placement on a supercomputer. We then design heuristics and build systems to experimentally evaluate the theoretical results, which in turn feeds research in fundamental computer science by proposing more realistic models of our digital infrastructures or new theoretical problems. A typical example of theory developed at IP Paris is the Heard-Of model, a model of computation that captures both the degree of synchrony and the notion of faulty components in a unified framework. This model is today widely used as a building block to prove the correctness of many distributed algorithms implemented in real systems.

Network engineering also makes intensive use of various mathematical theories, ranging from probability and game theory to topology, random geometric simplicial complexes and Malliavin calculus, all fundamental subjects that we want to put forward in this project. Planning, dimensioning, configuring or routing are reaching such a level of complexity that they can no longer scale up and meet their requirements without these tools. This challenges basic research and steers progress in many of its aspects. Finally, the testing of communication networks (protocols, services, etc.) has been a large application area since the 1990s. Novel results in Operations Research bring original, efficient solutions to network analysis, including resource allocation, optimization, scheduling, etc. At the same time, computational complexity in graph theory and combinatorial optimization provides a “bank” of decision and derivation problems whose hardness has already been proven, against which proposed solutions and related algorithms can be evaluated. For example, the Virtual Network Embedding problem was proven to be NP-hard through a reduction from the Maximum Stable Set problem.
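To make the last point concrete, here is a small, purely illustrative sketch of the Maximum Stable Set problem used in that reduction: a brute-force search on a toy graph. Its exponential cost on larger instances is exactly what the NP-hardness reduction transfers to Virtual Network Embedding.

```python
from itertools import combinations

def max_stable_set(vertices, edges):
    """Brute-force maximum stable (independent) set: the largest subset of
    vertices with no edge between any two of them. The search is exponential
    in the number of vertices, hence only usable on toy instances."""
    edge_set = {frozenset(e) for e in edges}
    for size in range(len(vertices), 0, -1):
        for subset in combinations(vertices, size):
            if all(frozenset(pair) not in edge_set
                   for pair in combinations(subset, 2)):
                return set(subset)
    return set()

# Toy graph: a 5-cycle, whose maximum stable set has size 2.
vertices = [1, 2, 3, 4, 5]
edges = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 1)]
print(max_stable_set(vertices, edges))
```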

Robotics, Visual and Audio Computing, Interaction:

At IP-Paris, we have obtained fundamental results about motion planning and its intrinsic complexity as a function of the geometry of the underlying configuration space. This has led us to develop a new topological complexity measure applicable to motion planning, and to practical applications in the simplest case of non-positively curved configuration spaces. The theoretical tools for the verification of distributed, cyber-physical or hybrid systems also find direct applications in certifying the safety of robots, intelligent vehicles or multi-robot systems. They are also used in the identification or training of these systems, to find guaranteed models or to improve training efficiency. In return, such applications open opportunities to develop new theory, for example on the verification of the perception system of robots. For instance, we demonstrated the fundamental limitation that learning disentangled representations in robotics cannot be done using only passive observations: it requires active data acquisition and taking the actions into account, so that the representation mirrors invariants of the physical world. This opens the way to new representation learning approaches better adapted to autonomous systems.

The articulation between modelling and learning is indeed a unifying thread of this theme. A fundamental question we work on is how to introduce knowledge into machine learning for image, video, audio, virtual and multimedia content. This knowledge can be physical knowledge about the data acquisition system, semantic knowledge, geometric and mechanical constraints, etc., and various mathematical frameworks can be exploited to model it (Riemannian geometry, stochastic modelling, mathematical morphology, symbolic approaches, etc.). Similarly, much work is devoted to content generation. Our current research aims to generate models of audio signals, images and videos with constraints on the latent space of neural networks (e.g. through simulation of physical models), in order to control and interpret these models (for example for face editing).

The foundational work we have been carrying out at IP-Paris on the numerical simulation of continuous equations, used for modelling the physics of 3D scenes while respecting their intrinsic properties (symmetry relations, topological invariants, etc.), leads to methods (e.g. based on differential topology and discrete exterior calculus) that often offer shorter computation times and increased reliability compared to more classical methods. Fundamental models are also developed for the analysis and synthesis of 3D geometry and of distributions of objects, through functional maps leading to linear computation of shape correspondences, new metrics extending pair correlation functions to distributions of arbitrary objects, and new algebraic composition operators for implicit modelling. In all cases, the fundamental research of IP-Paris teams in this area is about coupling mathematical modelling of images or 3D shapes (probabilistic, physical, geometric modelling, etc.), or knowledge, with learning-based or physics-based methods. Finally, a key question is how to interact with content or systems (visualization, VR, etc.) and with human beings (e.g. gesture and activity recognition, machine listening, affective computing). This leads to fundamental problems in how to efficiently combine multiple modalities and views for a global understanding of a scene.
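As an illustration of the functional maps mentioned above, the sketch below estimates a functional correspondence between two shapes by a plain least-squares fit between descriptor coefficients. All inputs are random placeholders: in practice the Laplace-Beltrami eigenbases and the descriptor functions would be computed from actual meshes, and the estimation usually includes additional regularization.

```python
import numpy as np

# Placeholder data standing in for two shapes: reduced spectral bases
# (phi_src, phi_tgt) and d corresponding descriptor functions on each shape.
n_src, n_tgt, k, d = 1000, 1200, 30, 50
phi_src = np.random.randn(n_src, k)    # basis of the source shape
phi_tgt = np.random.randn(n_tgt, k)    # basis of the target shape
desc_src = np.random.randn(n_src, d)   # descriptors on the source shape
desc_tgt = np.random.randn(n_tgt, d)   # corresponding descriptors on the target

# Project the descriptors onto each basis (least-squares coefficients).
A = np.linalg.lstsq(phi_src, desc_src, rcond=None)[0]  # k x d
B = np.linalg.lstsq(phi_tgt, desc_tgt, rcond=None)[0]  # k x d

# The functional map C (k x k) sends source coefficients to target coefficients;
# solving C @ A ~= B is a linear least-squares problem, hence the "linear
# computation of shape correspondences" mentioned in the text.
C = np.linalg.lstsq(A.T, B.T, rcond=None)[0].T
print(C.shape)  # (k, k)
```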

Data Science and Artificial Intelligence:

The last few decades have seen an unparalleled explosion in the volume, variety and, most importantly, value embedded in data. Industries ranging from manufacturing and autonomous vehicles to e-commerce, digital marketing, bio/medical and health, creative arts and humanities have realized the unprecedented potential behind the ability to expressively and efficiently turn data into insights. In parallel, the field of AI has experienced spectacular progress in the development of machine and deep learning algorithms, able to process millions of data points and calibrate complex predictive models, but also to encode the data in meaningful representation spaces. The successful deployment of these tools in their target applications raises several crucial challenges.

*Data quality and meaning*, in a broad sense, comes first: data should be as complete as possible; biased, fake or wrong data should be either eliminated or coped with through intelligent analysis methods. Towards this goal, IP Paris develops novel methods to extract data from text, to correct data by combining heterogeneous sources, and to post-process predictive models (post-hoc approaches). Robust AI techniques for working with imperfect data include mathematical programming (convex optimization with non-differentiable penalties) and statistics (robust regression, extreme value theory). Representing data as a graph gives many opportunities for modelling heterogeneity and uncertainty. To extract meaning from such data, IP Paris investigates deep learning methods such as Graph Neural Networks, as well as novel graph-based methods for time series prediction and classification, image recognition, and natural language processing.

Second, the AI algorithms must be *trustworthy*, in terms of robustness, fairness, reliability and explainability; this is crucial for their acceptability and their long-term usage. To evaluate the predictive performance of a data-driven decision system, IP Paris works to develop uncertainty estimation tools as well as predictive models able to abstain (a strong industrial request in critical environments), and to make deep learning models more robust under noise and adversarial attacks. *Explainability* is also an essential aspect of trust, and Hybrid AI, combining symbolic and data-driven methods, is an important path towards it; explainability can be reached by learning interpretable models that mimic black-box systems. Alternatively, explainability by design relies on modifying the learning algorithm and the model so as to build interpretable models. We will explore trustworthiness in interaction with law and ethics teams, to account for recent legal requirements in Europe. Privacy preservation, algorithmic fairness, and formal methods for verifying properties of data processing pipelines also contribute to trust in data analysis methods.

Third, data dynamicity in time, in particular stream processing, needs dedicated methods. IP Paris will develop tools such as representation learning, transfer learning and unsupervised/few-shot learning, as well as interactive learning methods to take a changing environment into account. Fourth but not least, *scaling up* highly elaborate data processing pipelines requires advanced optimization methods for distributed environments such as computational clouds. Holistic optimization methods will be developed to maximize the efficiency of expressive data analysis.
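As a small illustration of the predictive models able to abstain mentioned above, here is a hedged sketch using a generic scikit-learn classifier and an arbitrary confidence threshold; it is not a specific IP Paris method, only the basic reject-option idea.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Classifier with a reject option: predictions whose confidence falls below a
# threshold are turned into an explicit "abstain" answer (-1).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

def predict_with_abstention(model, X, threshold=0.8):
    """Return predicted labels, or -1 when the probability of the most likely
    class is below `threshold`."""
    proba = model.predict_proba(X)
    labels = proba.argmax(axis=1)
    confident = proba.max(axis=1) >= threshold
    return np.where(confident, labels, -1)

preds = predict_with_abstention(clf, X_test)
print("Abstention rate:", np.mean(preds == -1))
```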


Research groups