Foundations of Computer Science
Shared Seminars:
 Seminars of the LIX lab
 Seminar of the Pole Proofs and Algorithms
 Algorithms of the Saclay Plateau Seminar
 Joint between LIX and several other labs of Paris-Saclay University
 Seminar of Combinatorics of the Plateau de Saclay
 Joint between LISN and LI
Presentation
Today’s society is a digitalized society. From smart grids to nuclear plants, from commercial airplanes to the long-awaited autonomous car, through the web and communication networks, imaging, data analysis and AI applications in every industry (finance, health, security, etc.), there is no aspect of our everyday life that does not rely on advances in computer science. The fundamentals of computer science and information theory are what unites the computer science, data science and artificial intelligence community at IP Paris, whatever the intended applications: computer scientists use and produce theories and models that should better explain systems, improve performance, security and safety, and allow for better treatment of data, at scale, with privacy certificates, etc. Indeed, computer science is a very peculiar science in which “theory” and “practice” are intertwined, with rapid paths from one to the other. Our goal is to shed light on these fundamentals. This includes “core” fundamental computer science activities on “models of computation”, i.e. logic, semantics, algorithmics, information theory, machine models (sequential, analog, biological, quantum, concurrent, distributed and communication networks, data-driven, etc.), formal languages, programming languages, concurrency and distributed systems, formal methods, etc. These core fundamentals underpin all research themes of the Computer Science, Data Science and Artificial Intelligence Department of IP Paris. The teams of IP Paris hold a quite unique position in France, with a large community (about 230 permanent faculty, 650 members in total) covering a wide spectrum of subjects: Computer Science in Biotechnologies and Health; Digital Trust; Next Generation Digital Infrastructure; Robotics, Visual and Audio Computing, Interaction; Data Science and Artificial Intelligence.
Our objective is to strengthen collaborations on fundamental research across all these themes, both to irrigate the themes with new, high-risk high-gain solutions and to feed the core fundamentals with new motivating scientific challenges. This is therefore, in essence, an integrative project. The two facets of this dual objective, the “core” aspects and the “transverse” ones, are described below.
Core: “Models, Computations and Languages”:
The theory of computation long preceded its actual implementation, let alone its practical use. From the major 19th-century advances in logic, to early 20th-century work on algebraic computation (Thue systems and Dehn’s algorithm, ancestors of modern rewriting systems) and the first “theoretical machines” (Turing machines, Post machines, the lambda-calculus, etc.), models of computation have predated actual computers, a phenomenon we are still witnessing with the development of non-von Neumann types of machines in quantum, biological, concurrent, distributed and even data-driven computing and communication networks.

We are working on adequate semantic descriptions of programs, machines and systems, so as to be able to reason about their properties. Since the 1970s this has produced, worldwide, an enormous amount of new theory, sometimes revisiting branches of mathematics, e.g. topology (non-Hausdorff topologies developed to give semantics to programming languages), logic and categorical logic (e.g. cartesian-closed categories, proof theory and models of the lambda-calculus). Some of the new and specific approaches in which the teams of IP Paris excel concern the semantic description of numerical and hybrid systems, developing suitable (functional) approximation theories (e.g. Taylor methods) and set-based methods for fast and accurate verification of control systems, as well as higher-categorical methods for rewriting, and geometric and combinatorial models (based in particular on simplicial and cubical sets) for designing concurrent and distributed protocols and for proving complexity or impossibility results about them (e.g. the impossibility of certain equality negation tasks).
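As a toy illustration of the set-based methods mentioned above, the following sketch (a purely hypothetical dynamics and hypothetical names, not one of our tools) propagates an interval through a discrete map to obtain a guaranteed over-approximation of the reachable states:

```python
# Minimal sketch of set-based reachability via interval arithmetic.
# The dynamics below is an arbitrary toy example.

class Interval:
    def __init__(self, lo, hi):
        assert lo <= hi
        self.lo, self.hi = lo, hi

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        p = [self.lo * other.lo, self.lo * other.hi,
             self.hi * other.lo, self.hi * other.hi]
        return Interval(min(p), max(p))

    def __repr__(self):
        return f"[{self.lo}, {self.hi}]"

def step(x):
    """One step of a toy discrete dynamics x' = 0.5*x + 1, lifted to intervals."""
    return Interval(0.5, 0.5) * x + Interval(1.0, 1.0)

# Enclose every state reachable in 3 steps from any x0 in [0, 1].
x = Interval(0.0, 1.0)
for _ in range(3):
    x = step(x)
print(x)  # a guaranteed enclosure of the reachable set
```

Real verification tools replace plain intervals with richer abstract domains (zonotopes, Taylor models, etc.) to keep the over-approximation tight.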
 Our goal is to develop efficient algorithms for the various architectures: we focus on measuring how this efficiency, and the associated proof theory, depends on the hypotheses made on the systems and on the allowed primitives. New and foundational results on computing with differential equations (a polynomial method for solving differential equations on a non-compact domain, and Turing completeness) can serve both as a very expressive and efficient programming paradigm (analog and/or bio-computers) and as a promise of more efficient numerical schemes. Another instance of foundational work at IP Paris is the resolution of a long-open conjecture about the complexity of integer multiplication, proving that it can be performed in O(n log n) time. Not only is this a difficult and beautiful result, it also holds the promise of potential applications to new ways of calculating. New fundamental results on randomized algorithms have led to important applications in AI, and in particular to concrete consequences for how mutation operators should be selected in genetic algorithms. We also work on practical optimization methods together with their associated theory. We develop tools and methods in computer algebra, combinatorics, computational complexity and cryptography to help conceive protocols and methods with guarantees, and to understand their underlying theory and properties.
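The O(n log n) multiplication algorithm itself is far too intricate to sketch here, but the classical Karatsuba scheme, an earlier sub-quadratic step on the same ladder (and not the result mentioned above), already illustrates how an algebraic identity reduces the cost of multiplication:

```python
def karatsuba(x, y):
    """Multiply non-negative integers via Karatsuba's three-product identity,
    giving O(n^log2(3)) ~ O(n^1.585) digit operations instead of O(n^2)."""
    if x < 10 or y < 10:                     # base case: a single digit
        return x * y
    n = max(x.bit_length(), y.bit_length()) // 2
    hi_x, lo_x = x >> n, x & ((1 << n) - 1)  # split x = hi_x*2^n + lo_x
    hi_y, lo_y = y >> n, y & ((1 << n) - 1)
    z0 = karatsuba(lo_x, lo_y)
    z2 = karatsuba(hi_x, hi_y)
    # The trick: the middle term needs only ONE extra recursive product.
    z1 = karatsuba(lo_x + hi_x, lo_y + hi_y) - z0 - z2
    # Recombine: x*y = z2*2^(2n) + z1*2^n + z0
    return (z2 << (2 * n)) + (z1 << n) + z0

assert karatsuba(123456789, 987654321) == 123456789 * 987654321
```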
 Based on suitable semantic models, we propose corresponding tools (theoretical and practical) for studying these systems and guaranteeing their behaviour from a computer science perspective. This means covering the associated proof technology and theory, verification, computability and complexity theory. Each kind of semantics has its associated verification technology, and these are at the core of the fundamentals of computer science we seek to develop: topological invariants for geometric semantics, logical systems and their associated proof theory for logic-based semantics, graph-based exploration techniques and logics for model checking, set-based methods and general semantic abstractions for abstract-interpretation-based analysers, for both discrete and hybrid systems. When verifying or testing critical systems, guaranteed fault coverage is crucial, and in some cases an exhaustive test suite can have polynomial length. Model-based testing (MBT) is an alternative way of guaranteeing system behaviour; moreover, unlike model-checking approaches, its test cases can be applied to a “real” implementation under test. Studying various specification formalisms (FSMs, automata, etc.), the related conformance relations (equivalence, reduction, etc.) and fault models yields novel MBT approaches, alongside the study of their computational complexity.
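The conformance relation at the heart of MBT can be illustrated on a tiny example. The sketch below (hypothetical machines; a brute-force check, not one of the actual MBT methods such as the W-method, which derive far smaller suites with guaranteed fault coverage) compares a specification Mealy machine with a faulty implementation on all input sequences up to a fixed length:

```python
from itertools import product

def run(fsm, start, inputs):
    """fsm maps (state, input) -> (next_state, output); returns the output word."""
    state, outputs = start, []
    for i in inputs:
        state, out = fsm[(state, i)]
        outputs.append(out)
    return outputs

# Toy specification: emit 1 exactly when 'a' is read in state s1.
SPEC = {("s0", "a"): ("s1", 0), ("s0", "b"): ("s0", 0),
        ("s1", "a"): ("s0", 1), ("s1", "b"): ("s1", 0)}

# Faulty implementation: wrong output on the ("s1", "a") transition.
IMPL = dict(SPEC)
IMPL[("s1", "a")] = ("s0", 0)

def conforms(spec, impl, alphabet, depth):
    """Output-equivalence check on all input words of length <= depth."""
    for n in range(1, depth + 1):
        for seq in product(alphabet, repeat=n):
            if run(spec, "s0", seq) != run(impl, "s0", seq):
                return False, seq          # counterexample found
    return True, None

ok, witness = conforms(SPEC, IMPL, ["a", "b"], 3)
print(ok, witness)  # the fault is exposed by the sequence ('a', 'a')
```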
 We also develop (programming, specification, formal) languages, which we, as computer scientists, know play a central role: mathematical models as such are not a good support for practical applications. All the semantic abstractions incorporated into modern languages (hierarchical descriptions through some form of modules, classes or functions, present in most modern programming paradigms, whether functional, imperative, object-oriented, domain-specific, etc.) are the very reason for the rapid adoption of software-based machines in all aspects of everyday life (energy, transportation, banking, services, networks, videos, etc.). A similar endeavour, central to our project, concerns non-sequential programming, for instance distributed and concurrent programming, where hybrid local/distributed computations are now the norm but lack well-adopted language support (reflecting their semantic and algorithmic diversity). Programming at large (including algorithm description, analysis and specification, the manipulation of proofs, etc.) is at once art, engineering and deep science. But, contrary to the early days of computer science, the rapid adoption of software in all industries has fostered an ever-increasing gap between theory and practice. This is the case, for example, with fully integrated development processes (“DevOps”, including specification, documentation, compilation and validation/testing activities), which are indeed necessary if we hope to tackle the enormous complexity and long-lived nature of system and software life cycles, but are by no means a definitive answer. For such development processes, we still need firm corresponding foundations. Long-awaited new computers (quantum, biological, etc.) have the same need if they are ever to be deployed at large scale.
Transverse:
Fundamental research questions arising from applications include the following:
Computer Science in Biotechnologies and Health:
Until about two decades ago, Bioinformatics developed mainly algorithmic tools to reason about the sequence and structure branches of molecular cell biology, ranging from DNA word problems, phylogenetic tree reconstruction and spatial structure abstractions of RNA and proteins, to the prediction and optimisation of molecular interactions. Since then, Bioinformatics has also been tackling the great challenge of Systems Biology: gaining a system-level understanding of the molecular interaction networks responsible for the high-level functions of the cell, with the development of boolean abstractions and model-checkers for molecular reaction networks, a theory of analog computability and complexity, the verification and synthesis of cyber-physical systems, up to viewing chemical reaction networks as a programming language for Synthetic Biology. This effort, tightly coupled with research on data analytics, multimodal data fusion, stability and robustness of classifiers, geometric representations, model-based image analysis, augmented reality and medical robotics, is currently contributing to a scientific revolution in Health, Medicine and the Pharma industry.
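A boolean abstraction of a molecular network, of the kind mentioned above, can be sketched in a few lines: each species is on or off, and update rules abstract the reaction kinetics. The network below is a made-up toy, not a real pathway:

```python
# Toy Boolean-network abstraction of a molecular interaction network.
# Species A, B, C and their regulation rules are purely illustrative.

RULES = {
    "A": lambda s: not s["C"],         # C represses A
    "B": lambda s: s["A"],             # A activates B
    "C": lambda s: s["A"] and s["B"],  # A and B jointly activate C
}

def step(state):
    """Synchronous update: all species updated from the current state."""
    return {x: rule(state) for x, rule in RULES.items()}

state = {"A": True, "B": False, "C": False}
trace = [state]
for _ in range(4):
    state = step(state)
    trace.append(state)
# The trace eventually cycles: a (boolean) oscillation of the network,
# the kind of qualitative behaviour a model-checker can verify.
```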
Digital Trust:
Digital trust is about dependability, i.e. the conjunction of reliability, safety, availability, maintainability and (cyber)security properties. New hardware (based on complex multicore architectures) forces us to study novel fundamental modeling techniques, testing algorithms and formal methods, to cope with the more involved non-deterministic behaviour of hardware and of the corresponding algorithms, and to allow the practical use of this new computing power in critical applications such as embedded control systems and CPS. The fundamental semantic considerations we develop on numerical and control systems in particular lead to new efficient reachability techniques, as well as set-based and abstract-interpretation, algebraic and geometric methods (based on “qualitative dynamics” and algebraic-topological invariants) for deriving invariants of discrete, continuous or hybrid systems, in view of their verification. In CPS, more and more controllers are data-driven, i.e. implemented using neural networks, instead of being (only) model-based (with more classical PID, flat or optimal controllers of some sort). New fundamentals in functional approximation theory and methods are among the research themes we are currently exploring in this domain. Ultimately, new research is being developed for modeling and verifying software and systems in their environment, physical as for CPS, but also taking into account distributed aspects and the properties of the underlying communication networks (as with autonomous and connected cars, swarms of drones, etc.). (Cyber)security is based on the enforcement of properties by running systems, classically derived from the Confidentiality-Integrity-Availability triad. On confidentiality and integrity properties, our work stems from fundamental work on cryptographic encodings, based e.g. on algebraic curves.
Still, many of the processes and methods used to define and enforce such properties are relatively ad hoc and defined in a “negative” manner. By ad hoc, we mean that the methodological process depends strongly on humans for understanding the risks and ensuring that the proper controls are in place, from design to operations. By negative, we mean that these processes are often in place to ensure that undesired events do not occur, but do little to ensure comprehensive coverage or to quantify the benefits of cybersecurity mechanisms. There is a need to leverage the vast body of knowledge about software and operations vulnerabilities to propose methods for software development and service operations that ensure immunity from errors. In addition, the increasing influence of digital systems in critical infrastructures brings security and safety together, requiring new theoretical models that address both at the same time and in a compatible manner. Linking formal methods (as used for validating systems and software) with cybersecurity is of both scientific and practical interest: some classical cyberattacks exploit programming defects (e.g. potential dereferences out of array bounds) or CPS controller defects (e.g. Stuxnet, for attacking industrial plants). Linked to this is the research needed to verify network protocols, which are prone to cyberattacks or misbehaviour. A connection to data science fundamentals comes through fundamental methods ensuring privacy, such as differential privacy, when exchanging data over a network, even in a secure (unbreachable) way.
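The core idea of differential privacy can be shown in a few lines. The sketch below is the standard Laplace mechanism applied to a counting query (sensitivity 1, so noise of scale 1/epsilon); the function names and the fixed seed are illustrative, and production systems need careful privacy-budget accounting and floating-point hardening:

```python
import math
import random

def laplace_noise(scale, rng):
    """Sample from Laplace(0, scale) by inverting its CDF."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_count(records, predicate, epsilon, rng=None):
    """Release a count under epsilon-differential privacy.
    A counting query has sensitivity 1, so the noise scale is 1/epsilon."""
    rng = rng or random.Random(0)  # fixed seed here, for reproducibility only
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)

# With a generous budget (epsilon = 10), the released count stays close
# to the true value (50 even numbers among 0..99).
released = private_count(range(100), lambda x: x % 2 == 0, epsilon=10.0)
```

Smaller values of epsilon inject more noise, trading accuracy for stronger privacy.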
Next Generation Digital Infrastructure:
Our digital infrastructures are large and geo-replicated, and they span many computing units ranging from datacenters to user devices and sensors. At IP Paris, our goal is to make these systems more performant, more energy-efficient, more autonomous, safer and more resilient to faults. We study both the theoretical and practical aspects of these systems in a virtuous loop in which each feeds the other. For example, we use and advance research in combinatorial topology to model and prove distributed systems, in operations research to model the energy consumption of a digital infrastructure, and in data analysis to predict the best data and task placement on a supercomputer. We then design heuristics and build systems to experimentally evaluate the theoretical results, which in turn feeds research in fundamental computer science by suggesting more realistic models of our digital infrastructures or new theoretical problems. A typical example of theory developed at IP Paris is the Heard-Of model, a model of computation that captures both the degree of synchrony and the notion of faulty components in a unified framework. This model is today widely used as a building block to prove the correctness of many distributed algorithms implemented in real systems. Network engineering also makes intensive use of various mathematical theories, ranging from probability to game theory, topology, random geometric simplicial complexes and Malliavin calculus, all fundamental subjects that we want to put forward in this project. Planning, dimensioning, configuring and routing are reaching such a level of complexity that they can no longer scale up and meet their requirements without these tools. This challenges basic research and steers progress in many of its aspects. Finally, communication network testing (protocols, services, etc.) is a large application area, dating back to the 1990s.
Novel results in Operations Research bring original, efficient solutions to network analysis, including resource allocation, optimization, scheduling, etc. At the same time, computational complexity in graph theory and combinatorial optimization provides a “bank” of decision and optimization problems whose hardness has already been proven, allowing the evaluation of proposed solutions and related algorithms. For example, the Virtual Network Embedding problem was proven to be NP-hard through a reduction from the Maximum Stable Set problem.
Robotics, Visual and audio Computing, Interaction:
At IP Paris, we have obtained fundamental results on motion planning and its intrinsic complexity as a function of the geometry of the underlying configuration space. This has led us to develop a new topological complexity measure applicable to motion planning, and some practical applications in the simplest case of non-positively curved configuration spaces. The theoretical tools for the verification of distributed, cyber-physical or hybrid systems also find direct applications in certifying the safety of robots, intelligent vehicles or multi-robot systems. They are also used in the identification or training of these systems, to find guaranteed models or improve training efficiency. In return, such applications open opportunities to develop new theory, for example on the verification of the perception systems of robots. For instance, we demonstrated the fundamental limitation that learning disentangled representations in robotics cannot be done using only passive observations, but requires active data acquisition and taking actions into account, so that the representation mirrors invariants of the physical world. This opens the way to new representation-learning approaches better adapted to autonomous systems. The articulation between modelling and learning is indeed a unifying point of this theme. A fundamental question we work on is how to introduce knowledge into machine learning for image, video, audio, virtual and multimedia content. This knowledge can be physical knowledge about the data acquisition system, semantic knowledge, geometric and mechanical constraints, etc., and various mathematical frameworks can be exploited to model it (Riemannian geometry, stochastic modeling, mathematical morphology, symbolic approaches, etc.). Similarly, much of our work addresses content generation. Our current research aims to generate models of audio signals, images and videos with constraints on the latent space of neural networks (e.g. through the simulation of physical models), in order to control and interpret the models (for example for face editing). The foundational work we have been carrying out at IP Paris on the numerical simulation of continuous equations, used for modelling the physics of 3D scenes while respecting their intrinsic properties (symmetry relations, topological invariants, etc.), leads to methods (e.g. based on differential topology and discrete exterior calculus) that often offer shorter computation times and increased reliability compared with more classical methods. Fundamental models are also developed for the analysis and synthesis of 3D geometry and of distributions of objects, through functional maps leading to linear computation of shape correspondences, new metrics extending pair correlation functions to distributions of arbitrary objects, and new algebraic composition operators for implicit modeling. In all cases, the fundamental research of the IP Paris teams in this area is about coupling the mathematical modelling of images or 3D shapes (probabilistic, physical, geometric modelling, etc.), or knowledge, with learning- or physics-based methods. Finally, there is the question of how to interact with content or systems (visualization, VR, etc.) and with human beings (e.g. gesture and activity recognition, machine listening, affective computing). This leads to fundamental problems in how to efficiently combine multiple modalities and views for a global understanding of a scene.
Data Science and Artificial Intelligence:
The last few decades have seen an unparalleled explosion in the volume, variety and, most importantly, value embedded in data. Industries ranging from manufacturing and autonomous vehicles to e-commerce, digital marketing, bio/medical and health, creative arts and the humanities have realized the unprecedented potential of being able to expressively and efficiently turn data into insights. In parallel, the field of AI has experienced spectacular progress in the development of machine and deep learning algorithms, able to process millions of data points and to calibrate complex predictive models, but also to encode data into meaningful low-dimensional spaces. The successful deployment of these tools into their target applications raises several crucial challenges. *Data quality and meaning*, in a broad sense, comes first: data should be as complete as possible; biased, fake or wrong data should either be eliminated, or coped with through intelligent analysis methods. Towards this goal, IP Paris develops novel methods to extract data from text, to correct data by combining heterogeneous sources, and to post-process predictive models (post-hoc approaches). Robust AI techniques for working with imperfect data include mathematical programming (convex optimization with non-differentiable penalties) and statistics (robust regression, extreme value theory). Representing data as a graph offers many opportunities for modeling heterogeneity and uncertainty. To extract meaning from such data, IP Paris investigates deep learning methods such as Graph Neural Networks, as well as novel graph-based methods for time-series prediction and classification, image recognition, and natural language processing. Second, AI algorithms must be *trustworthy*, in terms of robustness, fairness, reliability and explainability; this is crucial for their acceptability and long-term use.
To evaluate the predictive performance of a data-driven decision system, IP Paris works to develop uncertainty estimation tools, as well as predictive models able to abstain (a strong industrial request in critical environments), and to make deep learning models more robust to noise and adversarial attacks. *Explainability* is also an essential aspect of trust, and Hybrid AI, combining symbolic and data-driven methods, is an important path towards it; explainability can be reached by learning interpretable models that mimic black-box systems. Alternatively, explainability by design relies on modifying the learning algorithm and the model so as to build interpretable models. We will explore trustworthiness in interaction with law and ethics teams, to account for recent legal requirements in Europe. Privacy preservation, algorithmic fairness, and formal methods for verifying properties of data processing pipelines also contribute to trust in data analysis methods. Third, *data dynamicity* in time, in particular stream processing, needs dedicated methods. IP Paris will develop tools such as representation learning, transfer learning, unsupervised/few-shot learning, as well as interactive learning methods to take a changing environment into account. Fourth but not least, *scaling up* highly elaborate data processing pipelines requires advanced optimization methods for distributed environments such as computational clouds. Holistic optimization methods will be developed to maximize the efficiency of expressive data analysis.
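The abstention mechanism mentioned above can be sketched minimally: a predictor refuses to answer whenever its confidence falls below a threshold. This is a toy illustration with a hypothetical function name; real systems calibrate the threshold (e.g. with conformal prediction) to obtain coverage guarantees:

```python
# Toy sketch of a predictive model with a reject (abstain) option.

def predict_with_abstention(probs, threshold=0.8):
    """probs: dict mapping class label -> probability.
    Returns the top label, or None (abstain) when confidence is too low."""
    label, p = max(probs.items(), key=lambda kv: kv[1])
    return label if p >= threshold else None

# Confident prediction is returned; an ambiguous one triggers abstention.
assert predict_with_abstention({"cat": 0.95, "dog": 0.05}) == "cat"
assert predict_with_abstention({"cat": 0.55, "dog": 0.45}) is None
```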
Research groups
 Ecole Polytechnique
 LIX
 Pole “Proofs and Algorithms”
 Teams: Alco, Cosynus, Partout
 Pole “Efficient and secure communications”
 Team: Grace
 Pole “Computer Mathematics”
 Teams: Combi, Max, Optimix
 ENSTA
 U2IS
 Team Semantics of Hybrid Systems of U2IS
 INRIA
 Teams: Lifeware, Specfun
 INRIA & X:
 Team: Partout
 Telecom SudParis
 SAMOVAR
 Teams: Methodes, ACMES, R3S of SAMOVAR
 Telecom Paris:
 LTCI:
 Teams: Autonomous Critical Embedded Systems (ACES), Information Quantique et Applications (IQA)