Wojciech Penczek, Ph.D., Professor,
Corresponding member of PAS
Agnieszka Mykowiecka, Ph.D., Associate Professor
Contact:
Secretary: tel. +48 22 380-05-04, +48 22 380-05-05
Main phone number of the Institute: tel. +48 22 380-05-00
fax. +48 22 380-05-10
Włodzimierz Drabent, Ph.D., Professor
Marek Tudruj, Ph.D., Professor (on leave)
WWW: https://ztsrio.ipipan.waw.pl/
Activities and interests of the group members center around the following topics:
The leading researchers working on the above subjects: W. Jamroga, W. Penczek.
PhD student at TIBPAN
Jan Mielniczuk, Ph.D., Professor
Marcin Malawski, Ph.D. (on leave)
Our team at the Artificial Intelligence Fundamental Research Laboratory has been conducting intensive research on the leading challenges of Artificial Intelligence (also called Computational Intelligence) for four decades. Artificial Intelligence (AI) is a branch of computer science that deals with problems for which no algorithmic solutions are known or for which the known solutions are computationally too complex. In this spirit, the Team participated in the development of a system for analyzing data on the health effects of the Chernobyl disaster, a system supporting the diagnosis of hand injuries, a system for distributed knowledge extraction from medical data, a system for pro-ecological optimization of the power supply of the Polish power plant network, a system for assessing candidates for the pilot profession, the first Polish large-scale semantic internet search engine, a system for evaluating consumer price developments, and many others.
Research on specific applications of AI was coupled with the development of inference and learning theories for uncertain and incomplete information (including Bayesian networks and Dempster-Shafer theory), the development of optimization methods inspired by nature (including immune networks and swarm, genetic and extremal optimization algorithms), methods of extracting knowledge from numerical data, text and hypertext (new algorithms for cluster analysis and classification, including in the field of spectral graph analysis, and new methods for extracting hierarchical relationships between concepts and simple relationships from natural language texts), and others. Currently, the Team has taken up the topical and important challenge of developing Explainable Artificial Intelligence (XAI) methods. XAI is a response to industry objections that artificial intelligence methods such as deep neural networks, evolutionary algorithms and others operate as a "black box", while business trusts only transparent methods. Our Team took on a particularly difficult challenge, i.e. achieving explainability in the field of cluster analysis of text documents, especially those clustered using spectral methods. The basic difficulty lies in the lack of a coherent axiomatic system for cluster analysis. Worse still, spectral methods detach the representation of clusters from the textual content of documents. Our achievements in this area include:
WWW: http://zil.ipipan.waw.pl/
The Linguistic Engineering Group (Pol. Zespół Inżynierii Lingwistycznej; ZIL) deals with multiple aspects of Natural Language Processing.
ZIL's traditional area of interest is deep syntactic parsing of Polish, with the use of Definite Clause Grammars (DCG) and generative linguistic formalisms, such as Head-driven Phrase Structure Grammar (HPSG) and Lexical Functional Grammar (LFG). For each of these approaches, a grammar of Polish has been developed and implemented, with current work concentrating on DCG and LFG.
Another important focus of the Group's research is broadly understood information extraction: many publications have been devoted to the automatic extraction of structured data from domain texts, to named entity recognition and to shallow parsing in general. Related work includes the automatic acquisition of linguistic knowledge, including valence frames, from corpus data.
More recently, ZIL has also been dealing with the semantic processing of texts, focusing on word sense disambiguation, coreference resolution and sentiment analysis. Certain elements of semantic processing are present in the LFG parser mentioned above. More application-oriented work within this thread concerns automatic summarisation and text categorisation.
The Group is also active in the area of corpus linguistics. ZIL coordinated the development of the 1.5-billion-word National Corpus of Polish (Pol. Narodowy Korpus Języka Polskiego; NKJP), based to some extent on the earlier IPI PAN Corpus. In the process, the Group created various tools for manual and automatic corpus annotation at multiple linguistic levels, an XML schema for corpus annotation, and a manually annotated 1-million-word subcorpus. This subcorpus is the empirical basis for the Składnica treebank, which is currently being developed; Składnica has already been used to train a dependency parser for Polish.
Various tools created by the Group are publicly available as open source software. They include morphosyntactic taggers, the shallow parser Spejd, the deep parser Świgra, the named entity recogniser Nerf, the word sense disambiguation platform WSDDE, and the corpus tools Poliqarp and Anotatornia. The Group is also responsible for the development of PoliMorf, an open morphological dictionary based on earlier dictionaries of this kind and intended for use in deep parsing and other applications. The above tools and resources are used in applications co-developed by ZIL, e.g. in a multilingual content management system.
ZIL has been, and remains, active in multiple national and international projects.
For more information, please visit: http://zil.ipipan.waw.pl/.
WWW: https://zams.ipipan.waw.pl/
The members of the group conduct research on generalizing well-established machine learning methods to uplift modelling, which concerns modelling the causal influence of a given action (e.g. a marketing campaign or medical therapy) at the level of an individual by taking into account a control group not subjected to the action. The theory of linear models for the uplift case is also being developed.
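The idea of uplift modelling can be illustrated with a minimal sketch. It assumes the simple "two-model" approach with ordinary least squares: one linear model is fitted on the treated group, another on the control group, and the estimated uplift of an individual is the difference between the two predictions. The function names and the synthetic data are hypothetical and serve only as an illustration; this is not necessarily the method developed by the group.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    """Predictions of a linear model fitted by fit_linear."""
    A = np.column_stack([np.ones(X.shape[0]), X])
    return A @ coef

def two_model_uplift(X, y, treated):
    """Estimated uplift: treated-model prediction minus control-model prediction."""
    treated = np.asarray(treated, dtype=bool)
    coef_treated = fit_linear(X[treated], y[treated])
    coef_control = fit_linear(X[~treated], y[~treated])
    return predict_linear(coef_treated, X) - predict_linear(coef_control, X)

# Synthetic example (an assumption): the effect of the action depends on the first feature.
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 2))
treated = rng.random(n) < 0.5                      # random assignment to the action
true_uplift = 1.0 + 2.0 * X[:, 0]                  # individual causal effect
y = 0.5 * X[:, 1] + treated * true_uplift + rng.normal(scale=0.1, size=n)

uplift_hat = two_model_uplift(X, y, treated)
print("mean absolute error of the uplift estimate:",
      float(np.mean(np.abs(uplift_hat - true_uplift))))
```

Because assignment to the action is random in this example, the difference between the two fitted models is a reasonable estimate of the individual effect; with observational data, additional assumptions would be needed.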
The domains researched by the group include information-theoretic and probabilistic modelling of natural language. Of special interest here are discrete stochastic processes with strong dependence, measured by the rate of growth of block entropy and of the length of the maximal repetition. Such processes exhibit certain statistical properties close to those found in natural language productions, e.g. those related to satisfying Hilberg's hypothesis. Their construction is studied, as is statistical inference for them, with applications in computational linguistics.
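The two quantities mentioned above can be computed for a finite sequence with a short sketch. It assumes a naive plug-in (empirical frequency) estimate of block entropy and takes the maximal repetition to be the length of the longest block that occurs at least twice, with overlaps allowed. The function names are hypothetical, and the plug-in estimate is known to be biased for long blocks, so this is only an illustration.

```python
from collections import Counter
from math import log2

def block_entropy(sequence, n):
    """Plug-in estimate (in bits) of the Shannon entropy of length-n blocks."""
    blocks = [tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1)]
    total = len(blocks)
    counts = Counter(blocks)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def maximal_repetition(sequence):
    """Length of the longest block occurring at least twice (overlaps allowed)."""
    seq = tuple(sequence)
    best = 0
    # If a block of length k repeats, so does its prefix of length k - 1,
    # so the search can stop at the first length with no repeated block.
    for k in range(1, len(seq)):
        seen = set()
        repeated = False
        for i in range(len(seq) - k + 1):
            block = seq[i:i + k]
            if block in seen:
                repeated = True
                break
            seen.add(block)
        if not repeated:
            break
        best = k
    return best

text = "abcabcabd"
for n in range(1, 4):
    print(n, block_entropy(text, n))
print("maximal repetition:", maximal_repetition(text))
```

For strongly dependent processes of the kind described above, one would study how these quantities grow with the length of the sequence rather than their values for a single short string.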
Another research direction concerns classification methods for multivariate response variables. An intensively studied special case is so-called multi-label classification, in which the response is a multivariate variable with binary coordinates. Of particular interest is the construction of effective methods for high-dimensional data, where high dimensionality refers both to a large number of potential predictors and to the dimensionality of the response. The aim of the research is to develop algorithms for variable selection and prediction in this set-up, and to analyse their performance theoretically.
Variable selection is also studied for high-dimensional generalized linear and additive models. Here, we study two- and multi-step procedures in which selection is based on information criteria after a preliminary screening and/or ranking of the variables according to their importance measures. The measures are constructed from a large number of small models with randomly chosen predictors. The main results concern selection consistency when the model assumed for the data at hand is correctly specified. The analogous problem is also studied for the misspecified case, with the concept of selection consistency suitably modified.
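A two-step procedure of this kind can be sketched as follows. The sketch makes several assumptions that are not stated in the text: a linear model, the average squared t-statistic over small random models as the importance measure, and the Bayesian Information Criterion (BIC) applied to nested models along the ranking. All function names and parameter values are hypothetical.

```python
import numpy as np

def random_subspace_importance(X, y, n_models=500, subset_size=5, seed=0):
    """Average squared t-statistic of each predictor over small random models."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    score = np.zeros(p)
    count = np.zeros(p)
    for _ in range(n_models):
        subset = rng.choice(p, size=subset_size, replace=False)
        A = np.column_stack([np.ones(n), X[:, subset]])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        sigma2 = resid @ resid / (n - subset_size - 1)
        var = sigma2 * np.diag(np.linalg.inv(A.T @ A))
        score[subset] += coef[1:] ** 2 / var[1:]   # skip the intercept
        count[subset] += 1
    return score / np.maximum(count, 1)

def select_by_bic(X, y, ranking, max_size):
    """Choose the nested model along the ranking that minimises BIC."""
    n = len(y)
    best_bic, best_k = np.inf, 0
    for k in range(max_size + 1):
        A = np.column_stack([np.ones(n), X[:, ranking[:k]]])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        rss = float(np.sum((y - A @ coef) ** 2))
        bic = n * np.log(rss / n) + (k + 1) * np.log(n)
        if bic < best_bic:
            best_bic, best_k = bic, k
    return [int(j) for j in ranking[:best_k]]

# Synthetic example (an assumption): only the first three predictors matter.
rng = np.random.default_rng(1)
n, p = 300, 30
X = rng.normal(size=(n, p))
y = 3 * X[:, 0] + 3 * X[:, 1] + 3 * X[:, 2] + rng.normal(size=n)

importance = random_subspace_importance(X, y)
ranking = np.argsort(importance)[::-1]             # most important first
selected = select_by_bic(X, y, ranking, max_size=10)
print("selected predictors:", sorted(selected))
```

The first step reduces the search to an ordered list of candidates, so the information criterion only has to compare a small family of nested models instead of all subsets of predictors.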
The group also pursues research on modelling stochastic dependence using a copula-based approach.
For more information, please visit: http://zams.ipipan.waw.pl/.
WWW: http://zbo.ipipan.waw.pl/
The Computational Biology Group (CBG) is a unit in the Department of Artificial Intelligence. We focus on the functions of non-coding DNA regions that may lead to detecting regulatory disorders that disrupt biological pathways. To better understand the development of various diseases, we study variation in the genomic, epigenomic, proteomic and other -omic layers of regulation of gene expression. We employ multidisciplinary knowledge, including statistics, mathematical modeling, machine learning, programming, Big Data analysis, parallel computing, biochemistry, ecology, evolution, and molecular biology, to discover the mechanistic structure of a wide range of biological issues. In CBG, we combine the achievements of a leading computer science institute with the recent advances of biotechnologies applied to Life Sciences. We offer an interdisciplinary agora for biologists, statisticians, linguists, oncologists and computer scientists.
Our research interests include:
We developed a system for selecting and ranking features in classification tasks using decision trees and a Monte Carlo method (MCFS-ID), implemented in rmcfs; a system for constructing classifiers based on Pawlak’s rough sets (ROSETTA), implemented in R.ROSETTA; and a DNA methylation analysis toolkit, CytoMeth. Our further work on methods focuses on finding interdependencies between significant features (implemented in the MCFS-ID system) and developing methodologies for rule networks generated from rough set models. In the field of bio-data analysis, CBG has made several significant contributions to modeling the pathogenicity of the bird flu virus and to research on mutations in regulatory regions of the genome and their correlation with carcinogenesis.
The result of our work on the regulation of gene expression in glioma is an atlas of regulatory regions in the human brain (transcription regions, transcription factor binding sites, enhancers, chromatin structure and histone modifications) ["Mapping chromatin accessibility and active regulatory elements reveals pathological mechanisms in human gliomas", article in Nature Communications, 2021], constructed in cooperation with the Nencki Institute of the Polish Academy of Sciences and the Institute of Informatics of the University of Warsaw, and financed by the National Science Center Symfonia 3 grant. We continue research on transcriptional regulation in glioma, focusing on the role of the REST and KAISO transcription factors.
As part of our research on carcinogenesis, we identify possible regulatory networks composed of epigenetic features important in predicting the formation and development of breast cancer. Other projects carried out in CBG include the analysis of population and molecular biology data, such as signaling pathways related to carcinogenesis and DNA methylation in rheumatoid arthritis.
CBG leads the Bioinformatics course at the Doctoral School of Information and Biomedical Technologies of the Polish Academy of Sciences (TIB PAN).
For more information, please visit: http://zbo.ipipan.waw.pl/.