18.03.2019 - Institute Seminar - 13:00,
Michał Dramiński, Adam Filip and Michał J. Dąbrowski (IPI PAN)
MCFS-ID (Monte Carlo Feature Selection and Interdependency Discovery) is a Monte Carlo-based algorithm for feature selection. It returns a ranked list of informative features and thus plays a significant role in classifying objects that belong to different classes. This is achieved by constructing thousands of decision trees. MCFS-ID also allows for the discovery of interdependencies between features, visualized as a directed graph of the pairwise interdependencies found. The discovered interdependencies provide a basis for forming causal hypotheses, which can then be verified using background knowledge. The MCFS-ID algorithm is publicly available as the R package rmcfs. The tool has been successfully applied to many classification problems, among them the Cancer Genome Atlas dataset, which comprises several types of molecular information: protein-coding gene expression levels (mRNA), microRNA expression levels (miRNA), and DNA methylation status (HumanMethylation450). The goal there was to define prognostic markers for Breast Invasive Carcinoma (BRCA). Each object in the dataset is thus described by over half a million features, the vast majority of them unrelated or nearly unrelated to the problem under study. It will be shown that the algorithm returns truly significant features, i.e., features of importance in biological cancer-related pathways, and that it unveils true interdependencies between different molecular characteristics.
Presenters: Michał Dramiński, Adam Filip