Outlying Aspects Mining

Code (Matlab/C++ Mex)


1. Discovering Outlying Aspects in Large Datasets. Nguyen Xuan Vinh, Jeffrey Chan, Simone Romano, James Bailey, Christopher Leckie, Kotagiri Ramamohanarao and Jian Pei. To appear in Data Mining and Knowledge Discovery.

Mutual Information Based Feature Selection Repository

Code (Matlab/C++ Mex) for the following MI based feature selection approaches:
– Maximum relevance (maxRel)
– Minimum redundancy maximum relevance (MRMR)
– Minimum redundancy (minRed)
– Quadratic programming feature selection (QPFS)
– Mutual information quotient (MIQ)
– Maximum relevance minimum total redundancy (MRMTR) or extended MRMR (EMRMR)
– Spectral relaxation global Conditional Mutual Information (SPEC_CMI)
– Conditional mutual information minimization (CMIM)
– Conditional Infomax Feature Extraction (CIFE)
[1] Nguyen X. Vinh, Jeffrey Chan, Simone Romano and James Bailey, “Effective Global Approaches for Mutual Information based Feature Selection”. To appear in Proceedings of the 20th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD’14), August 24-27, New York City, 2014. [PDF]
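
To make the difference between maxRel and MRMR in the list above concrete, here is a minimal C++ sketch. It is not the repository's code: the function names `mutualInformation` and `greedyMrmr` are hypothetical, features are assumed to be discrete integer codes, and mutual information is estimated with simple plug-in frequencies. The MRMR score used is the common difference form, relevance minus mean redundancy with the features already selected.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <map>
#include <utility>
#include <vector>

// Plug-in estimate of I(X;Y) in nats for two discrete sequences of equal length.
// (Hypothetical helper for illustration only.)
double mutualInformation(const std::vector<int>& x, const std::vector<int>& y) {
    assert(x.size() == y.size());
    const double n = static_cast<double>(x.size());
    if (x.empty()) return 0.0;
    std::map<int, double> countX, countY;
    std::map<std::pair<int, int>, double> countXY;
    for (std::size_t i = 0; i < x.size(); ++i) {
        countX[x[i]] += 1.0;
        countY[y[i]] += 1.0;
        countXY[std::make_pair(x[i], y[i])] += 1.0;
    }
    double mi = 0.0;
    for (const auto& cell : countXY) {
        const double cxy = cell.second;
        const double cx = countX[cell.first.first];
        const double cy = countY[cell.first.second];
        // p(x,y) * log( p(x,y) / (p(x) p(y)) ), written in terms of counts.
        mi += (cxy / n) * std::log(cxy * n / (cx * cy));
    }
    return mi;
}

// Greedy MRMR (difference form). features[j] holds the values of feature j.
// Returns the indices of up to k features in the order they were selected.
std::vector<std::size_t> greedyMrmr(const std::vector<std::vector<int>>& features,
                                    const std::vector<int>& labels,
                                    std::size_t k) {
    const std::size_t d = features.size();
    std::vector<double> relevance(d, 0.0);
    for (std::size_t j = 0; j < d; ++j)
        relevance[j] = mutualInformation(features[j], labels);

    std::vector<bool> chosen(d, false);
    std::vector<double> redundancySum(d, 0.0);  // sum of I(f_j; f_s) over selected s
    std::vector<std::size_t> selected;

    while (selected.size() < k && selected.size() < d) {
        std::size_t best = d;
        double bestScore = -std::numeric_limits<double>::infinity();
        for (std::size_t j = 0; j < d; ++j) {
            if (chosen[j]) continue;
            const double meanRedundancy =
                selected.empty() ? 0.0
                                 : redundancySum[j] / static_cast<double>(selected.size());
            const double score = relevance[j] - meanRedundancy;
            if (score > bestScore) {  // ties keep the lowest index
                bestScore = score;
                best = j;
            }
        }
        chosen[best] = true;
        selected.push_back(best);
        // Update redundancy of the remaining candidates with the new pick.
        for (std::size_t j = 0; j < d; ++j)
            if (!chosen[j])
                redundancySum[j] += mutualInformation(features[j], features[best]);
    }
    return selected;
}
```

With the first selection step only (no redundancy term) this reduces to ranking by relevance, which is what maxRel does for every step. When one feature is an exact copy of another, maxRel would pick both, whereas the redundancy penalty steers MRMR towards a less relevant but complementary feature.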

Mutual Information Based Feature Selection – A Statistical View

Code (Matlab/C++)

1. Vinh Nguyen, Jeffrey Chan and James Bailey, “Reconsidering Mutual Information Based Feature Selection: A Statistical Significance View”. To appear in Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence (AAAI-14), Quebec City, Canada, July 27-31 2014. [PDF]


GLOBALMIT – Learning Globally Optimal Dynamic Bayesian Network With The Mutual Information Test (MIT) And Minimum Description Length (MDL) Criteria (matlab/c++)

Dynamic Bayesian networks (DBNs) are widely applied in modeling various biological networks, including gene regulatory networks. Due to several NP-hardness results on learning static Bayesian networks, most methods for learning DBNs are heuristic and employ either local search, such as greedy hill-climbing, or a meta-optimization framework, such as a genetic algorithm or simulated annealing.

We present GlobalMIT, a toolbox for learning the globally optimal DBN structure using a recently introduced information-theoretic scoring metric named the mutual information test (MIT). Under MIT, learning the globally optimal DBN can be achieved in polynomial time. The toolbox is implemented in Matlab, with an additional stand-alone C++ implementation of the search engine for improved performance.

Reference:

1. ‘GlobalMIT: Learning Globally Optimal Dynamic Bayesian Network with the Mutual Information Test (MIT) Criterion’, Vinh, N. X., Chetty, M., Coppel, R., and Wangikar, P. P. (2011), Bioinformatics [ERA rank A*], in press. Pre-publication PDF, Technical Report.

A variant of GlobalMIT which learns globally optimal DBNs under the MIT and MDL metrics, without the equi-cardinality constraints on the variables, is available here.

2. ‘Local and Global Algorithms for Learning Dynamic Bayesian Networks’, Vinh Nguyen, Madhu Chetty, Pramod Wangikar and Ross Coppel, The IEEE International Conference on Data Mining (ICDM), 2012 [Full paper, acceptance rate 10.7%].

CODE for Computing The Adjusted Mutual Information (AMI) In Matlab


1. ‘Information Theoretic Measures for Clusterings Comparison: Variants, Properties, Normalization and Correction for Chance’, N.X. Vinh, Epps, J. and Bailey, J. (2010), the Journal of Machine Learning Research (JMLR), 11(Oct), 2837-54, [ERA rank A] PDF.

2. ‘Information Theoretic Measures for Clusterings Comparison: Is a Correction for Chance Necessary?’, N.X. Vinh, Epps, J. and Bailey, J., in Proc. the 26th International Conference on Machine Learning (ICML’09), June 2009, Montreal, Canada, PDF [ERA rank A, acceptance rate: 23.5%].

Sebastian Schmidt (University of Zurich) wrote a nice R script for computing the AMI on massive clusterings with hundreds of thousands of clusters.
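
For orientation, the chance-corrected measure discussed in the two papers above can be written as follows. This shows the max-normalised variant only; other normalisations (for example by the sum or the minimum of the entropies) are also in use, so which one a given script computes is an assumption to check. Here U and V are the two clusterings, I is mutual information, H is entropy, and the expectation is taken over random clusterings with the same cluster sizes as U and V.

```latex
\mathrm{AMI}(U, V) =
  \frac{I(U; V) - \mathbb{E}\left[ I(U; V) \right]}
       {\max\{ H(U), H(V) \} - \mathbb{E}\left[ I(U; V) \right]}
```

The value is 1 when the two clusterings are identical and close to 0 when their agreement is no better than expected by chance.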

CODE for The Mincentropy Algorithm For Alternative Clustering (matlab)


1. ‘minCEntropy: a Novel Information Theoretic Approach for the Generation of Alternative Clusterings,’ N. X. Vinh, Epps, J., the 10th IEEE Int. Conf. on Data Mining (ICDM’10), 2010, [ERA rank A, acceptance rate: 9% full paper]. PDF

CODE For The Spherical K-means Clustering Algorithm (Matlab)


1. ‘Gene Clustering on the Unit Hypersphere with the Spherical K-means algorithm: coping with extremely large number of local optima’, N.X. Vinh, In Proc. the 2008 International Conference on Bioinformatics and Computational Biology (BIOCOMP), Las Vegas, 14-17 Jul 2008. PDF.
