TWiki> GENIUS Web>500ToolsForDataExploration>DataMiningGENIUSPage>AlgorithmTechniques (2014-12-04, CescJulbe)

-- Cesc Julbe (erased user) - 2014-12-04

Before starting any of the forums below, we have to decide to what extent we are going to re-use existing software:

- Adopt one Data Mining Platform as the basis for the WP? Which? Requirements?
- Adopt several and make them compatible via wrappers? Which? Requirements?
- Develop all from scratch?

MLbase (http://www.mlbase.org/) may be of interest; it is currently under active development. It aims to provide a higher-level interface for machine learning, e.g. automatically picking the best algorithm for each use case based on cross-validation. The latest paper can be found at http://arxiv.org/pdf/1310.5426v2.pdf
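The idea of choosing an algorithm by cross-validation can be illustrated with a minimal sketch in plain Python. It is not MLbase's API: the two candidate "trainers" (a majority-class baseline and a 1-nearest-neighbour classifier on scalar features) and all function names are hypothetical, chosen only to keep the example self-contained. Folds are contiguous and unshuffled, which is an assumption made for reproducibility.

```python
from collections import Counter


def majority_class(xs, ys):
    """Hypothetical baseline: always predict the most frequent training label."""
    label = Counter(ys).most_common(1)[0][0]
    return lambda x: label


def nearest_neighbour(xs, ys):
    """Hypothetical 1-nearest-neighbour classifier on scalar features."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def k_fold_accuracy(train, xs, ys, k):
    """Mean held-out accuracy over k contiguous (unshuffled) folds; needs k <= len(xs)."""
    n = len(xs)
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    scores, start = [], 0
    for size in fold_sizes:
        held_out = set(range(start, start + size))
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        predict = train(train_x, train_y)  # fit on the other k-1 folds
        correct = sum(predict(xs[i]) == ys[i] for i in held_out)
        scores.append(correct / size)
        start += size
    return sum(scores) / k


def select_model(candidates, xs, ys, k=5):
    """Return (name, score) of the candidate trainer with the best CV accuracy."""
    scored = [(name, k_fold_accuracy(t, xs, ys, k)) for name, t in candidates.items()]
    return max(scored, key=lambda s: s[1])
```

Each candidate is scored only on data it was not trained on, so the selection favours the model that generalises best rather than the one that fits the training set best.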

Regarding the platform: most of the newer frameworks and tools support HDFS, so it should not be difficult to adopt a couple of them and integrate them seamlessly. At first glance, the two most promising are Mahout and Spark. Mahout already implements many algorithms, but performs poorly on iterative ones, because its MapReduce-based implementations read from and write to disk on every iteration. Spark, on the other hand, can keep the working data set in memory across iterations, which makes it the best option for iterative algorithms (e.g. logistic regression), although it does not (currently) provide as many algorithms as Mahout. We will definitely need to try out both.
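To see why iterative algorithms are the sticking point, here is a minimal plain-Python sketch of logistic regression trained by batch gradient descent on the log-loss. The optimiser, learning rate and iteration count are illustrative assumptions, not what Spark or Mahout actually use. Every iteration makes a full pass over the training data, so an engine that keeps the data in memory between passes avoids re-reading it from disk hundreds of times.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(xs, ys, lr=0.5, iterations=1000):
    """Fit weights w (w[0] is the bias) by batch gradient descent.

    xs: list of feature vectors, ys: list of 0/1 labels.
    """
    dim = len(xs[0]) + 1
    w = [0.0] * dim
    n = len(xs)
    for _ in range(iterations):
        # One full pass over the data per iteration: this is the step a
        # distributed engine must repeat, ideally without touching disk.
        grad = [0.0] * dim
        for x, y in zip(xs, ys):
            feats = [1.0] + list(x)
            err = sigmoid(sum(wi * fi for wi, fi in zip(w, feats))) - y
            for j in range(dim):
                grad[j] += err * feats[j]
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w


def predict(w, x):
    """Predict label 1 if P(y=1 | x) >= 0.5, else 0."""
    return int(sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))) >= 0.5)
```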

Initial list for discussion (dimensionality reduction and feature selection):

- PCA, ICA
- Filter techniques (mutual information, gain ratio, chi-squared, correlation-based...)
- Wrapper techniques
- Manifold-related techniques (Diffusion Maps, LLE...)
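As an illustration of the filter family, a minimal sketch that ranks discrete features by their empirical mutual information with the class label, I(X;Y) = Σ p(x,y) log2(p(x,y) / (p(x) p(y))). It assumes features are already discretised; the function names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in bits for two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) p(y))) written with raw counts:
        # (c/n) / ((px/n) * (py/n)) == c * n / (px * py)
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def rank_features(columns, labels):
    """Return (feature_name, MI) pairs sorted by decreasing relevance to the labels."""
    scores = {name: mutual_information(col, labels) for name, col in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

A filter scores each feature independently of any classifier, which is what makes it cheap compared with wrapper techniques, at the cost of ignoring interactions between features.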

Initial list for discussion (classification and regression):

- LDA
- ANN
- SVM
- Bayesian Networks
- Bayesian inference with forward models
- Gaussian Processes
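Assuming LDA here stands for Linear Discriminant Analysis (it could also mean Latent Dirichlet Allocation), a minimal two-class Fisher discriminant for 2-D points: the projection direction is w = S_W⁻¹(μ₁ − μ₀), where S_W is the within-class scatter matrix, and a point is assigned to class 1 if its projection exceeds the midpoint of the projected class means. Restricting to 2-D keeps the matrix inverse explicit; the function names are hypothetical, and a real implementation would use a linear-algebra library.

```python
def mean(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def scatter(points, mu):
    """Sum of (p - mu)(p - mu)^T, returned as (a, b, d) for [[a, b], [b, d]]."""
    a = b = d = 0.0
    for x, y in points:
        dx, dy = x - mu[0], y - mu[1]
        a += dx * dx
        b += dx * dy
        d += dy * dy
    return a, b, d


def fit_fisher_lda(class0, class1):
    """Return (w, threshold) for two-class Fisher LDA on 2-D points."""
    mu0, mu1 = mean(class0), mean(class1)
    a0, b0, d0 = scatter(class0, mu0)
    a1, b1, d1 = scatter(class1, mu1)
    a, b, d = a0 + a1, b0 + b1, d0 + d1  # within-class scatter S_W
    det = a * d - b * b
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    diff = (mu1[0] - mu0[0], mu1[1] - mu0[1])
    # S_W^{-1} = (1/det) * [[d, -b], [-b, a]]
    w = ((d * diff[0] - b * diff[1]) / det, (-b * diff[0] + a * diff[1]) / det)

    def project(p):
        return w[0] * p[0] + w[1] * p[1]

    threshold = (project(mu0) + project(mu1)) / 2
    return w, threshold


def classify(w, threshold, p):
    """Return 1 if the projection of p lies above the threshold, else 0."""
    return int(w[0] * p[0] + w[1] * p[1] > threshold)
```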

Initial list for discussion (clustering):

- k-means
- Self-Organised Feature Maps
- Density-based clustering (HMAC, DBSCAN, ...)
- Connectivity-based (hierarchical) clustering
- Parametric or model-based clustering (AutoClass, EM)
- Subspace clustering
- Spectral clustering
- Clustering by message passing (e.g. affinity propagation)
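For reference, a minimal sketch of k-means (Lloyd's algorithm) in plain Python: alternately assign each point to its nearest centroid and move each centroid to the mean of its assigned points, until the assignments stop changing. Initialising with the first k points is an assumption made for reproducibility; k-means++ or random restarts are the usual choices in practice.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, max_iter=100):
    """Return (centroids, labels). Centroids start at the first k points."""
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        new_labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # converged: assignments unchanged
        labels = new_labels
        # Update step: centroid = mean of its members (left in place if empty).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels
```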

Initial list for discussion (model evaluation and validation):

- ROC curves
- k-fold cross-validation
- Hypothesis tests
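For the ROC item, a small sketch that builds the ROC curve from classifier scores by sweeping the decision threshold over the distinct score values, then computes the area under the curve (AUC) with the trapezoidal rule. Examples with tied scores are consumed in a single step, so ties produce a diagonal segment. The function names are illustrative.

```python
def roc_curve(scores, labels):
    """Return (FPR, TPR) points from (0, 0) to (1, 1).

    labels are 1 for positives and 0 for negatives; a higher score means
    "more likely positive".
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative")
    pairs = sorted(zip(scores, labels), key=lambda sl: sl[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        # Lower the threshold to the next distinct score and admit all ties.
        s = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == s:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under a piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

The AUC equals the fraction of (positive, negative) pairs in which the positive gets the higher score, so it is independent of any particular threshold.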

Initial list for discussion (optimization):

- Genetic Algorithms
- Swarm optimization
- Genetic programming
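A minimal genetic algorithm sketch on the toy "OneMax" problem (maximise the number of 1 bits in a bitstring), using tournament selection, one-point crossover, bit-flip mutation and elitism. The problem, the operators and all parameter values are illustrative assumptions. Because the best individual is always copied unchanged into the next generation, the best fitness never decreases.

```python
import random


def one_max(bits):
    """Toy fitness: number of 1s in the bitstring."""
    return sum(bits)


def tournament(pop, rng):
    """Binary tournament: return the fitter of two random individuals."""
    a, b = rng.sample(pop, 2)
    return a if one_max(a) >= one_max(b) else b


def genetic_algorithm(n_bits=20, pop_size=30, generations=50,
                      mutation_rate=0.02, seed=0):
    """Return (best individual, best fitness per generation)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        best = max(pop, key=one_max)
        history.append(one_max(best))
        next_pop = [best[:]]  # elitism: keep the current best unchanged
        while len(next_pop) < pop_size:
            p1, p2 = tournament(pop, rng), tournament(pop, rng)
            cut = rng.randint(1, n_bits - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation, independently for each gene.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=one_max)
    history.append(one_max(best))
    return best, history
```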

**Work packages**

FP7-SPACE-2013-1

Grant n. 606740.

Copyright © 2008-2021 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.

Ideas, requests, problems regarding TWiki? Send feedback
