Users must be able to execute data mining tasks and machine learning algorithms on the distributed computing infrastructure that the Gaia archive will provide. A data mining query is often not a single operation but a complete pipeline, so the interfaces must allow users to define these complex queries, test them and finally submit them to the production cluster.

We propose two execution environments: a test environment, with fewer security restrictions and access to a subset of the archive, for trying out algorithms and pipelines; and the production environment, to which a defined job can be submitted.



Data mining tasks generally require a good knowledge of the data being queried and fine-tuning of algorithms and processes through trial and error, so we consider a console in which these operations can be performed interactively a strong requirement. Spark provides Scala and Python consoles, but alternatives such as R can also be considered.

The console should be available in the test environment, with access to a subset of the archive.

Direct execution environment

This environment will allow users to upload their own implementations (i.e. compiled jar files) and submit them directly to the cluster, similar to a job submission script in an HPC environment. The environment should be set up so that users can use the latest MLlib libraries and any other advanced dependencies they need. Security policies have to be defined and applied to each job before it is submitted.
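As a rough illustration of applying policies to an uploaded jar before submission, the sketch below checks a job against two made-up rules and then builds a spark-submit argument list. The class JarJob, the functions check_policies and build_submit_command, and the memory limit are all hypothetical; the actual security policies are still to be defined.

```python
from dataclasses import dataclass, field


@dataclass
class JarJob:
    """A user-supplied job; all field names here are illustrative."""
    jar_path: str
    main_class: str
    executor_memory_gb: int = 2
    app_args: list = field(default_factory=list)


# Hypothetical limit; the real security policies are not yet defined.
MAX_EXECUTOR_MEMORY_GB = 8


def check_policies(job: JarJob) -> list:
    """Return the list of policy violations (empty means the job may run)."""
    problems = []
    if not job.jar_path.endswith(".jar"):
        problems.append("only compiled jar files are accepted")
    if job.executor_memory_gb > MAX_EXECUTOR_MEMORY_GB:
        problems.append("requested executor memory exceeds the limit")
    return problems


def build_submit_command(job: JarJob, master: str) -> list:
    """Build the spark-submit argument list for a job that passes the checks."""
    problems = check_policies(job)
    if problems:
        raise ValueError("; ".join(problems))
    return [
        "spark-submit",
        "--class", job.main_class,
        "--master", master,
        "--executor-memory", f"{job.executor_memory_gb}g",
        job.jar_path,
        *job.app_args,
    ]
```

The function only builds the command and does not run it, so the policy step can be tested without a cluster.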

<< Web interface >> Deprecated!

Once a task is completely defined (tested and verified, for example a model trained in the test environment), it can be configured through a web interface. The main features of this interface should be:

  • Data selection
  • Method/algorithm to perform on the data and configuration parameters.
  • Pipeline definition
  • Environment to execute in (do we allow execution on both environments through the web interface, or only on the production one?).
  • Job validation, submission and monitoring
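
The features listed above could map onto a job description along these lines. This is only a sketch: the names JobDefinition, PipelineStep and ENVIRONMENTS, and the validation rules, are assumptions made for illustration, not an agreed format.

```python
from dataclasses import dataclass, field

# The two environments proposed in this page.
ENVIRONMENTS = ("test", "production")


@dataclass
class PipelineStep:
    """One method/algorithm with its configuration parameters."""
    algorithm: str
    params: dict = field(default_factory=dict)


@dataclass
class JobDefinition:
    """Hypothetical job description as the web interface might collect it."""
    data_selection: str          # e.g. a query selecting the input data
    steps: list                  # ordered PipelineStep objects
    environment: str = "test"

    def validate(self) -> list:
        """Return a list of validation errors (empty means the job is valid)."""
        errors = []
        if not self.data_selection.strip():
            errors.append("data selection is empty")
        if not self.steps:
            errors.append("pipeline has no steps")
        if self.environment not in ENVIRONMENTS:
            errors.append(f"unknown environment: {self.environment}")
        return errors
```

A job would be validated before submission, for instance with JobDefinition("some query", [PipelineStep("kmeans", {"k": 10})]).validate(), which returns an empty list when nothing is wrong.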
-- Cesc Julbe (erased user) - 2015-06-04


Topic revision: r1 - 2015-06-04 - CescJulbe