DESCRIPTION

Users must be able to execute data mining tasks and machine learning algorithms on the distributed computing infrastructure that the Gaia archive will provide. Data mining queries can be complex, often forming a complete pipeline rather than a single operation, so the interfaces must allow users to define these complex queries, test them, and finally submit them to the production cluster. We propose two execution environments: a test environment with fewer security restrictions and access to a subset of the archive, for trying out algorithms and pipelines, and a production environment, to which a fully defined job can be submitted.

INTERFACES

Console

Data mining tasks generally require good knowledge of the data being queried and fine-tuning of algorithms and processes through trial-and-error learning, so we consider a console where these operations can be performed interactively a strong requirement. Spark provides Scala and Python consoles, but other alternatives, such as R, can be considered. The console should be available in the test environment, with access to a subset of the archive.

Direct execution environment

This environment will allow users to upload their own implementations (i.e. compiled jar files) and submit them directly to the cluster, similar to a job submission script in an HPC environment. The environment should be set up to let users work with the latest MLlib libraries and any other advanced dependencies they need. Security policies have to be defined and applied to a job before it is submitted.

Web interface (deprecated)

Once a task is completely defined (tested and verified, such as a trained model in the test environment), it can be configured through a web interface. The main features of this interface should be:
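The pipeline idea described above (compose several mining operations, try them interactively on a small subset, then submit the whole chain) can be sketched in plain Python. This is an illustrative sketch only: the `Pipeline` class and the step functions are assumptions for the example, not part of the Gaia archive or Spark APIs.

```python
from typing import Callable, Iterable, List

# Illustrative sketch: a pipeline is an ordered chain of
# data-transformation steps that can be tried interactively on a
# small subset before the full job is submitted to the cluster.
class Pipeline:
    def __init__(self) -> None:
        self.steps: List[Callable[[Iterable], Iterable]] = []

    def add(self, step: Callable[[Iterable], Iterable]) -> "Pipeline":
        self.steps.append(step)
        return self

    def run(self, data: Iterable) -> list:
        out = list(data)
        for step in self.steps:
            out = list(step(out))
        return out

# Hypothetical mining steps: filter rows by magnitude, then normalise.
pipeline = (
    Pipeline()
    .add(lambda rows: (r for r in rows if r["mag"] < 15.0))
    .add(lambda rows: ({**r, "mag": r["mag"] / 15.0} for r in rows))
)

# Trial run on a small in-memory subset, as a user would do in the
# test environment before submitting the pipeline to production.
subset = [{"mag": 12.0}, {"mag": 18.0}, {"mag": 9.0}]
print(pipeline.run(subset))  # two rows survive the magnitude cut
```

The same chain-of-steps structure maps naturally onto a Spark job once tested, which is why the test and production environments can share a single pipeline definition.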
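For the direct execution environment, submitting a user-compiled jar could resemble Spark's standard `spark-submit` tool. The invocation below is only a plausible example: the master URL, main class, and jar name are placeholders, and the actual submission mechanism for the Gaia archive is yet to be defined.

```shell
# Hypothetical submission of a user-compiled jar to the production
# cluster; the master URL, class name and jar file are placeholders.
spark-submit \
  --master spark://cluster.example.org:7077 \
  --deploy-mode cluster \
  --class org.example.mining.MyJob \
  my-mining-job.jar
```

Security policies would be checked against the job at this point, before it is accepted by the cluster.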