Telecon 3rd MARCH 2015, MEETING MINUTES

Attendees:

It is necessary to review the FTE list created at the very beginning of the project.

A new email will be sent asking the people on the original list whether they are going to participate. Those who are removed from the list now can still join the WP later on.
Luis already has a list of people (already on the WP list) who are ready to start working on the WP. They need access to the cluster (see -2-, infrastructure section). They would develop or port advanced algorithms that are already implemented, mainly in R, preferably to Python, or otherwise to another language compatible with the technologies we plan to use (Scala, Java or Python on the Spark and Hadoop frameworks). A work plan will be drawn up for the second half of 2015.

The current cluster is in an unstable state: one node is down and Cloudera cannot restart the services properly. CSUC is working on getting the cluster back up and running.
A new cluster is going to be purchased for the WP. We are still waiting for news from CSUC about the status of this purchase.

Daniel and Angel Berihuete have made progress on the implementation. Some code is ready to be tested on the cluster for scalability and performance evaluation. So far the code has been written in Python.
Modules developed for this Grand Challenge should also be reusable in other use cases.
For this Grand Challenge, synthetic masses are being generated. However, there is already some code (written in Java) that can estimate masses from BP/RP spectra, and this part of the workflow could also be incorporated into the use case. Note, though, that no spectra will be available in the first releases.

Intermediate data will need to be stored. A policy for this has to be agreed with the SAT team.
Data coming from GACS should be converted into a Hadoop-friendly format (Parquet?), so a data extraction and serialization process has to be discussed and established. So far we have received an ASCII sample from GACS (10e6 sources), using '|' as the separator, for testing purposes.
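As a minimal sketch of the extraction step, assuming the sample has a header row and the hypothetical columns listed below (the actual GACS column list was not discussed), the '|'-separated ASCII can be parsed with Python's standard `csv` module into typed, column-oriented lists before being serialized. The names `SCHEMA` and `parse_pipe_separated` are illustrative only.

```python
import csv
import io
from typing import Callable, Dict, List, Optional, TextIO

# Hypothetical schema for the GACS ASCII sample: the column names and types
# below are illustrative assumptions, not the actual extraction layout.
SCHEMA: Dict[str, Callable[[str], object]] = {
    "source_id": int,
    "ra": float,
    "dec": float,
    "phot_g_mean_mag": float,
}


def parse_pipe_separated(
    stream: TextIO, schema: Dict[str, Callable[[str], object]] = SCHEMA
) -> Dict[str, List[Optional[object]]]:
    """Read '|'-separated text with a header row into typed columns.

    Returns a dict mapping each schema column to a list of values, i.e. a
    column-oriented layout like the one a columnar format such as Parquet
    stores on disk. Empty or missing fields become None.
    """
    columns: Dict[str, List[Optional[object]]] = {name: [] for name in schema}
    reader = csv.DictReader(stream, delimiter="|")
    if reader.fieldnames is None:  # empty input: no header row
        return columns
    # Tolerate padding around header names, e.g. "source_id | ra".
    reader.fieldnames = [name.strip() for name in reader.fieldnames]
    for row in reader:
        for name, cast in schema.items():
            raw = (row.get(name) or "").strip()
            columns[name].append(cast(raw) if raw else None)
    return columns


# Usage on a tiny in-memory sample (values are made up for illustration).
sample = io.StringIO(
    "source_id|ra|dec|phot_g_mean_mag\n"
    "1|45.1|-0.5|15.2\n"
    "2|45.2|-0.6|\n"
)
cols = parse_pipe_separated(sample)
print(cols["phot_g_mean_mag"])  # [15.2, None]
```

In a Spark job the same step could instead be done with `spark.read.csv(path, sep="|", header=True)` followed by `DataFrame.write.parquet(path)`; the stdlib version is only meant to make the parsing and typing decisions explicit on a small sample.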

A meeting with the SAT team will be proposed to discuss Data Mining requirements and proposals (May?).
Relevant milestones/releases/meetings:
- TGAS: small release to be used by DM for concept testing.
- Release 1: end of 2016. More advanced DM tasks, about 10e8 sources.
- CU9 plenary meeting in November.
