Difference: 500ToolsForDataExploration (1 vs. 14)

Revision 14, 2015-06-23 - LolaBalaguer

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

WP4 - Tools for data exploration

Description

Line: 98 to 98
 
  • Manager: X. Luri (UB)
  • Partners:
Changed:
<
<
    • UB
    • CSIC
    • FFCUL
    • UBR
    • CNRS
>
>
    • UB: Francesc Julbe
    • CSIC: Enrique Solano, Luis Sarro
    • FFCUL: Miguel Dias Duarte Ferreira Gomes, André Moitinho, Alberto Krone-Martins
    • UBR: Mark Taylor
    • CNRS: Jérôme Berthier
 

Revision 13, 2014-12-04 - CescJulbe

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

WP4 - Tools for data exploration

Description

Line: 47 to 47
 
    • Define in collaboration with WP440 the infrastructure technology compatibility and extensions to use VO standards and services.
  • Implement, test and monitor the visualisation and interaction tools (widgets and algorithms).
Changed:
<
<

T4.3 - Data mining [Months: 1-42]

>
>

T4.3 - Data mining [Months: 1-42]

  UB, CSIC

Revision 12, 2014-11-21 - LolaBalaguer

Line: 1 to 1
 
META TOPICPARENT name="WebHome"
Changed:
<
<

400 - Tools for data exploration

>
>

WP4 - Tools for data exploration

 

Description

A use of the Gaia archive based on simple queries (i.e. sky region queries) would only allow a basic use of its potential. To fully exploit a billion object data set, containing a wide variety of data (astrometric, photometric, spectrophotometric, spectroscopic, ...) more advanced and powerful data exploration tools will be needed. This work package is devoted to the development of such tools, in close coordination with WP200 to ensure that they are tailored to the actual needs of the scientific user community. It will include:

Line: 30 to 30
  The full understanding of the Gaia catalogue data requires a rich set of visualization tools that will help in the human interpretation of the data and in knowledge discovery from its internal relations. To achieve that, the visualization package should support a wide variety of visualization algorithms, including geometrical and volumetric methods as well as advanced topological and modelling algorithms (e.g. polygon reduction, contouring, or glyphs). Besides that, we must consider modern concepts of displaying (statistical) data, moving beyond simple histograms or plots towards visual knowledge inspiration and persuasive presentation components (e.g. voxel, hixel and texel representations). It will also be important to go one step further in current research areas such as visualization of uncertainties (errors and their models must be seamlessly integrated and never ignored), user interactivity and cosmetics (essential for outreach, WP-730).
Changed:
<
<
The core components of the visualization framework that interact with different (N-dimensional) graphic widgets and the algorithms will have to be provided as part of this package. Internal (server–side) parallel processing of massive data sets and provision for easy human interaction will have to be considered. From the hardware infrastructure the visualization package will have to allow for a flexible definition underlying the client and serverside egressing technologies and platforms.
>
>
The core components of the visualization framework that interact with different (N-dimensional) graphic widgets and the algorithms will have to be provided as part of this package. Internal (server–side) parallel processing of massive data sets and provision for easy human interaction will have to be considered. From the hardware infrastructure the visualization package will have to allow for a flexible definition underlying the client and serverside egressing technologies and platforms.
  Although Gaia data will be multi-dimensional, visual exploration in Astronomy is mostly done using 2D representations. This reduced dimensionality has a price: it easily hides features and relations in the data and can produce cluttered views. Multiple 2D panels are often used as a solution, but the linkage between data in different panels is frequently not clear. Curiously, 3D visualization, with the gain of an extra visual dimension, is not widespread in Astronomy, where most of the data are individual entities (stars, galaxies, asteroids). It is almost exclusively used in simulations of astrophysical fluids and fields, which are extended bodies. The reason is a lack of good tools for 3D selection and interaction with point clouds. 2D interfaces, such as a mouse and keyboard, are not adapted for this kind of interaction. This is one of the most critical inhibitors of the advantages of using the extra third dimension in scientific research. There is clearly a need to develop an adequate tool for 3D interactive visualization supporting human-computer interfaces other than the mouse and keyboard.
Line: 104 to 104
 
    • FFCUL
    • UBR
    • CNRS
Added:
>
>

European_Flag.png

The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7-SPACE-2013-1) under grant agreement n°606740.

 \ No newline at end of file

Revision 11, 2014-04-29 - LolaBalaguer

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

400 - Tools for data exploration

Description

Line: 11 to 11
 
  • Development of tools for the Grand Challenges outlined in WP 200, that will involve complex and massive exploration of the data.

Furthermore, this work package also includes the development of some tools for outreach and academic activities. Although not explicitly included in the FP7 call, we consider the task of presenting astronomy to the general public and the provision of resources for teaching astronomy based on actual Gaia data as worthy contributions to the dissemination of space mission data on a global scale.

Added:
>
>
WP4 - Tools for data exploitation [Months: 1-42] Lead beneficiary: UB Type of activity: RTD

The UB team leads this work package and will contribute most of the resources devoted to it. The personnel at the UB (see Sec. 2.2.1), led by the GENIUS coordinator X. Luri, will provide its extensive background on astrometry in general and the Gaia data in particular, and its knowledge and experience on the use of astronomical data. In addition, an experienced software engineer will be hired with the GENIUS funding and devoted full time to WP400 to provide the technical expertise necessary for the developments in this work package with the support of the UB staff. Some funding will also be devoted to specific tasks along the schedule, to employ part time software engineers already working for DPAC developments in the UB team.

T4.1 - Technical coordination [Months: 1-42]

UB

In addition to managing the resources deployed on the other WP-400 work packages, and producing reports on those activities, this work package oversees the design and specification of all work conducted under WP-400, to ensure that it adequately addresses the requirements identified within the GENIUS project and from external sources, such as the CU9 and GREAT. This WP also includes the liaison with Gaia and Science Archive team members at ESAC for the coordination in the development of exploitation tools working on the Gaia archive.

T4.2 - Visualization tools [Months: 1-42]

FFCUL, UB

This Work Package addresses the development of visualization tools and solutions, adapted to the large size and complexity of the Gaia archive. This includes interaction with the data, resulting in seamless visual queries to the archive.

The full understanding of the Gaia catalogue data requires a rich set of visualization tools that will help in the human interpretation of the data and in knowledge discovery from its internal relations. To achieve that, the visualization package should support a wide variety of visualization algorithms, including geometrical and volumetric methods as well as advanced topological and modelling algorithms (e.g. polygon reduction, contouring, or glyphs). Besides that, we must consider modern concepts of displaying (statistical) data, moving beyond simple histograms or plots towards visual knowledge inspiration and persuasive presentation components (e.g. voxel, hixel and texel representations). It will also be important to go one step further in current research areas such as visualization of uncertainties (errors and their models must be seamlessly integrated and never ignored), user interactivity and cosmetics (essential for outreach, WP-730).

The core components of the visualization framework that interact with different (N-dimensional) graphic widgets and the algorithms will have to be provided as part of this package. Internal (server–side) parallel processing of massive data sets and provision for easy human interaction will have to be considered. From the hardware infrastructure the visualization package will have to allow for a flexible definition underlying the client and serverside egressing technologies and platforms.

Although Gaia data will be multi-dimensional, visual exploration in Astronomy is mostly done using 2D representations. This reduced dimensionality has a price: it easily hides features and relations in the data and can produce cluttered views. Multiple 2D panels are often used as a solution, but the linkage between data in different panels is frequently not clear. Curiously, 3D visualization, with the gain of an extra visual dimension, is not widespread in Astronomy, where most of the data are individual entities (stars, galaxies, asteroids). It is almost exclusively used in simulations of astrophysical fluids and fields, which are extended bodies. The reason is a lack of good tools for 3D selection and interaction with point clouds. 2D interfaces, such as a mouse and keyboard, are not adapted for this kind of interaction. This is one of the most critical inhibitors of the advantages of using the extra third dimension in scientific research. There is clearly a need to develop an adequate tool for 3D interactive visualization supporting human-computer interfaces other than the mouse and keyboard.

Besides our own developed components, the reuse and extension of widely accepted (astronomical) visualization software will be analysed as part of the WP tasks. In particular, the tools that support VO formats will be targeted (e.g. TOPCAT, VOSpec) in coordination with WP-440. Those tools already handle a set of different astronomical formats and allow the inclusion of several user-defined formats. They also provide widgets for higher dimensional visualisation, statistics algorithms or visual comparison that will be adapted to visualise the contents of the Gaia archive and compare it against other archives. Other existing tools will have to be examined, in particular the ones that deal with parallel visualization on large clusters (e.g. using MapReduce), the open-source ParaView coprocessing library (that uses VTK) or VisIVO, a current parallel processing capable visualization tool well known in astronomy.

The tasks in this sub-work package include the contributions of the FFCUL specialised partner. The team at FFCUL will provide expertise in the development of visualization tools. Their activity in visualization studies and developments for space and earth observation further allows GENIUS to take advantage of the synergies with fields other than astronomy.

The following tasks have been identified for the visualisation WP:

  • Define the list of requirements and feasible use cases to be covered by visualization.
  • Define the architecture to support the visualization requirements.
  • Identify the existing open-source visualization tools to be used or extended to support the graphical view of the Gaia archive
  • Define the proper data models for the visualization of the requirements. In particular:
    • Define in collaboration with WP430 the requirements for data mining visualization.
    • Define in collaboration with WP440 the infrastructure technology compatibility and extensions to use VO standards and services.
  • Implement, test and monitor the visualisation and interaction tools (widgets and algorithms).

T4.3 - Data mining [Months: 1-42]

UB, CSIC

The Gaia catalogue will represent an unmatched opportunity to apply data mining techniques and algorithms as tools for knowledge discovery in a domain where there is no alternative to automated methods based on statistical learning (human exploration is certainly not feasible except for very limited subsets of data). The application of data mining algorithms to extract new knowledge from the data is mandatory for a full scientific exploitation of the Gaia data. The main focus will be on Knowledge Discovery, which is expected to reveal patterns and relationships within the astronomical data that can lead to the detection of new types of objects or isolated, exotic objects that represent rapid stages of stellar evolution and/or new astrophysical scenarios. Also, modelling tasks will arise from the discovered patterns. In that sense, the capability of automated dimensionality reduction (feature extraction, feature selection) and the development of key learning algorithms (clustering, outlier analysis, swarm intelligence, ...) implemented for parallel processing are foreseen as important.

From the architecture point of view, the DM module will have to scale to the entire Gaia data set and allow for a flexible definition of the underlying infrastructure (Cloud Computing, High Performance computing (HPC), GRID computing, and other emerging technologies). The initial approach we plan is an architecture where the mining algorithms are accessed following the paradigm of Software as a Service (SaaS) over a service oriented architecture. However, the package should also be compatible with future definitions of data mining processes, that are expected to include more complex mining work flows supporting asynchronous notifications from those services.

The tasks in this sub-work package are mainly under the UB partner, and also include the contribution of the CSIC specialised partner. Through the CSIC the team of L. Sarro will provide to GENIUS its expertise in Data Mining in astronomy, including the synergies with his work in the area inside the Gaia DPAC (see Sec. 2.2.7 of the DOW Part A).

The following tasks have been defined for the data mining WP.

  • Define the list of requirements (in coordination with WP200) and feasible use cases to be covered.
  • Define the architecture to support the mining processes listed in the requirements.
  • Define the framework to allow users to develop their own implementations of the mining algorithms.
  • Define the proper data models for the data mining based on the requirements. In particular:
    • Define in collaboration with WP420 (Visualisation) the requirements for dimensionality reduction.
    • Define in collaboration with WP300 the infrastructure technology compatibility for the data mining work flows needed by the requirements
  • Parallelise existing algorithms or libraries for Data Mining in distributed environments

T4.4 - VO tools and services [Months: 1-42]

CSIC, UBR

Besides novel modes of access to the entire Gaia archive and the emerging needs on visualisation (WP420) and data mining (WP430), it is anticipated that the more traditional archive access mode, in which a potentially complex query downloads a data set of modest size for interactive client-side processing, will continue to be important. The most efficient way to support this model is to provide a seamless interface for Gaia data acquisition from existing analysis tools in which astronomers already have expertise. We therefore intend to extend the following existing VO applications with Gaia-specific data acquisition tools:

# TOPCAT (Tool for OPerations on Catalogues And Tables http://www.star.bris.ac.uk/~mbt/topcat/) is an interactive graphical application for exploration, analysis and manipulation of tabular data, especially source catalogues, which works well with moderately large data sets (up to a few million rows and a few hundred columns; more details are given in 2.2.11). TOPCAT already offers a number of service-specific load dialogues (e.g. VizieR, Millennium Simulation), and a Gaia option would be added alongside these. Additionally, investigations will be made of whether the existing practical limits on dataset size can be increased. TOPCAT is in regular use by certainly hundreds and perhaps thousands of astronomers worldwide, and has users in 24 of the 27 EU member states. Providing direct access to Gaia data from this tool will be a highly effective way to facilitate an entry point for its exploitation.

# VOSpec : Gaia will produce a large set of spectra (spectrophotometric data for all the objects and high-resolution spectra for all objects up to G 17). VOSpec is an ESA-VO tool that can handle spectra in the VO context. It offers multi-wavelength spectral analysis and spectral widgets. The inclusion of Gaia-specific modules is foreseen for users who have to work with spectra processing in Gaia.

# VisIVO : (Visualization Interface to the Virtual Observatory) is an open-source tool developed following the VO standards and recommendations. Data is retrieved by connecting to a VO service and loaded locally for manipulation or visualization. It can deal with multidimensional data sets of both observational and simulated data. It offers parallel processing facilities that will need to be extended to fully exploit the access to the Gaia data.

# VOSED: a tool developed in the framework of the Spanish VO to ease the generation of Spectral Energy Distributions (SEDs). VOSED is able to build SEDs by gathering information from the spectroscopic services available in the VO. These datasets can be complemented with photometric information from a number of VizieR catalogues as well as with data provided by the user.

# VOSA (http://svo.cab.inta-csic.es/theory/vosa/): a tool to query photometric catalogs accessible through VO services, query VO compliant theoretical spectra and calculate the associated synthetic photometry and derive physical parameters from the model that best reproduces the observed data.
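As an illustration of what "the model that best reproduces the observed data" can mean in practice, the following is a minimal sketch, not VOSA's actual implementation. It assumes a chi-square criterion with a free flux scaling factor per model; the text above does not say which statistic is used. The function name `best_fit_model` and the toy model names are hypothetical.

```python
def best_fit_model(observed, errors, models):
    """Return (name, scale, chi2) of the model with the lowest chi-square.

    observed, errors: fluxes and 1-sigma uncertainties, one per filter.
    models: dict mapping a model name to its synthetic fluxes in the same
            filters (assumed not to be all zero).
    The chi-square criterion is an assumption made for this sketch.
    """
    weights = [1.0 / e ** 2 for e in errors]
    best = None
    for name, synthetic in models.items():
        # Analytic scale factor that minimises chi-square for this model.
        num = sum(w * o * s for w, o, s in zip(weights, observed, synthetic))
        den = sum(w * s * s for w, s in zip(weights, synthetic))
        scale = num / den
        chi2 = sum(
            w * (o - scale * s) ** 2
            for w, o, s in zip(weights, observed, synthetic)
        )
        if best is None or chi2 < best[2]:
            best = (name, scale, chi2)
    return best


# Toy example: the "cool" model matches the observations up to a factor of 2.
toy_models = {"cool": [1.0, 2.0, 3.0], "hot": [3.0, 2.0, 1.0]}
toy_best = best_fit_model([2.0, 4.0, 6.0], [1.0, 1.0, 1.0], toy_models)
```

In a real tool the physical parameters would then be read off the selected model (here just its name and scaling factor).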

The tasks in this sub-work package include the contributions of the CSIC and UBR specialised partners. At CSIC the team led by E. Solano (Spanish Virtual Observatory, see Sec. 2.2.7), will provide VO support and at UBR M. Taylor (main developer of TOPCAT and other VO tools, see Sec. 2.2.11) will provide the TOPCAT integration.

The following tasks have been defined for this sub-work package:

  1. Define the list of services and tools specifications to be covered using VO for Gaia. In particular:
    • Define in collaboration with WP420 (Visualisation) the requirements for VO tools and services.
    • Define in collaboration with WP430 (Data mining) the requirements for VO tools and services.
  2. Design and Implement VO services and tools for the Gaia data.
  3. Test, optimise, and validate the VO tools and services, providing performance monitoring.
  4. Define/implement the query extensions necessary to query the catalogue to fulfil the specifications.
  5. Obtain user feedback and update the tools and services if necessary
  6. Write documentation
 

Participants

  • Manager: X. Luri (UB)

Revision 10, 2013-01-30 - SurinyeOlarte

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

400 - Tools for data exploration

Description

Line: 13 to 13
 Furthermore, this work package also includes the development of some tools for outreach and academic activities. Although not explicitly included in the FP7 call, we consider the task of presenting astronomy to the general public and the provision of resources for teaching astronomy based on actual Gaia data as worthy contributions to the dissemination of space mission data on a global scale.

Participants

Changed:
<
<
  • Manager: X. Luri (UB)
  • Partners:
>
>
  • Manager: X. Luri (UB)
  • Partners:
 
    • UB
Changed:
<
<
    • INTA
>
>
    • CSIC
 
    • FFCUL
    • UBR
    • CNRS
\ No newline at end of file

Revision 9, 2011-11-24 - XaviLuri

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

400 - Tools for data exploration

Added:
>
>

Description

 
Changed:
<
<
>
>
A use of the Gaia archive based on simple queries (i.e. sky region queries) would only allow a basic use of its potential. To fully exploit a billion object data set, containing a wide variety of data (astrometric, photometric, spectrophotometric, spectroscopic, ...) more advanced and powerful data exploration tools will be needed. This work package is devoted to the development of such tools, in close coordination with WP200 to ensure that they are tailored to the actual needs of the scientific user community. It will include:
 
Deleted:
<
<
A use of the Gaia archive based on simple queries (i.e. sky region queries) would only allow a basic use of it potential. To fully exploit a billion object dataset, containing a wide variety of data (astrometric, photometric, spectrophotometric, spectroscopic, …) more advanced and powerful data exploration tools will be needed. This work package is devoted to the development of such tools, in close coordination with WP 200 to ensure that they are tailored to the actual needs of the scientific user community. It will include:
 
  • Development of visualization tools, adapted both to the potential large size and complexity of the available data of the results of the archive queries.
Deleted:
<
<
  • Development of data mining tools adapted to the characteristics of the archive (both to its contents and the archive system), allowing the users to search and extract data based on complex criteria.

  • Development or adaptation of VO tools to the Gaia archive. In particular, the possibility of cross-matching the contents of the Gaia archive with other archives (especially with large surveys ongoing or in preparation, like LSST) should be easily available.
  • Development of tools for the Grand Challenges outlined in WP 200, that will involve complex and massive exploration of the data.


Furthermore, the work package also includes the development of some tools for outreach and academic activities. Although not explicitly included in the call, we consider the task of approaching astronomy to the general public and the provision of resources for teaching astronomy based on actual Gaia data is a worthy contribution to dissemination of space mission data on a global scale.

WP 410 Management

Overall management of WP 500

WP 420 Visualization tools

Inputs provided by A. Moitinho

  1. Survey of visualisation tools of some utility for exploring the Gaia catalogue. Technical and semantic approaches ….. We have an on-going ESA contract (VA-4D) for surveying the currently available visualisation tools in Climate Sciences and Astronomy, visualisation needs and performing the corresponding gap (not GAP) assessment. Implementability of gap solutions. One of the outcomes is a conceptual design for a next generation visualisation tool. The study covers not only technical aspects, but also a more abstract component focused on the semantics and ergonomics of visualisation. Application of this type of knowledge will be necessary in the definition of Gaia visualisation.
  2. Technical solutions for visualisation besides the current study mentioned above. We have recently completed another ESA contract. This one (KD-LADS) was for knowledge discovery in large datasets and included a visualisation module, an extension of ParaView, which already gave us a little practical experience in this field. Now with the VA-4D study we are developing further expertise in the field. Writing , SRS and SDD would be natural products of our activities.
  3. Implementation: provided that GENIUS gets funded so that we can support extra human resources, we can do it, as UNINOVA has proved with 15 successful projects (13 implementations) for ESA.

A sketch of our vision

With petabyte sized databases, Science will happen when we manage to connect all this data
with usually kilobyte sized explanations. As is attested by the portion of our brain dedicated
to the processing of visual information, human comprehension is favored when
the data is presented in a visual way. The aim of scientific visualization is exactly this: to reduce
the complexity of scientific data in a way that favors the researcher's understanding, and thus the
flourishing of ideas and physical interpretation.

Gaia data is highly complex in nature, and so will be the Gaia catalogue. Therefore, tools should
be provided to the research community to help them grasp as quickly and precisely as
possible the information they are searching for, as well as to facilitate and even to encourage
serendipitous discoveries. In this way, whatever tool is implemented, it should not work in
a completely passive way, waiting for commands from the user, but it should have a little bit
of active voice, suggesting some characteristics of the visualization that would facilitate the
discovery process.

One simple example of an “active visualisation” is the following one: Imagine you want to see
the MW in 3D, so you request to visualize the positions x,y,z of all the stars in the catalogue. In
this case, an “active tool” would automatically present you a 3D volume rendering of the stars,
in a way that you wouldn’t see a 3D scatter plot, with each point representing a single star, but
the global structure of the MW would be presented. Then as you zoom in the visualization, the
volume render would progressively turn into a scatter plot showing individual stars, obviously in
a fully automatic way.
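
The zoom-dependent behaviour described above can be sketched as follows. This is only an illustration under an assumed rule that the text does not specify: if more than `max_points` stars fall inside the current view, they are summarised as counts on a coarse grid (standing in for a volume rendering); otherwise the individual stars are returned for a scatter plot. The names `choose_representation`, `max_points` and `bins` are hypothetical.

```python
from collections import Counter


def choose_representation(stars, view_min, view_max, max_points=1000, bins=8):
    """Return ("scatter", points) or ("density", cell_counts) for a view box.

    stars: iterable of (x, y, z) tuples.
    view_min, view_max: (x, y, z) corners of the axis-aligned view box.
    max_points and bins are illustrative assumptions, not values from the text.
    """
    # Keep only the stars inside the current view box.
    visible = [
        p for p in stars
        if all(lo <= c <= hi for c, lo, hi in zip(p, view_min, view_max))
    ]
    if len(visible) <= max_points:
        # Few enough objects: show each star individually.
        return "scatter", visible

    def cell(p):
        # Map a point to the index of the grid cell that contains it.
        idx = []
        for c, lo, hi in zip(p, view_min, view_max):
            width = (hi - lo) / bins or 1.0  # guard against a zero-width view
            # Clamp so that points on the upper edge fall in the last cell.
            idx.append(min(int((c - lo) / width), bins - 1))
        return tuple(idx)

    # Too many objects: summarise them as counts per grid cell.
    return "density", Counter(cell(p) for p in visible)
```

Zooming in shrinks the view box, so fewer stars are visible and the same call eventually returns individual points without any extra command from the user.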

Also, this tool would present realistic visualizations. Still using our example of the Galaxy, when
seen as an external galaxy a certain amount of degradation in the spatial resolution (PSF) is
necessary for conferring a realistic spatial representation. The bulk of the stellar population
would be visualized as a volume rendering, while some especially bright stars would be displayed as
PSFs, just like what happens when we observe (even from space) other galaxies.

Of course, basic functionalities must be available, such as tools for plotting scattered-points
data in 2D or 3D (with additional color-coded and shape-coded dimensions), but even these
features should present some kind of “active voice”. For instance, you graphically select a
certain number of stars in a scatter diagram. Automatically you will receive a report with the % of
stars of certain types selected (within the sample and globally, e.g. x% of the sample
is F stars, which are y% of the F stars in the Catalogue; the same for other parameters). This
kind of information would immediately draw attention to any unexpected selection bias, and
eventually would lead to knowledge discovery: why the hell do I have so many variable stars
here? Another appealing example is to plot unclassified stars and produce “mysterious Milky
Way” maps. What kind of biases will we find here?
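
The automatic selection report described above could be computed along these lines. This is a minimal sketch assuming the catalogue is available as a simple list of type labels; the function name `selection_report` and the toy data are hypothetical.

```python
from collections import Counter


def selection_report(catalogue_types, selected_indices):
    """For each object type present in the selection, return a pair
    (percent of the selection, percent of that type in the whole catalogue).

    catalogue_types: list with one type label per catalogue object.
    selected_indices: indices of the objects the user selected.
    """
    total_by_type = Counter(catalogue_types)
    selected_by_type = Counter(catalogue_types[i] for i in selected_indices)
    n_selected = sum(selected_by_type.values())
    report = {}
    for label, count in selected_by_type.items():
        share_of_sample = 100.0 * count / n_selected            # the "x%"
        share_of_type = 100.0 * count / total_by_type[label]    # the "y%"
        report[label] = (share_of_sample, share_of_type)
    return report


# Toy catalogue: 4 F stars, 4 G stars, 2 variable stars.
toy_catalogue = ["F"] * 4 + ["G"] * 4 + ["variable"] * 2
toy_report = selection_report(toy_catalogue, [0, 1, 8, 9])
# toy_report == {"F": (50.0, 50.0), "variable": (50.0, 100.0)}
```

In the toy example the selection holds every variable star in the catalogue, which is exactly the kind of unexpected bias such a report is meant to expose.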

This highlights how we must study what kind of representations can provide a broad view of the
Gaia catalogue. i.e. seeing a Milky Way map is not a general view of the contents. The design
of the visualisation system will rely on the definition of key statistics representing the catalogue
contents.

Moreover, a rather neglected aspect of 3D visualization software, which in the case of Gaia is of
fundamental importance, is the measurement errors. Any tool to be implemented for visual
exploitation of Gaia data must take the catalogue errors into account during the visualisation
process in a seamless way, if it is expected to have some real scientific value.

Architecture and functionality of visualisation must be driven by use case scenarios, like those
being listed in the GREAT wiki (model comparison, etc). However, we can only know the actual
usage in a broad sense. There will always be specific needs in special cases that we cannot
predict beforehand. We have to accept this. Gaia visualisation should not claim to be a universal
tool.

Gaia visualisation should allow interaction with 2D and 3D representations of the Milky Way,
allow zooming and panning, selection of data based on positions or any other measurements
(color, chemical composition, kinematics, etc). It should be able to represent and allow
interaction with both point like data (stars) and extended sources (e.g. molecular clouds mapped
via Gaia extinction or measurements from radio surveys). Selection should be possible either
directly on the data parameters or with the help of some classification scheme. The tool would
also allow fitting or comparing theoretical and semi-empirical models to observations.

We don’t really know how to do, or are not used to doing, scientific analysis in 3D. The interfaces are not yet
comfortable and the interaction approaches are not efficient. This must really be researched.
However, 3D displays and interfaces are becoming widespread in the entertainment market.
We have to port this experience into scientific visualisation. Why? Because we gain an extra
dimension to analyse simultaneously. Younger people will certainly be used to these systems.

Gaia, and astronomy in general, have a strong appeal to the public. However, scientific plots,
although useful to the researcher, do not have visual appeal for the public. To overcome this
scientist-public barrier, artist impressions are usually produced but have the drawback of
being very qualitative and even misleading due to some exaggeration. The ideal tool should
provide some (automatic) cosmetic qualities.

WP 430 Data mining tools

Objectives

The objective is to implement the infrastructure to allow common data mining tasks in the Gaia Archive. The focus will be on Knowledge Discovery (new types of objects, exotic objects, similarity-based queries, etc) and modeling. The DM (Data Mining) module will have to scale to the entire Gaia dataset and allow for a flexible definition of the underlying infrastructure (Cloud Computing, GRID computing, and other emerging technologies).

Tasks

  • Define the list of use cases to be made feasible.
  • Define in collaboration with WP530 (Visualisation) the requirements for dimensionality reduction.
  • Define in collaboration with WP300 the infrastructure technology compatibility.
  • Parallelise existing libraries for Data Mining in distributed environments.
  • Implement a submodule to allow the user to provide his own algorithms.
  • Write documentation.

Input

  • Simulated data
  • Gaia Main DataBase
  • Existing DM (Data Mining) libraries

Output

Data Mining capabilities integrated in the Gaia Archive.

WP 440 VO tools and services

Objectives

The objective is to adapt, test, and implement Virtual Observatory tools and services for GAIA data.

Tasks

  • Acquire specifications to develop the VO tools and services
  • Design, develop and implement the VO tools and services
  • Test, optimize, and validate the VO tools and services
  • Monitor performance of the tools and services
  • Obtain feedback from users
  • Update VO tools and services if necessary
  • Write documentation

Input

  • Simulated data
  • IVOA tools

Output


GAIA VO tools and services

Suggestion from Mark Taylor, TOPCAT developer

TOPCAT (which I've developed over about the last 8 years) is a
graphical tool for analysis and interactive exploration of tabular
data which works well with moderately large datasets (1e6-1e7 rows,
1e2 columns); it does plotting, selections, crossmatching,
calculations, and a load of other stuff. It's already in quite
wide use, and already ticks a number of the buzzwords in the
WP500 introduction slide - it does visualisation, it's very VO-friendly
(and very well-known by the VO group at ESAC), it's been used to
some extent for outreach (though that hasn't been a high priority
before now), and I'm looking at adding some data mining capabilities.
In its current incarnation it is not scalable up to 1e9 rows
(which of course couldn't be reasonably transmitted
from an archive server to a client-side tool in any case), so I'm
by no means suggesting that it's the single solution to the
question that WP500 is seeking to answer, but I do think that a tool
of this nature is an important part of the armoury that a user
of the Gaia archive will want, and as far as I know, TOPCAT is
the most capable one around.

STILTS is a complementary suite of command-line tools based on the
same technology. Both are implemented in pure Java.

The web pages of these tools are here:

http://www.starlink.ac.uk/topcat/
http://www.starlink.ac.uk/stilts/

I don't have much background with Gaia, and I haven't worked on
writing an FP* proposal before now, so I don't have a very clear
idea of what's required here. However, I can imagine that once
there are requirements for a user-facing tool that can provide
the data exploration functionality being discussed here, adding
such functionality to an existing powerful and widely-used tool
will be a more effective way to tackle it than starting from scratch.
One concrete and fairly straightforward possibility that comes to
mind is adding a Gaia-specific load dialogue to TOPCAT which makes
it easy to interrogate the archive to get data into the tool
(similar requirements from users of other projects in the past
have led to custom load dialogues for VizieR and Millennium
Database access services).
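
To illustrate what such a load dialogue might do behind the scenes, here is a hedged Python sketch that builds an ADQL cone-search query and encodes it as a synchronous request using the standard IVOA TAP parameters (`REQUEST`, `LANG`, `QUERY`). The service address `http://archive.example.org/tap`, the table name `gaia.source` and the column names `ra` and `dec` are placeholders assumed for the example, not the actual archive interface; no network call is made.

```python
from urllib.parse import urlencode


def cone_search_adql(table, ra_deg, dec_deg, radius_deg, limit=1000):
    """Build an ADQL cone-search query; the column names ra/dec are assumed."""
    return (
        f"SELECT TOP {int(limit)} * FROM {table} "
        f"WHERE 1=CONTAINS(POINT('ICRS', ra, dec), "
        f"CIRCLE('ICRS', {ra_deg}, {dec_deg}, {radius_deg}))"
    )


def tap_sync_url(base_url, adql):
    """Encode the query as a synchronous TAP request URL."""
    params = {"REQUEST": "doQuery", "LANG": "ADQL", "FORMAT": "votable", "QUERY": adql}
    return f"{base_url.rstrip('/')}/sync?{urlencode(params)}"


# Placeholder service and table; only the URL is constructed here.
query = cone_search_adql("gaia.source", 56.75, 24.12, 0.5, limit=100)
url = tap_sync_url("http://archive.example.org/tap", query)
```

A URL of this kind should return a VOTable, which a client-side tool could then load like any other table, provided the server supports synchronous TAP queries.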

WP 440 Grand challenges

 \ No newline at end of file
Added:
>
>
  • Development of data mining tools and infrastructure adapted to the characteristics of the archive (both to its contents and the archive system), allowing the users to perform data mining tasks and extract new knowledge.
  • Development or adaptation of VO tools and services to the Gaia archive. In particular, the possibility of cross-matching the contents of the Gaia archive with other archives (especially with large surveys ongoing or in preparation, like LSST) should be easily available.
  • Development of tools for the Grand Challenges outlined in WP 200, that will involve complex and massive exploration of the data.

Furthermore, this work package also includes the development of some tools for outreach and academic activities. Although not explicitly included in the FP7 call, we consider the task of presenting astronomy to the general public and the provision of resources for teaching
astronomy based on actual Gaia data as worthy contributions to the dissemination of space mission data on a global scale.

Participants

  • Manager: X. Luri (UB)
  • Partners:
    • UB
    • INTA
    • FFCUL
    • UBR
    • CNRS
 \ No newline at end of file

Revision 82011-10-28 - XaviLuri

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

400 - Tools for data exploration

Line: 43 to 43
  Gaia visualisation should allow interaction with 2D and 3D representations of the Milky Way,
allow zooming and panning, and support selection of data based on positions or any other measurements
(color, chemical composition, kinematics, etc.). It should be able to represent and allow
interaction with both point-like data (stars) and extended sources (e.g. molecular clouds mapped
via Gaia extinction or measurements from radio surveys). Selection should be possible either
directly on the data parameters or with the help of some classification scheme. The tool would
also allow fitting or comparing theoretical and semi-empirical models to observations.

We don’t really know how to do, and are not used to doing, scientific analysis in 3D. The interfaces are not yet
comfortable and the interaction approaches are not efficient. This must really be researched.
However, 3D displays and interfaces are becoming widespread in the entertainment market.
We have to port this experience into scientific visualisation. Why? Because we gain an extra
dimension to analyse simultaneously. Younger people will certainly be used to these systems.

Gaia, and astronomy in general, have a strong appeal to the public. However, scientific plots,
although useful to the researcher, do not have visual appeal for the public. To overcome this
scientist-public barrier, artist’s impressions are usually produced, but they have the drawback of
being very qualitative and even misleading due to some exaggeration. The ideal tool should
provide some (automatic) cosmetic qualities.

WP 430 Data mining tools

Added:
>
>

Objectives

The objective is to implement the infrastructure to allow common data mining tasks in the Gaia Archive. The focus will be on Knowledge Discovery (new types of objects, exotic objects, similarity-based queries, etc.) and modeling. The DM (Data Mining) module will have to scale to the entire Gaia dataset and allow for a flexible definition of the underlying infrastructure (Cloud Computing, GRID computing, and other emerging technologies).

Tasks

  • Define the list of use cases to be made feasible.
  • Define in collaboration with WP530 (Visualisation) the requirements for dimensionality reduction.
  • Define in collaboration with WP300 the infrastructure technology compatibility.
  • Parallelise existing libraries for Data Mining in distributed environments.
  • Implement a submodule to allow users to provide their own algorithms.
  • Write documentation.

Input

  • Simulated data
  • Gaia Main DataBase
  • Existing DM (Data Mining) libraries

Output

Data Mining capabilities integrated in the Gaia Archive.

 

WP 440 VO tools and services

Changed:
<
<

Objectives

>
>

Objectives

  The objective is to adapt, test, and implement Virtual Observatory tools and services for Gaia data.
Changed:
<
<

Tasks

>
>

Tasks

 
  • Acquire specifications to develop the VO tools and services
  • Design, develop and implement the VO tools and services
  • Test, optimize, and validate the VO tools and services

Revision 72011-10-28 - XaviLuri

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

400 - Tools for data exploration

Line: 43 to 43
  Gaia visualisation should allow interaction with 2D and 3D representations of the Milky Way,
allow zooming and panning, and support selection of data based on positions or any other measurements
(color, chemical composition, kinematics, etc.). It should be able to represent and allow
interaction with both point-like data (stars) and extended sources (e.g. molecular clouds mapped
via Gaia extinction or measurements from radio surveys). Selection should be possible either
directly on the data parameters or with the help of some classification scheme. The tool would
also allow fitting or comparing theoretical and semi-empirical models to observations.

We don’t really know how to do, and are not used to doing, scientific analysis in 3D. The interfaces are not yet
comfortable and the interaction approaches are not efficient. This must really be researched.
However, 3D displays and interfaces are becoming widespread in the entertainment market.
We have to port this experience into scientific visualisation. Why? Because we gain an extra
dimension to analyse simultaneously. Younger people will certainly be used to these systems.

Gaia, and astronomy in general, have a strong appeal to the public. However, scientific plots,
although useful to the researcher, do not have visual appeal for the public. To overcome this
scientist-public barrier, artist’s impressions are usually produced, but they have the drawback of
being very qualitative and even misleading due to some exaggeration. The ideal tool should
provide some (automatic) cosmetic qualities.

WP 430 Data mining tools

Changed:
<
<

WP 440 VO tools

>
>

WP 440 VO tools and services

Objectives

 
Changed:
<
<
Suggestion from Mark Taylor, TOPCAT developer:
>
>
The objective is to adapt, test, and implement Virtual Observatory tools and services for Gaia data.

Tasks

  • Acquire specifications to develop the VO tools and services
  • Design, develop and implement the VO tools and services
  • Test, optimize, and validate the VO tools and services
  • Monitor performance of the tools and services
  • Obtain feedback from users
  • Update VO tools and services if necessary
  • Write documentation

Input

  • Simulated data
  • IVOA tools

Output


Gaia VO tools and services

Suggestion from Mark Taylor, TOPCAT developer

  TOPCAT (which I've developed over about the last 8 years) is a
graphical tool for analysis and interactive exploration of tabular
data which works well with moderately large datasets (1e6-1e7 rows,
1e2 columns); it does plotting, selections, crossmatching,
calculations, and a load of other stuff. It's already in quite
wide use, and already ticks a number of the buzzwords in the
WP500 introduction slide - it does visualisation, it's very VO-friendly
(and very well-known by the VO group at ESAC), it's been used to
some extent for outreach (though that hasn't been a high priority
before now), and I'm looking at adding some data mining capabilities.
In its current incarnation it is not scalable up to 1e9 rows
(which of course couldn't be reasonably transmitted
from an archive server to a client-side tool in any case), so I'm
by no means suggesting that it's the single solution to the
question that WP500 is seeking to answer, but I do think that a tool
of this nature is an important part of the armoury that a user
of the Gaia archive will want, and as far as I know, TOPCAT is
the most capable one around.

STILTS is a complementary suite of command-line tools based on the
same technology. Both are implemented in pure Java.

The web pages of these tools are here:

http://www.starlink.ac.uk/topcat/
http://www.starlink.ac.uk/stilts/

I don't have much background with Gaia, and I haven't worked on
writing an FP* proposal before now, so I don't have a very clear
idea of what's required here. However, I can imagine that once
there are requirements for a user-facing tool that can provide
the data exploration functionality being discussed here, adding
such functionality to an existing powerful and widely-used tool
will be a more effective way to tackle it than starting from scratch.
One concrete and fairly straightforward possibility that comes to
mind is adding a Gaia-specific load dialogue to TOPCAT which makes
it easy to interrogate the archive to get data into the tool
(similar requirements from users of other projects in the past
have led to custom load dialogues for VizieR and Millennium
Database access services).

WP 440 Grand challenges

\ No newline at end of file

Revision 62011-10-26 - XaviLuri

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

400 - Tools for data exploration

Line: 43 to 43
  Gaia visualisation should allow interaction with 2D and 3D representations of the Milky Way,
allow zooming and panning, and support selection of data based on positions or any other measurements
(color, chemical composition, kinematics, etc.). It should be able to represent and allow
interaction with both point-like data (stars) and extended sources (e.g. molecular clouds mapped
via Gaia extinction or measurements from radio surveys). Selection should be possible either
directly on the data parameters or with the help of some classification scheme. The tool would
also allow fitting or comparing theoretical and semi-empirical models to observations.

We don’t really know how to do, and are not used to doing, scientific analysis in 3D. The interfaces are not yet
comfortable and the interaction approaches are not efficient. This must really be researched.
However, 3D displays and interfaces are becoming widespread in the entertainment market.
We have to port this experience into scientific visualisation. Why? Because we gain an extra
dimension to analyse simultaneously. Younger people will certainly be used to these systems.

Gaia, and astronomy in general, have a strong appeal to the public. However, scientific plots,
although useful to the researcher, do not have visual appeal for the public. To overcome this
scientist-public barrier, artist’s impressions are usually produced, but they have the drawback of
being very qualitative and even misleading due to some exaggeration. The ideal tool should
provide some (automatic) cosmetic qualities.

WP 430 Data mining tools

Added:
>
>

WP 440 VO tools

Suggestion from Mark Taylor, TOPCAT developer:

TOPCAT (which I've developed over about the last 8 years) is a
graphical tool for analysis and interactive exploration of tabular
data which works well with moderately large datasets (1e6-1e7 rows,
1e2 columns); it does plotting, selections, crossmatching,
calculations, and a load of other stuff. It's already in quite
wide use, and already ticks a number of the buzzwords in the
WP500 introduction slide - it does visualisation, it's very VO-friendly
(and very well-known by the VO group at ESAC), it's been used to
some extent for outreach (though that hasn't been a high priority
before now), and I'm looking at adding some data mining capabilities.
In its current incarnation it is not scalable up to 1e9 rows
(which of course couldn't be reasonably transmitted
from an archive server to a client-side tool in any case), so I'm
by no means suggesting that it's the single solution to the
question that WP500 is seeking to answer, but I do think that a tool
of this nature is an important part of the armoury that a user
of the Gaia archive will want, and as far as I know, TOPCAT is
the most capable one around.

STILTS is a complementary suite of command-line tools based on the
same technology. Both are implemented in pure Java.

The web pages of these tools are here:

http://www.starlink.ac.uk/topcat/
http://www.starlink.ac.uk/stilts/

I don't have much background with Gaia, and I haven't worked on
writing an FP* proposal before now, so I don't have a very clear
idea of what's required here. However, I can imagine that once
there are requirements for a user-facing tool that can provide
the data exploration functionality being discussed here, adding
such functionality to an existing powerful and widely-used tool
will be a more effective way to tackle it than starting from scratch.
One concrete and fairly straightforward possibility that comes to
mind is adding a Gaia-specific load dialogue to TOPCAT which makes
it easy to interrogate the archive to get data into the tool
(similar requirements from users of other projects in the past
have led to custom load dialogues for VizieR and Millennium
Database access services).

 

WP 440 Grand challenges

\ No newline at end of file

Revision 52011-10-25 - XaviLuri

Line: 1 to 1
 
META TOPICPARENT name="WebHome"
Changed:
<
<

500 - Tools for data exploration

>
>

400 - Tools for data exploration

 
Line: 14 to 14
 
Furthermore, the work package also includes the development of some tools for outreach and academic activities. Although not explicitly included in the call, we consider the tasks of bringing astronomy closer to the general public and providing resources for teaching astronomy based on actual Gaia data to be worthy contributions to the dissemination of space mission data on a global scale.
Changed:
<
<

WP 510 Management

>
>

WP 410 Management

  Overall management of WP 500
Changed:
<
<

WP 520 Visualization tools

>
>

WP 420 Visualization tools

  Inputs provided by A. Moitinho
Line: 42 to 42
 Moreover, a rather neglected aspect of 3D visualization software, which in the case of Gaia is of
fundamental importance, is the treatment of measurement errors. Any tool implemented for visual
exploitation of Gaia data must take the catalogue errors into account during the visualisation
process in a seamless way if it is to have real scientific value.
Architecture and functionality of visualisation must be driven by use case scenarios, like those
being listed in the GREAT wiki (model comparison, etc). However, we can only know the actual
usage in a broad sense. There will always be specific needs in special cases that we cannot
predict beforehand. We have to accept this. Gaia visualisation should not claim to be a universal
tool.

Gaia visualisation should allow interaction with 2D and 3D representations of the Milky Way,
allow zooming and panning, and support selection of data based on positions or any other measurements
(color, chemical composition, kinematics, etc.). It should be able to represent and allow
interaction with both point-like data (stars) and extended sources (e.g. molecular clouds mapped
via Gaia extinction or measurements from radio surveys). Selection should be possible either
directly on the data parameters or with the help of some classification scheme. The tool would
also allow fitting or comparing theoretical and semi-empirical models to observations.

We don’t really know how to do, and are not used to doing, scientific analysis in 3D. The interfaces are not yet
comfortable and the interaction approaches are not efficient. This must really be researched.
However, 3D displays and interfaces are becoming widespread in the entertainment market.
We have to port this experience into scientific visualisation. Why? Because we gain an extra
dimension to analyse simultaneously. Younger people will certainly be used to these systems.

Gaia, and astronomy in general, have a strong appeal to the public. However, scientific plots,
although useful to the researcher, do not have visual appeal for the public. To overcome this
scientist-public barrier, artist’s impressions are usually produced, but they have the drawback of
being very qualitative and even misleading due to some exaggeration. The ideal tool should
provide some (automatic) cosmetic qualities.

Changed:
<
<

WP 530 Data mining tools

WP 540 Grand challenges

>
>

WP 430 Data mining tools

WP 440 Grand challenges

 \ No newline at end of file

Revision 42011-10-24 - XaviLuri

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

500 - Tools for data exploration

Added:
>
>
 A use of the Gaia archive based on simple queries (e.g. sky region queries) would only allow a basic use of its potential. To fully exploit a billion-object dataset, containing a wide variety of data (astrometric, photometric, spectrophotometric, spectroscopic, …), more advanced and powerful data exploration tools will be needed. This work package is devoted to the development of such tools, in close coordination with WP 200 to ensure that they are tailored to the actual needs of the scientific user community. It will include:
  • Development of visualization tools, adapted both to the potentially large size and to the complexity of the available data and of the results of archive queries.
Line: 11 to 13
 
  • Development of tools for the Grand Challenges outlined in WP 200, that will involve complex and massive exploration of the data.


Furthermore, the work package also includes the development of some tools for outreach and academic activities. Although not explicitly included in the call, we consider the tasks of bringing astronomy closer to the general public and providing resources for teaching astronomy based on actual Gaia data to be worthy contributions to the dissemination of space mission data on a global scale. \ No newline at end of file

Added:
>
>

WP 510 Management

Overall management of WP 500

WP 520 Visualization tools

Inputs provided by A. Moitinho

  1. Survey of visualisation tools of some utility for exploring the Gaia catalogue. Technical and semantic approaches … We have an ongoing ESA contract (VA-4D) for surveying the currently available visualisation tools in Climate Sciences and Astronomy, identifying visualisation needs, and performing the corresponding gap (not GAP) assessment, including the implementability of gap solutions. One of the outcomes is a conceptual design for a next-generation visualisation tool. The study covers not only technical aspects, but also a more abstract component focused on the semantics and ergonomics of visualisation. Application of this type of knowledge will be necessary in the definition of Gaia visualisation.
  2. Technical solutions for visualisation, besides the study mentioned above. We have recently completed another ESA contract. This one (KD-LADS) was for knowledge discovery in large datasets and included a visualisation module, an extension of ParaView, which already gave us some practical experience in this field. With the VA-4D study we are now developing further expertise in the field. Writing the SRS and SDD would be a natural product of our activities.
  3. Implementation. Provided that GENIUS gets funded so that we can support extra human resources, we can do it, as UNINOVA has proved with 15 successful projects (13 implementations) for ESA.

A sketch of our vision

With petabyte-sized databases, science will happen when we manage to connect all this data
with explanations that are usually kilobyte-sized. As attested by the portion of our brain dedicated
to the processing of visual information, human comprehension is favoured when
the data is presented in a visual way. The aim of scientific visualization is exactly this: to reduce
the complexity of scientific data in a way that favours the researcher’s understanding, and thus the
flourishing of ideas and physical interpretation.

Gaia data is highly complex in nature, and so will be the Gaia catalogue. Therefore, tools should
be provided to the research community to help them grasp as quickly and precisely as
possible the information they are searching for, as well as to facilitate and even encourage
serendipitous discoveries. Hence, whatever tool is implemented should not work in
a completely passive way, waiting for commands from the user, but should have a little bit
of an active voice, suggesting some characteristics of the visualization that would facilitate the
discovery process.

One simple example of an “active visualisation” is the following: imagine you want to see
the MW in 3D, so you request to visualize the positions x,y,z of all the stars in the catalogue. In
this case, an “active tool” would automatically present you with a 3D volume rendering of the stars,
so that you would not see a 3D scatter plot, with each point representing a single star, but
rather the global structure of the MW. Then, as you zoom into the visualization, the
volume rendering would progressively turn into a scatter plot showing individual stars, obviously in
a fully automatic way.
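
A minimal sketch of this switching behaviour, under assumed names and thresholds, might look as follows in Python with NumPy: when few stars fall inside the current view they are returned for a scatter plot, otherwise they are binned into a 3D density grid that a volume renderer could display. The function name `choose_representation`, the threshold `max_points` and the grid size `bins` are illustrative choices, not part of any existing tool, and a progressive transition between the two modes is left out.

```python
import numpy as np


def choose_representation(xyz, view_min, view_max, max_points=5000, bins=32):
    """Return ('scatter', points) when few stars are in view, else ('density', grid)."""
    xyz = np.asarray(xyz, dtype=float)
    lo = np.asarray(view_min, dtype=float)
    hi = np.asarray(view_max, dtype=float)
    # Keep only the stars inside the axis-aligned viewing box.
    in_view = np.all((xyz >= lo) & (xyz <= hi), axis=1)
    visible = xyz[in_view]
    if len(visible) <= max_points:
        return "scatter", visible
    # Too many points to draw individually: bin them into a 3D histogram.
    grid, _ = np.histogramdd(visible, bins=bins, range=list(zip(lo, hi)))
    return "density", grid


# Synthetic positions stand in for catalogue coordinates.
rng = np.random.default_rng(0)
stars = rng.normal(0.0, 1.0, size=(100_000, 3))
wide_kind, wide_data = choose_representation(stars, [-5, -5, -5], [5, 5, 5])
zoom_kind, zoom_data = choose_representation(stars, [0.0, 0.0, 0.0], [0.1, 0.1, 0.1])
```

With the wide view the function returns a density grid, while the zoomed-in box contains only a handful of stars and falls back to individual points.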

Also, this tool would present realistic visualizations. Still using our example of the Galaxy, when
it is seen as an external galaxy a certain amount of degradation in the spatial resolution (PSF) is
necessary to convey a realistic spatial representation. The bulk of the stellar population
would be visualized as a volume rendering, and some especially bright stars would be displayed as
PSFs, just like what happens when we observe (even from space) other galaxies.

Of course, basic functionalities must be available, such as tools for plotting scattered-point
data in 2D or 3D (with additional color-coded and shape-coded dimensions), but even these
features should present some kind of “active voice”. For instance, you graphically select a
certain number of stars in a scatter diagram. Automatically you will receive a report with the percentage of
stars of certain types selected, both within the sample and globally (e.g. x% of the sample
is F stars, which are y% of the F stars in the catalogue; the same for other parameters). This
kind of information would immediately draw attention to any unexpected selection bias, and
eventually would lead to knowledge discovery: why the hell do I have so many variable stars
here? Another appealing example is to plot unclassified stars and produce “mysterious Milky
Way” maps. What kind of biases will we find here?
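
The selection report described above can be sketched in a few lines of Python. This is only an illustration: the function name `selection_report` and the representation of the catalogue as a plain list of type labels are assumptions made for the example.

```python
from collections import Counter


def selection_report(catalogue_types, selected_indices):
    """For each type in the selection, return (% of the sample, % of that type in the catalogue)."""
    totals = Counter(catalogue_types)
    sample = Counter(catalogue_types[i] for i in selected_indices)
    n_sample = sum(sample.values())
    report = {}
    for star_type, count in sample.items():
        pct_of_sample = 100.0 * count / n_sample
        pct_of_type = 100.0 * count / totals[star_type]
        report[star_type] = (pct_of_sample, pct_of_type)
    return report


# Toy catalogue of spectral types; the user selects the first four stars.
types = ["F", "F", "G", "K", "F", "G", "F", "M"]
report = selection_report(types, [0, 1, 2, 3])
```

Here half of the sample is F stars, and those are half of the F stars in the toy catalogue, whereas the single K star selected is the only one the catalogue contains; a contrast of that kind is what would flag a possible selection bias.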

This highlights how we must study what kind of representations can provide a broad view of the
Gaia catalogue: seeing a Milky Way map, for instance, is not a general view of the contents. The design
of the visualisation system will rely on the definition of key statistics representing the catalogue
contents.

Moreover, a rather neglected aspect of 3D visualization software, which in the case of Gaia is of
fundamental importance, is the treatment of measurement errors. Any tool implemented for visual
exploitation of Gaia data must take the catalogue errors into account during the visualisation
process in a seamless way if it is to have real scientific value.
Architecture and functionality of visualisation must be driven by use case scenarios, like those
being listed in the GREAT wiki (model comparison, etc). However, we can only know the actual
usage in a broad sense. There will always be specific needs in special cases that we cannot
predict beforehand. We have to accept this. Gaia visualisation should not claim to be a universal
tool.

Gaia visualisation should allow interaction with 2D and 3D representations of the Milky Way,
allow zooming and panning, and support selection of data based on positions or any other measurements
(color, chemical composition, kinematics, etc.). It should be able to represent and allow
interaction with both point-like data (stars) and extended sources (e.g. molecular clouds mapped
via Gaia extinction or measurements from radio surveys). Selection should be possible either
directly on the data parameters or with the help of some classification scheme. The tool would
also allow fitting or comparing theoretical and semi-empirical models to observations.

We don’t really know how to do, and are not used to doing, scientific analysis in 3D. The interfaces are not yet
comfortable and the interaction approaches are not efficient. This must really be researched.
However, 3D displays and interfaces are becoming widespread in the entertainment market.
We have to port this experience into scientific visualisation. Why? Because we gain an extra
dimension to analyse simultaneously. Younger people will certainly be used to these systems.

Gaia, and astronomy in general, have a strong appeal to the public. However, scientific plots,
although useful to the researcher, do not have visual appeal for the public. To overcome this
scientist-public barrier, artist’s impressions are usually produced, but they have the drawback of
being very qualitative and even misleading due to some exaggeration. The ideal tool should
provide some (automatic) cosmetic qualities.

WP 530 Data mining tools

WP 540 Grand challenges

Revision 32011-10-13 - XaviLuri

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

500 - Tools for data exploration

Line: 11 to 11
 
  • Development of tools for the Grand Challenges outlined in WP 200, that will involve complex and massive exploration of the data.


Furthermore, the work package also includes the development of some tools for outreach and academic activities. Although not explicitly included in the call, we consider the tasks of bringing astronomy closer to the general public and providing resources for teaching astronomy based on actual Gaia data to be worthy contributions to the dissemination of space mission data on a global scale.

Deleted:
<
<
<--  
-->
 \ No newline at end of file

Revision 22011-10-12 - XaviLuri

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

500 - Tools for data exploration

Line: 8 to 8
 
  • Development of data mining tools adapted to the characteristics of the archive (both to its contents and the archive system), allowing the users to search and extract data based on complex criteria.

  • Development or adaptation of VO tools to the Gaia archive. In particular, the possibility of cross-matching the contents of the Gaia archive with other archives (especially with large surveys ongoing or in preparation, like LSST) should be easily available.
Added:
>
>
  • Development of tools for the Grand Challenges outlined in WP 200, that will involve complex and massive exploration of the data.
 
Furthermore, the work package also includes the development of some tools for outreach and academic activities. Although not explicitly included in the call, we consider the tasks of bringing astronomy closer to the general public and providing resources for teaching astronomy based on actual Gaia data to be worthy contributions to the dissemination of space mission data on a global scale.

Revision 12011-10-10 - XaviLuri

Line: 1 to 1
Added:
>
>
META TOPICPARENT name="WebHome"

500 - Tools for data exploration

A use of the Gaia archive based on simple queries (e.g. sky region queries) would only allow a basic use of its potential. To fully exploit a billion-object dataset, containing a wide variety of data (astrometric, photometric, spectrophotometric, spectroscopic, …), more advanced and powerful data exploration tools will be needed. This work package is devoted to the development of such tools, in close coordination with WP 200 to ensure that they are tailored to the actual needs of the scientific user community. It will include:

  • Development of visualization tools, adapted both to the potentially large size and to the complexity of the available data and of the results of archive queries.

  • Development of data mining tools adapted to the characteristics of the archive (both to its contents and the archive system), allowing the users to search and extract data based on complex criteria.

  • Development or adaptation of VO tools to the Gaia archive. In particular, the possibility of cross-matching the contents of the Gaia archive with other archives (especially with large surveys ongoing or in preparation, like LSST) should be easily available.


Furthermore, the work package also includes the development of some tools for outreach and academic activities. Although not explicitly included in the call, we consider the tasks of bringing astronomy closer to the general public and providing resources for teaching astronomy based on actual Gaia data to be worthy contributions to the dissemination of space mission data on a global scale.

<--  
-->
 
Copyright © 2008-2021 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.