Line: 1 to 1 | ||||||||
---|---|---|---|---|---|---|---|---|
WP4 - Tools for data exploration
Description | ||||||||
Line: 47 to 47 | ||||||||
| ||||||||
Changed: | ||||||||
< < | T4.3 - Data mining [Months: 1-42] | |||||||
> > | T4.3 - Data mining [Months: 1-42] | |||||||
UB, CSIC |
Line: 1 to 1 | ||||||||
---|---|---|---|---|---|---|---|---|
| ||||||||
Changed: | ||||||||
< < | 400 - Tools for data exploration | |||||||
> > | WP4 - Tools for data exploration | |||||||
Description
A use of the Gaia archive based on simple queries (i.e. sky region queries) would only allow a basic use of its potential. To fully exploit a billion object data set, containing a wide variety of data (astrometric, photometric, spectrophotometric, spectroscopic, ...) more advanced and powerful data exploration tools will be needed. This work package is devoted to the development of such tools, in close coordination with WP200 to ensure that they are tailored to the actual needs of the scientific user community. It will include: | ||||||||
Line: 30 to 30 | ||||||||
The full understanding of the Gaia catalogue data requires a rich set of visualization tools, that will help in the human interpretation of the data and knowledge discovery from its internal relation. To achieve that, the visualization package should support a wide variety of visualization algorithms including geometrical, volumetric methods and also advanced topological and modelling algorithms (i.e. polygon reduction, contouring, or glyphs) among others. Besides that, we must consider modern concepts of displaying (statistical) data, moving beyond simple histograms or plots towards visual knowledge inspiration and persuasive presentation components (i.e. voxels, hixels, texels representations). It will be also important to go one step forward in current research areas such as visualization of the uncertainties (errors, and their models must be seamlessly integrated and never ignored), user interactivity or cosmetics (essential for outreach, WP-730). | ||||||||
Changed: | ||||||||
< < | The core components of the visualization framework that interact with different (N-dimensional) graphic widgets and the algorithms will have to be provided as part of this package. Internal (server–side) parallel processing of massive data sets and provision for easy human interaction will have to be considered. From the hardware infrastructure the visualization package will have to allow for a flexible definition underlying the client and serverside egressing technologies and platforms. | |||||||
> > | The core components of the visualization framework that interact with different (N-dimensional) graphic widgets and the algorithms will have to be provided as part of this package. Internal (server–side) parallel processing of massive data sets and provision for easy human interaction will have to be considered. From the hardware infrastructure the visualization package will have to allow for a flexible definition underlying the client and serverside egressing technologies and platforms. | |||||||
Although Gaia data will be multi-dimensional, visual exploration in Astronomy is mostly done using 2D representations. This reduced dimensionality has a price: It easily hides features and relations in the data and can produce cluttered views. Multiple 2D panels are often used as a solution, but the linkage between data in different panels is frequently not clear. Curiously, 3D visualization, with the gain of an extra visual dimension, is not widespread in Astronomy, where most of the data are individual entities (stars, galaxies, asteroids). It is almost exclusively used in simulations of astrophysical fluids and fields, which are extended bodies. The reason is a lack of good tools for 3D selection and interaction with point clouds. 2D interfaces, such as a mouse and keyboard, are not adapted for this kind of interaction. This is one of the most critical inhibitors of the advantages of using the extra third dimension in scientific research. There is clearly a need of developing an adequate tool for 3D interactive visualization supporting human-computer interfaces other than the mouse and keyboard. | ||||||||
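One possible way to make the linkage between several 2D panels explicit is to keep a single set of selected row indices and let every panel highlight the same rows. The following is only a minimal sketch of that idea; the `Star` record, its field names and the helper functions `brush` and `panel_points` are hypothetical and are not part of any tool named in this work package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Star:
    """Hypothetical catalogue row; the field names are illustrative only."""
    x: float          # position, arbitrary units
    y: float
    colour: float     # e.g. a colour index
    magnitude: float


def brush(stars, x_attr, y_attr, x_range, y_range):
    """Return the row indices whose (x_attr, y_attr) fall inside a rectangle."""
    (x_lo, x_hi), (y_lo, y_hi) = x_range, y_range
    return {
        i
        for i, s in enumerate(stars)
        if x_lo <= getattr(s, x_attr) <= x_hi
        and y_lo <= getattr(s, y_attr) <= y_hi
    }


def panel_points(stars, x_attr, y_attr, selected):
    """Project every row onto one 2D panel, flagging the shared selection."""
    return [
        (getattr(s, x_attr), getattr(s, y_attr), i in selected)
        for i, s in enumerate(stars)
    ]


stars = [
    Star(0.0, 0.0, 0.5, 10.0),
    Star(1.0, 1.0, 1.2, 12.0),
    Star(2.0, 0.5, 0.6, 11.0),
    Star(5.0, 5.0, 1.5, 15.0),
]

# Brush a rectangle in the position panel ...
selected = brush(stars, "x", "y", (0.5, 2.5), (0.0, 1.5))
# ... and reuse the same row indices in the colour-magnitude panel.
cmd_panel = panel_points(stars, "colour", "magnitude", selected)
```

Because the selection is stored as row indices rather than as a region of one particular plot, any number of panels can display it consistently.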
Line: 104 to 104 | ||||||||
| ||||||||
Added: | ||||||||
> > |
The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7-SPACE-2013-1) under grant agreement n°606740. | |||||||
\ No newline at end of file |
Line: 1 to 1 | ||||||||
---|---|---|---|---|---|---|---|---|
400 - Tools for data exploration
Description | ||||||||
Line: 11 to 11 | ||||||||
astronomy based on actual Gaia data as worthy contributions to the dissemination of space mission data on a global scale. | ||||||||
Added: | ||||||||
> > |
WP4 - Tools for data exploitation [Months: 1-42] Lead beneficiary: UB Type of activity: RTD
The UB team leads this work package and will contribute most of the resources devoted to it. The personnel at the UB (see Sec. 2.2.1), led by the GENIUS coordinator X. Luri, will provide its extensive background on astrometry in general and the Gaia data in particular, and its knowledge and experience on the use of astronomical data. In addition, an experienced software engineer will be hired with the GENIUS funding and devoted full time to WP400 to provide the technical expertise necessary for the developments in this work package with the support of the UB staff. Some funding will also be devoted to specific tasks along the schedule, to employ part time software engineers already working for DPAC developments in the UB team.
T4.1 - Technical coordination [Months: 1-42]
UB
In addition to managing the resources deployed on the other WP-400 work packages, and producing reports on those activities, this work package oversees the design and specification of all work conducted under WP-400, to ensure that it adequately addresses the requirements identified within the GENIUS project and from external sources, such as the CU9 and GREAT. This WP also includes the liaison with Gaia and Science Archive team members at ESAC for the coordination in the development of exploitation tools working on the Gaia archive.
T4.2 - Visualization tools [Months: 1-42]
FFCUL, UB
This Work Package addresses the development of visualization tools and solutions, adapted to the large size and complexity of the Gaia archive. This includes interaction with the data, resulting in seamless visual queries to the archive. The full understanding of the Gaia catalogue data requires a rich set of visualization tools, that will help in the human interpretation of the data and knowledge discovery from its internal relation. To achieve that, the visualization package should support a wide variety of visualization algorithms including geometrical, volumetric methods and also advanced topological and modelling algorithms (i.e. polygon reduction, contouring, or glyphs) among others. Besides that, we must consider modern concepts of displaying (statistical) data, moving beyond simple histograms or plots towards visual knowledge inspiration and persuasive presentation components (i.e. voxels, hixels, texels representations). It will be also important to go one step forward in current research areas such as visualization of the uncertainties (errors, and their models must be seamlessly integrated and never ignored), user interactivity or cosmetics (essential for outreach, WP-730).
The core components of the visualization framework that interact with different (N-dimensional) graphic widgets and the algorithms will have to be provided as part of this package. Internal (server–side) parallel processing of massive data sets and provision for easy human interaction will have to be considered. From the hardware infrastructure the visualization package will have to allow for a flexible definition underlying the client and serverside egressing technologies and platforms. Although Gaia data will be multi-dimensional, visual exploration in Astronomy is mostly done using 2D representations. This reduced dimensionality has a price: It easily hides features and relations in the data and can produce cluttered views. Multiple 2D panels are often used as a solution, but the linkage between data in different panels is frequently not clear. Curiously, 3D visualization, with the gain of an extra visual dimension, is not widespread in Astronomy, where most of the data are individual entities (stars, galaxies, asteroids). It is almost exclusively used in simulations of astrophysical fluids and fields, which are extended bodies. The reason is a lack of good tools for 3D selection and interaction with point clouds. 2D interfaces, such as a mouse and keyboard, are not adapted for this kind of interaction. This is one of the most critical inhibitors of the advantages of using the extra third dimension in scientific research. There is clearly a need of developing an adequate tool for 3D interactive visualization supporting human-computer interfaces other than the mouse and keyboard. Besides our own developed components, the analysis for the reuse and extension of widely accepted (astronomical) visualization software will be considered as part of the WP tasks. In particular the tools that support VO formats will be targeted (i.e. TOPCAT, VOSpec) in coordination with WP-440. 
Those tools are already using a set of different astronomic formats and allow the inclusion of several user defined formats. They also provide widgets for higher dimensional visualisation, statistics algorithms or visual comparison that will be adapted to visualise the contents of the Gaia archive and compare it against other archives. Other existing tools will have to be examined, in particular the ones that deal with parallel visualization on large clusters (i.e. using MapReduce), the open-source ParaView coprocessing library (that uses VTK) or VisIVO, a current parallel processing capable visualization tool well known in astronomy. The tasks in this sub-work package include the contributions of the FFCUL specialised partner. The team at FFCUL will provide expertise in the development of visualization tools. Their activity in visualization studies and developments for space and earth observation further allows GENIUS to take advantage of the synergies with fields other than astronomy. The following tasks have been identified for the visualisation WP:
T4.3 - Data mining [Months: 1-42]
UB, CSIC
The Gaia catalogue will represent an unmatched opportunity to apply data mining techniques and algorithms as tools for knowledge discovery in a domain where there is no alternative to automated methods based on statistical learning (human exploration is certainly not feasible except for very limited subsets of data). The application of the data mining algorithms in order to extract new knowledge from the data is mandatory for a full scientific exploitation of the Gaia data. The main focus will be on Knowledge Discovery which is expected to reveal patterns and relationships within the astronomical data that can lead to the detection of new types of objects or isolated, exotic objects that represent rapid stages of stellar evolution and/or new astrophysical scenarios. Also, modelling tasks will arise from the discovered patterns. In that sense, the capability of automated dimensionality reduction (feature extraction, feature selection) and the development of key learning algorithms (clustering, outlier analysis, swarm intelligence, ...) implemented for parallel processing are foreseen as important.
From the architecture point of view, the DM module will have to scale to the entire Gaia data set and allow for a flexible definition of the underlying infrastructure (Cloud Computing, High Performance computing (HPC), GRID computing, and other emerging technologies). The initial approach we plan is an architecture where the mining algorithms are accessed following the paradigm of Software as a Service (SaaS) over a service oriented architecture. However, the package should also be compatible with future definitions of data mining processes, that are expected to include more complex mining work flows supporting asynchronous notifications from those services.
The tasks in this sub-work package are mainly under the UB partner, and also include the contribution of the CSIC specialised partner. Through the CSIC the team of L. Sarro will provide to GENIUS its expertise in Data Mining in astronomy, including the synergies with his work in the area inside the Gaia DPAC (see Sec. 2.2.7 of the DOW Part A). The following tasks have been defined for the data mining WP.
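As an illustration of the "outlier analysis" named among the learning algorithms in T4.3, the sketch below scores each object by its mean distance to its k nearest neighbours in feature space and flags the objects whose score is far above the median. It is only one of many possible techniques and is not taken from the GENIUS design; the function names, the threshold `factor` and the toy feature vectors are assumptions made for the example. The brute-force neighbour search is quadratic in the number of objects, so it would need a spatial index or a parallel implementation before it could approach Gaia-sized data.

```python
import math


def knn_outlier_scores(points, k=2):
    """Score each point by its mean distance to its k nearest neighbours."""
    if k < 1 or k >= len(points):
        raise ValueError("k must satisfy 1 <= k < number of points")
    scores = []
    for i, p in enumerate(points):
        # Brute-force distances to every other point (illustrative only).
        distances = sorted(
            math.dist(p, q) for j, q in enumerate(points) if j != i
        )
        scores.append(sum(distances[:k]) / k)
    return scores


def flag_outliers(points, k=2, factor=3.0):
    """Return indices whose score exceeds `factor` times the median score.

    `factor` is an arbitrary illustrative threshold, not a recommended value.
    """
    scores = knn_outlier_scores(points, k)
    ordered = sorted(scores)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        median = ordered[mid]
    else:
        median = 0.5 * (ordered[mid - 1] + ordered[mid])
    return [i for i, s in enumerate(scores) if s > factor * median]


# Toy two-dimensional feature vectors: a tight group and one isolated object.
features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
outliers = flag_outliers(features)
```

In this toy input only the isolated object at index 4 is flagged.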
T4.4 - VO tools and services [Months: 1-42]
CSIC, UBR
Besides novel modes of access to the entire Gaia archive and the emerging needs on visualisation (WP420) and data mining (WP430), it is anticipated that the more traditional archive access mode, in which a potentially complex query downloads a data set of modest size for interactive client-side processing, will continue to be important. The most efficient way to support this model is to provide a seamless interface for Gaia data acquisition from existing analysis tools in which astronomers already have expertise. We therefore intend to extend the following existing VO applications with Gaia-specific data acquisition tools:
TOPCAT (Tool for OPerations on Catalogues And Tables, http://www.star.bris.ac.uk/~mbt/topcat/): an interactive graphical application for exploration, analysis and manipulation of tabular data, especially source catalogues, which works well with moderately large data sets (up to a few million rows and a few hundred columns; more details are given in 2.2.11). TOPCAT already offers a number of service-specific load dialogues (e.g. VizieR, Millennium Simulation), and a Gaia option would be added alongside these. Additionally, investigations will be made of whether the existing practical limits on dataset size can be increased. TOPCAT is in regular use by certainly hundreds and perhaps thousands of astronomers worldwide, and has users in 24 of the 27 EU member states. Providing direct access to Gaia data from this tool will be a highly effective way to facilitate an entry point for its exploitation.
VOSpec: Gaia will produce a large set of spectra (spectrophotometric data for all the objects and high-resolution spectra for all objects up to G 17). VOSpec is an ESA-VO tool that can handle spectra in the VO context. It offers multi-wavelength spectral analysis and spectral widgets. The inclusion of Gaia-specific modules is foreseen for users who need to process Gaia spectra.
VisIVO (Visualization Interface to the Virtual Observatory): an open-source tool developed following the VO standards and recommendations. Data is retrieved by connecting to a VO service and loaded locally for manipulation or visualization. It can deal with multidimensional data sets of both observational and simulated data. It offers parallel processing facilities that will need to be extended to fully exploit the access to the Gaia data.
VOSED: a tool developed in the framework of the Spanish VO to ease the generation of Spectral Energy Distributions (SEDs). VOSED is able to build SEDs by gathering information from the spectroscopic services available in the VO. These datasets can be complemented with photometric information from a number of VizieR catalogues as well as with data provided by the user.
VOSA (http://svo.cab.inta-csic.es/theory/vosa/): a tool to query photometric catalogs accessible through VO services, query VO-compliant theoretical spectra, calculate the associated synthetic photometry, and derive physical parameters from the model that best reproduces the observed data.
The tasks in this sub-work package include the contributions of the CSIC and UBR specialised partners. At CSIC the team led by E. Solano (Spanish Virtual Observatory, see Sec. 2.2.7) will provide VO support, and at UBR M. Taylor (main developer of TOPCAT and other VO tools, see Sec. 2.2.11) will provide the TOPCAT integration. The following tasks have been defined for this sub-work package:
| |||||||
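The "traditional archive access mode" described under T4.4, in which a query returns a data set of modest size for client-side processing, is commonly expressed in the VO as an ADQL cone search. The sketch below only builds such a query string and does not contact any service. The helper name `cone_search_adql`, the default table name `gaiadr3.gaia_source`, the column names and the example coordinates are assumptions chosen for illustration; nothing in this work package description specifies them, and this is not a description of how a TOPCAT load dialogue is implemented.

```python
def cone_search_adql(ra_deg, dec_deg, radius_deg,
                     columns=("source_id", "ra", "dec"),
                     table="gaiadr3.gaia_source",  # assumed table name
                     limit=1000):
    """Build an ADQL cone-search query string (no network access)."""
    if not 0.0 <= ra_deg < 360.0:
        raise ValueError("ra_deg must be in [0, 360)")
    if not -90.0 <= dec_deg <= 90.0:
        raise ValueError("dec_deg must be in [-90, 90]")
    if radius_deg <= 0.0:
        raise ValueError("radius_deg must be positive")
    if limit < 1:
        raise ValueError("limit must be at least 1")
    column_list = ", ".join(columns)
    # The position columns are assumed to be named `ra` and `dec`.
    return (
        f"SELECT TOP {int(limit)} {column_list} "
        f"FROM {table} "
        f"WHERE 1 = CONTAINS(POINT('ICRS', ra, dec), "
        f"CIRCLE('ICRS', {ra_deg!r}, {dec_deg!r}, {radius_deg!r}))"
    )


# Example: a 0.5 degree cone around an arbitrary position.
query = cone_search_adql(56.75, 24.12, 0.5)
```

Capping the result with `TOP` keeps the download within the "modest size" that client-side tools handle comfortably; the resulting string could then be submitted to any TAP service that exposes a table with these columns.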
Participants
|
Line: 1 to 1 | ||||||||
---|---|---|---|---|---|---|---|---|
400 - Tools for data exploration
Description | ||||||||
Line: 13 to 13 | ||||||||
Furthermore, this work package also includes the development of some tools for outreach and academic activities. Although not explicitly included in the FP7 call, we consider the task of presenting astronomy to the general public and the provision of resources for teaching astronomy based on actual Gaia data as worthy contributions to the dissemination of space mission data on a global scale.
Participants | ||||||||
Changed: | ||||||||
< < |
| |||||||
> > |
| |||||||
| ||||||||
Changed: | ||||||||
< < |
| |||||||
> > |
| |||||||
|
Line: 1 to 1 | ||||||||
---|---|---|---|---|---|---|---|---|
400 - Tools for data exploration | ||||||||
Added: | ||||||||
> > | Description | |||||||
Changed: | ||||||||
< < | ||||||||
> > | A use of the Gaia archive based on simple queries (i.e. sky region queries) would only allow a basic use of its potential. To fully exploit a billion object data set, containing a wide variety of data (astrometric, photometric, spectrophotometric, spectroscopic, ...) more advanced and powerful data exploration tools will be needed. This work package is devoted to the development of such tools, in close coordination with WP200 to ensure that they are tailored to the actual needs of the scientific user community. It will include: | |||||||
Deleted: | ||||||||
< < | A use of the Gaia archive based on simple queries (i.e. sky region queries) would only allow a basic use of it potential. To fully exploit a billion object dataset, containing a wide variety of data (astrometric, photometric, spectrophotometric, spectroscopic, …) more advanced and powerful data exploration tools will be needed. This work package is devoted to the development of such tools, in close coordination with WP 200 to ensure that they are tailored to the actual needs of the scientific user community. It will include: | |||||||
| ||||||||
Deleted: | ||||||||
< < |
Furthermore, the work package also includes the development of some tools for outreach and academic activities. Although not explicitly included in the call, we consider the task of approaching astronomy to the general public and the provision of resources for teaching astronomy based on actual Gaia data is a worthy contribution to dissemination of space mission data on a global scale.
WP 410 Management
Overall management of WP 500
WP 420 Visualization tools
Inputs provided by A. Moitinho
with usually kilobyte sized explanations. As attested by the portion of our brain dedicated to processing visual information, human comprehension is favored when data are presented visually. The aim of scientific visualization is exactly this: to reduce the complexity of scientific data in a way that favors the researcher's understanding, and thus the flourishing of ideas and physical interpretation. Gaia data are highly complex in nature, and so will be the Gaia catalogue. Therefore, tools should be provided to help the research community grasp the information they are searching for as quickly and precisely as possible, and to facilitate and even encourage serendipitous discoveries. Whatever tool is implemented should therefore not work in a completely passive way, waiting for commands from the user; it should have something of an active voice, suggesting characteristics of the visualization that would facilitate the discovery process. One simple example of an “active visualisation” is the following: imagine you want to see the Milky Way (MW) in 3D, so you request a visualization of the positions x, y, z of all the stars in the catalogue. In this case, an “active tool” would automatically present a 3D volume rendering of the stars, so that instead of a 3D scatter plot with each point representing a single star, you would see the global structure of the MW. Then, as you zoom in, the volume rendering would progressively and automatically turn into a scatter plot showing individual stars. This tool would also present realistic visualizations. Still using our example of the Galaxy, when it is seen as an external galaxy a certain amount of degradation in the spatial resolution (PSF) is necessary to give a realistic spatial representation.
The bulk of the stellar population would be visualized as a volume rendering, while some especially bright stars would be displayed as PSFs, just as happens when we observe other galaxies (even from space). Of course, basic functionality must be available, such as tools for plotting scattered-point data in 2D or 3D (with additional color-coded and shape-coded dimensions), but even these features should present some kind of “active voice”. For instance, suppose you graphically select a number of stars in a scatter diagram. You would automatically receive a report with the percentage of stars of each type selected, both within the sample and globally (e.g. x% of the sample are F stars, which are y% of the F stars in the catalogue; the same for other parameters). This kind of information would immediately draw attention to any unexpected selection bias, and could eventually lead to knowledge discovery: why do I have so many variable stars here? Another appealing example is to plot unclassified stars and produce “mysterious Milky Way” maps. What kind of biases will we find here? This highlights how we must study what kinds of representation can provide a broad view of the Gaia catalogue; seeing a Milky Way map, for instance, is not a general view of the contents. The design of the visualisation system will rely on the definition of key statistics representing the catalogue contents. Moreover, a rather neglected aspect of 3D visualization software, and one of fundamental importance in the case of Gaia, is the treatment of measurement errors. Any tool implemented for visual exploitation of Gaia data must take the catalogue errors into account seamlessly during the visualisation process if it is to have real scientific value. The architecture and functionality of the visualisation must be driven by use case scenarios, like those being listed in the GREAT wiki (model comparison, etc.). However, we can only know the actual usage in a broad sense.
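The selection report described above (x% of the sample are F stars, which are y% of the F stars in the catalogue) could be computed along the following lines. This is a minimal sketch under stated assumptions: the catalogue is reduced to a plain list of type labels, and the function name `selection_report` and the toy labels are hypothetical.

```python
from collections import Counter


def selection_report(catalogue_types, selected_indices):
    """Summarise a selection by object type.

    Returns {type: (percent of the selection, percent of that type in the
    whole catalogue)} for every type present in the selection.
    """
    totals = Counter(catalogue_types)
    # Ignore repeated indices so that each selected row is counted once.
    picked = Counter(catalogue_types[i] for i in set(selected_indices))
    n_selected = sum(picked.values())
    return {
        star_type: (
            100.0 * count / n_selected,
            100.0 * count / totals[star_type],
        )
        for star_type, count in picked.items()
    }


# Toy catalogue of ten objects labelled by spectral type.
types = ["F", "F", "F", "F", "G", "G", "K", "K", "K", "K"]
report = selection_report(types, [0, 1, 4, 6])
```

Here `report["F"]` is `(50.0, 50.0)`: half of the selected objects are F stars, and they are half of all F stars in the toy catalogue. A large mismatch between the two percentages for some type is the kind of signal that would point to a selection bias.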
There will always be specific needs in special cases that we cannot predict beforehand. We have to accept this. Gaia visualisation should not claim to be a universal tool. Gaia visualisation should allow interaction with 2D and 3D representations of the Milky Way, allow zooming and panning, and allow selection of data based on positions or any other measurements (color, chemical composition, kinematics, etc.). It should be able to represent and allow interaction with both point-like data (stars) and extended sources (e.g. molecular clouds mapped via Gaia extinction or measurements from radio surveys). Selection should be possible either directly on the data parameters or with the help of some classification scheme. The tool would also allow fitting or comparing theoretical and semi-empirical models to observations. We don’t really know how to do scientific analysis in 3D, or are not used to it. The interfaces are not yet comfortable and the interaction approaches are not efficient. This must really be researched. However, 3D displays and interfaces are becoming widespread in the entertainment market. We have to port this experience into scientific visualisation. Why? Because we gain an extra dimension to analyse simultaneously. Younger people will certainly be used to these systems. Gaia, and astronomy in general, have a strong appeal to the public. However, scientific plots, although useful to the researcher, do not have visual appeal for the public. To overcome this scientist-public barrier, artist impressions are usually produced, but they have the drawback of being very qualitative and even misleading due to some exaggeration. The ideal tool should provide some (automatic) cosmetic qualities.
WP 430 Data mining tools
Objectives
The objective is to implement the infrastructure to allow common data mining tasks in the Gaia Archive. The focus will be on Knowledge Discovery (new types of objects, exotic objects, similarity-based queries, etc.) and modeling. The DM (Data Mining) module will have to scale to the entire Gaia dataset and allow for a flexible definition of the underlying infrastructure (Cloud Computing, GRID computing, and other emerging technologies).
Tasks
Input
Output
Data Mining capabilities integrated in the Gaia Archive.
WP 440 VO tools and services
Objectives
The objective is to adapt, test, and implement Virtual Observatory tools and services for GAIA data.
Tasks
Input
Output
GAIA VO tools and services
Suggestion from Mark Taylor, TOPCAT developer
TOPCAT (which I've developed over about the last 8 years) is a graphical tool for analysis and interactive exploration of tabular data which works well with moderately large datasets (1e6-1e7 rows, 1e2 columns); it does plotting, selections, crossmatching, calculations, and a load of other stuff. It's already in quite wide use, and already ticks a number of the buzzwords in the WP500 introduction slide - it does visualisation, it's very VO-friendly (and very well-known by the VO group at ESAC), it's been used to some extent for outreach (though that hasn't been a high priority before now), and I'm looking at adding some data mining capabilities. In its current incarnation it is not scalable up to 1e9 rows (which of course couldn't be reasonably transmitted from an archive server to a client-side tool in any case), so I'm by no means suggesting that it's the single solution to the question that WP500 is seeking to answer, but I do think that a tool of this nature is an important part of the armoury that a user of the Gaia archive will want, and as far as I know, TOPCAT is the most capable one around. STILTS is a complementary suite of command-line tools based on the same technology. Both are implemented in pure java. The web pages of these tools are here: http://www.starlink.ac.uk/topcat/ http://www.starlink.ac.uk/stilts/ I don't have much background with Gaia, and I haven't worked on writing an FP* proposal before now, so I don't have a very clear idea of what's required here. However, I can imagine that once there are requirements for a user-facing tool that can provide the data exploration functionality being discussed here, adding such functionality to an existing powerful and widely-used tool will be a more effective way to tackle it than starting from scratch. One concrete and fairly straightforward possibility that comes to mind is adding a Gaia-specific load dialogue to TOPCAT which makes it easy to interrogate the archive to get data into the tool (similar requirements from users of other projects in the past have led to custom load dialogues for VizieR and Millennium Database access services).
WP 440 Grand challenges | |||||||
\ No newline at end of file | ||||||||
Added: | ||||||||
> > |
astronomy based on actual Gaia data as worthy contributions to the dissemination of space mission data on a global scale.
Participants
| |||||||
\ No newline at end of file |
Line: 1 to 1 | ||||||||
---|---|---|---|---|---|---|---|---|
400 - Tools for data exploration | ||||||||
Line: 43 to 43 | ||||||||
Gaia visualisation should allow interaction with 2D and 3D representations of the Milky Way, allow zooming and panning, and allow selection of data based on positions or any other measurements (color, chemical composition, kinematics, etc.). It should be able to represent and allow interaction with both point-like data (stars) and extended sources (e.g. molecular clouds mapped via Gaia extinction or measurements from radio surveys). Selection should be possible either directly on the data parameters or with the help of some classification scheme. The tool would also allow fitting or comparing theoretical and semi-empirical models to observations. We don’t really know how to do scientific analysis in 3D, or are not used to it. The interfaces are not yet comfortable and the interaction approaches are not efficient. This must really be researched. However, 3D displays and interfaces are becoming widespread in the entertainment market. We have to port this experience into scientific visualisation. Why? Because we gain an extra dimension to analyse simultaneously. Younger people will certainly be used to these systems. Gaia, and astronomy in general, have a strong appeal to the public. However, scientific plots, although useful to the researcher, do not have visual appeal for the public. To overcome this scientist-public barrier, artist impressions are usually produced, but they have the drawback of being very qualitative and even misleading due to some exaggeration. The ideal tool should provide some (automatic) cosmetic qualities.
WP 430 Data mining tools | ||||||||
Added: | ||||||||
> > | Objectives
The objective is to implement the infrastructure to allow common data mining tasks in the Gaia Archive. The focus will be on Knowledge Discovery (new types of objects, exotic objects, similarity-based queries, etc.) and modeling. The DM (Data Mining) module will have to scale to the entire Gaia dataset and allow for a flexible definition of the underlying infrastructure (Cloud Computing, GRID computing, and other emerging technologies).
Tasks
Input
Output
Data Mining capabilities integrated in the Gaia Archive. | |||||||
WP 440 VO tools and services | ||||||||
Changed: | ||||||||
< < | Objectives | |||||||
> > | Objectives | |||||||
The objective is to adapt, test, and implement Virtual Observatory tools and services for GAIA data. | ||||||||
Changed: | ||||||||
< < | Tasks | |||||||
> > | Tasks | |||||||
|
Line: 1 to 1 | ||||||||
---|---|---|---|---|---|---|---|---|
400 - Tools for data exploration | ||||||||
Line: 43 to 43 | ||||||||
Gaia visualisation should allow interaction with 2D and 3D representations of the Milky Way, allow zooming and panning, and allow selection of data based on positions or any other measurements (color, chemical composition, kinematics, etc.). It should be able to represent and allow interaction with both point-like data (stars) and extended sources (e.g. molecular clouds mapped via Gaia extinction or measurements from radio surveys). Selection should be possible either directly on the data parameters or with the help of some classification scheme. The tool would also allow fitting or comparing theoretical and semi-empirical models to observations. We don’t really know how to do scientific analysis in 3D, or are not used to it. The interfaces are not yet comfortable and the interaction approaches are not efficient. This must really be researched. However, 3D displays and interfaces are becoming widespread in the entertainment market. We have to port this experience into scientific visualisation. Why? Because we gain an extra dimension to analyse simultaneously. Younger people will certainly be used to these systems. Gaia, and astronomy in general, have a strong appeal to the public. However, scientific plots, although useful to the researcher, do not have visual appeal for the public. To overcome this scientist-public barrier, artist impressions are usually produced, but they have the drawback of being very qualitative and even misleading due to some exaggeration. The ideal tool should provide some (automatic) cosmetic qualities.
WP 430 Data mining tools | ||||||||
Changed: | ||||||||
< < | WP 440 VO tools | |||||||
> > | WP 440 VO tools and servicesObjectives | |||||||
Changed: | ||||||||
< < | Suggestion from Mark Taylor, TOPCAT developer: | |||||||
> > | The objective is to adapt, test, and implement Virtual Observatory tools and services for Gaia data.
Tasks
Input
Output: Gaia VO tools and services
Suggestion from Mark Taylor, TOPCAT developer | |||||||
TOPCAT (which I've developed over about the last 8 years) is a graphical tool for analysis and interactive exploration of tabular data which works well with moderately large datasets (1e6-1e7 rows, 1e2 columns); it does plotting, selections, crossmatching, calculations, and a load of other stuff. It's already in quite wide use, and already ticks a number of the buzzwords in the WP500 introduction slide - it does visualisation, it's very VO-friendly (and very well-known by the VO group at ESAC), it's been used to some extent for outreach (though that hasn't been a high priority before now), and I'm looking at adding some data mining capabilities. In its current incarnation it is not scalable up to 1e9 rows (which of course couldn't be reasonably transmitted from an archive server to a client-side tool in any case), so I'm by no means suggesting that it's the single solution to the question that WP500 is seeking to answer, but I do think that a tool of this nature is an important part of the armoury that a user of the Gaia archive will want, and as far as I know, TOPCAT is the most capable one around. STILTS is a complementary suite of command-line tools based on the same technology. Both are implemented in pure java. The web pages of these tools are here: http://www.starlink.ac.uk/topcat/ http://www.starlink.ac.uk/stilts/ I don't have much background with Gaia, and I haven't worked on writing an FP* proposal before now, so I don't have a very clear idea of what's required here. However, I can imagine that once there are requirements for a user-facing tool that can provide the data exploration functionality being discussed here, adding such functionality to an existing powerful and widely-used tool will be a more effective way to tackle it than starting from scratch. 
One concrete and fairly straightforward possibility that comes to mind is adding a Gaia-specific load dialogue to TOPCAT which makes it easy to interrogate the archive to get data into the tool (similar requirements from users of other projects in the past have led to custom load dialogues for VizieR and Millennium Database access services). WP 440 Grand challenges\ No newline at end of file |
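A load dialogue of the kind suggested above would ultimately have to turn a user's sky position and radius into a query against the archive. The following is only a minimal sketch of that step, assuming the archive accepts ADQL (the Virtual Observatory query language); the table name `gaia_source`, the column names `ra` and `dec`, and the function `build_cone_query` are hypothetical placeholders and not part of TOPCAT or of any existing Gaia service.

```python
def build_cone_query(ra_deg, dec_deg, radius_deg,
                     table="gaia_source",      # hypothetical table name
                     columns=("ra", "dec"),    # hypothetical column names
                     limit=None):
    """Build an ADQL cone-search query string around (ra_deg, dec_deg)."""
    # Validate the inputs before they are written into the query text.
    if not 0.0 <= ra_deg < 360.0:
        raise ValueError("ra_deg must be in [0, 360)")
    if not -90.0 <= dec_deg <= 90.0:
        raise ValueError("dec_deg must be in [-90, 90]")
    if radius_deg <= 0.0:
        raise ValueError("radius_deg must be positive")

    # TOP n caps the number of rows, which matters for a billion-row table.
    top = f"TOP {int(limit)} " if limit is not None else ""
    cols = ", ".join(columns)
    return (
        f"SELECT {top}{cols} FROM {table} "
        f"WHERE 1=CONTAINS(POINT('ICRS', ra, dec), "
        f"CIRCLE('ICRS', {ra_deg}, {dec_deg}, {radius_deg}))"
    )


if __name__ == "__main__":
    # Example: at most 100 sources within 0.5 degrees of a chosen position.
    print(build_cone_query(56.75, 24.12, 0.5, limit=100))
```

Capping the row count by default is one way to respect the point made in the suggestion that a full 1e9-row table cannot reasonably be transmitted to a client-side tool.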
Line: 1 to 1 | ||||||||
---|---|---|---|---|---|---|---|---|
400 - Tools for data exploration | ||||||||
Line: 43 to 43 | ||||||||
Gaia visualisation should allow interaction with 2D and 3D representations of the Milky Way, allow zooming and panning, and allow selection of data based on positions or any other measurements (color, chemical composition, kinematics, etc). It should be able to represent and allow interaction with both point-like data (stars) and extended sources (e.g. molecular clouds mapped via Gaia extinction or measurements from radio surveys). Selection should be possible either directly on the data parameters or with the help of some classification scheme. The tool would also allow fitting or comparing theoretical and semi-empirical models to observations. We don’t really know how to do scientific analysis in 3D, nor are we used to it. The interfaces are not yet comfortable and the interaction approaches are not efficient. This must really be researched. However, 3D displays and interfaces are becoming widespread in the entertainment market. We have to port this experience into scientific visualisation. Why? Because we gain an extra dimension to analyse simultaneously. Younger people will certainly be used to these systems. Gaia, and astronomy in general, have a strong appeal to the public. However, scientific plots, although useful to the researcher, do not have visual appeal for the public. To overcome this scientist-public barrier, artist's impressions are usually produced, but they have the drawback of being very qualitative and even misleading due to some exaggeration. The ideal tool should provide some (automatic) cosmetic qualities. WP 430 Data mining tools | ||||||||
Added: | ||||||||
> > | WP 440 VO tools Suggestion from Mark Taylor, TOPCAT developer: TOPCAT (which I've developed over about the last 8 years) is a graphical tool for analysis and interactive exploration of tabular data which works well with moderately large datasets (1e6-1e7 rows, 1e2 columns); it does plotting, selections, crossmatching, calculations, and a load of other stuff. It's already in quite wide use, and already ticks a number of the buzzwords in the WP500 introduction slide - it does visualisation, it's very VO-friendly (and very well-known by the VO group at ESAC), it's been used to some extent for outreach (though that hasn't been a high priority before now), and I'm looking at adding some data mining capabilities. In its current incarnation it is not scalable up to 1e9 rows (which of course couldn't be reasonably transmitted from an archive server to a client-side tool in any case), so I'm by no means suggesting that it's the single solution to the question that WP500 is seeking to answer, but I do think that a tool of this nature is an important part of the armoury that a user of the Gaia archive will want, and as far as I know, TOPCAT is the most capable one around. STILTS is a complementary suite of command-line tools based on the same technology. Both are implemented in pure Java. The web pages of these tools are here: http://www.starlink.ac.uk/topcat/ http://www.starlink.ac.uk/stilts/ I don't have much background with Gaia, and I haven't worked on writing an FP* proposal before now, so I don't have a very clear idea of what's required here. However, I can imagine that once there are requirements for a user-facing tool that can provide the data exploration functionality being discussed here, adding such functionality to an existing powerful and widely-used tool will be a more effective way to tackle it than starting from scratch. 
One concrete and fairly straightforward possibility that comes to mind is adding a Gaia-specific load dialogue to TOPCAT which makes it easy to interrogate the archive to get data into the tool (similar requirements from users of other projects in the past have led to custom load dialogues for VizieR and Millennium Database access services). | |||||||
WP 440 Grand challenges\ No newline at end of file |
Line: 1 to 1 | ||||||||
---|---|---|---|---|---|---|---|---|
| ||||||||
Changed: | ||||||||
< < | 500 - Tools for data exploration | |||||||
> > | 400 - Tools for data exploration | |||||||
Line: 14 to 14 | ||||||||
Furthermore, the work package also includes the development of some tools for outreach and academic activities. Although not explicitly included in the call, we consider that bringing astronomy closer to the general public and providing resources for teaching astronomy based on actual Gaia data are a worthy contribution to the dissemination of space mission data on a global scale. | ||||||||
Changed: | ||||||||
< < | WP 510 Management | |||||||
> > | WP 410 Management | |||||||
Overall management of WP 500 | ||||||||
Changed: | ||||||||
< < | WP 520 Visualization tools | |||||||
> > | WP 420 Visualization tools | |||||||
Inputs provided by A. Moitinho | ||||||||
Line: 42 to 42 | ||||||||
Moreover, a rather neglected aspect of 3D visualization software, which is of fundamental importance in the case of Gaia, is the treatment of measurement errors. Any tool implemented for visual exploitation of Gaia data must take the catalogue errors into account seamlessly during the visualisation process if it is to have real scientific value. Architecture and functionality of visualisation must be driven by use case scenarios, like those being listed in the GREAT wiki (model comparison, etc). However, we can only know the actual usage in a broad sense. There will always be specific needs in special cases that we cannot predict beforehand. We have to accept this. Gaia visualisation should not claim to be a universal tool. Gaia visualisation should allow interaction with 2D and 3D representations of the Milky Way, allow zooming and panning, and allow selection of data based on positions or any other measurements (color, chemical composition, kinematics, etc). It should be able to represent and allow interaction with both point-like data (stars) and extended sources (e.g. molecular clouds mapped via Gaia extinction or measurements from radio surveys). Selection should be possible either directly on the data parameters or with the help of some classification scheme. The tool would also allow fitting or comparing theoretical and semi-empirical models to observations. We don’t really know how to do scientific analysis in 3D, nor are we used to it. The interfaces are not yet comfortable and the interaction approaches are not efficient. This must really be researched. However, 3D displays and interfaces are becoming widespread in the entertainment market. We have to port this experience into scientific visualisation. Why? Because we gain an extra dimension to analyse simultaneously. Younger people will certainly be used to these systems. Gaia, and astronomy in general, have a strong appeal to the public. 
However, scientific plots, although useful to the researcher, do not have visual appeal for the public. To overcome this scientist-public barrier, artist's impressions are usually produced, but they have the drawback of being very qualitative and even misleading due to some exaggeration. The ideal tool should provide some (automatic) cosmetic qualities. | ||||||||
Changed: | ||||||||
< < | WP 530 Data mining tools WP 540 Grand challenges | |||||||
> > | WP 430 Data mining tools WP 440 Grand challenges | |||||||
\ No newline at end of file |
Line: 1 to 1 | ||||||||
---|---|---|---|---|---|---|---|---|
500 - Tools for data exploration | ||||||||
Added: | ||||||||
> > | ||||||||
A use of the Gaia archive based on simple queries (e.g. sky region queries) would only allow a basic use of its potential. To fully exploit a billion object dataset, containing a wide variety of data (astrometric, photometric, spectrophotometric, spectroscopic, …) more advanced and powerful data exploration tools will be needed. This work package is devoted to the development of such tools, in close coordination with WP 200 to ensure that they are tailored to the actual needs of the scientific user community. It will include:
| ||||||||
Line: 11 to 13 | ||||||||
Furthermore, the work package also includes the development of some tools for outreach and academic activities. Although not explicitly included in the call, we consider that bringing astronomy closer to the general public and providing resources for teaching astronomy based on actual Gaia data are a worthy contribution to the dissemination of space mission data on a global scale. \ No newline at end of file | ||||||||
Added: | ||||||||
> > |
WP 510 Management
Overall management of WP 500
WP 520 Visualization tools
Inputs provided by A. Moitinho
with usually kilobyte sized explanations. As attested by the portion of our brain dedicated to processing visual information, human comprehension is favoured when data is presented visually. The aim of scientific visualization is exactly this: to reduce the complexity of scientific data in a way that favours the researcher's understanding, and thus the flourishing of ideas and physical interpretation. Gaia data is highly complex in nature, and so will be the Gaia catalogue. Therefore, tools should be provided to help the research community grasp, as quickly and precisely as possible, the information they are searching for, as well as to facilitate and even encourage serendipitous discoveries. Whatever tool is implemented should therefore not work in a completely passive way, waiting for commands from the user, but should have something of an active voice, suggesting characteristics of the visualization that would facilitate the discovery process. One simple example of an “active visualisation” is the following: imagine you want to see the MW in 3D, so you request to visualize the positions x, y, z of all the stars in the catalogue. In this case, an “active tool” would automatically present a 3D volume rendering of the stars, so that instead of a 3D scatter plot with each point representing a single star, you would see the global structure of the MW. Then, as you zoom in, the volume rendering would progressively turn into a scatter plot showing individual stars, in a fully automatic way. This tool would also present realistic visualizations. Still using our example of the Galaxy, when it is seen as an external galaxy a certain amount of degradation in the spatial resolution (PSF) is necessary to give a realistic spatial representation. 
The bulk of the stellar population would be visualized as a volume rendering, while some especially bright stars would be displayed as PSFs, just as happens when we observe other galaxies (even from space). Of course, basic functionality must be available, such as tools for plotting scattered-point data in 2D or 3D (with additional color-coded and shape-coded dimensions), but even these features should present some kind of “active voice”. For instance, you graphically select a number of stars in a scatter diagram. Automatically, you receive a report with the percentage of stars of certain types selected, both within the sample and globally (e.g. x% of the sample is F stars, which are y% of the F stars in the catalogue; the same for other parameters). This kind of information would immediately draw attention to any unexpected selection bias, and eventually would lead to knowledge discovery: why the hell do I have so many variable stars here? Another appealing example is to plot unclassified stars and produce “mysterious Milky Way” maps. What kind of biases will we find here? This highlights how we must study what kind of representations can provide a broad view of the Gaia catalogue, i.e. seeing a Milky Way map is not a general view of the contents. The design of the visualisation system will rely on the definition of key statistics representing the catalogue contents. Moreover, a rather neglected aspect of 3D visualization software, which is of fundamental importance in the case of Gaia, is the treatment of measurement errors. Any tool implemented for visual exploitation of Gaia data must take the catalogue errors into account seamlessly during the visualisation process if it is to have real scientific value. Architecture and functionality of visualisation must be driven by use case scenarios, like those being listed in the GREAT wiki (model comparison, etc). However, we can only know the actual usage in a broad sense. 
There will always be specific needs in special cases that we cannot predict beforehand. We have to accept this. Gaia visualisation should not claim to be a universal tool. Gaia visualisation should allow interaction with 2D and 3D representations of the Milky Way, allow zooming and panning, and allow selection of data based on positions or any other measurements (color, chemical composition, kinematics, etc). It should be able to represent and allow interaction with both point-like data (stars) and extended sources (e.g. molecular clouds mapped via Gaia extinction or measurements from radio surveys). Selection should be possible either directly on the data parameters or with the help of some classification scheme. The tool would also allow fitting or comparing theoretical and semi-empirical models to observations. We don’t really know how to do scientific analysis in 3D, nor are we used to it. The interfaces are not yet comfortable and the interaction approaches are not efficient. This must really be researched. However, 3D displays and interfaces are becoming widespread in the entertainment market. We have to port this experience into scientific visualisation. Why? Because we gain an extra dimension to analyse simultaneously. Younger people will certainly be used to these systems. Gaia, and astronomy in general, have a strong appeal to the public. However, scientific plots, although useful to the researcher, do not have visual appeal for the public. To overcome this scientist-public barrier, artist's impressions are usually produced, but they have the drawback of being very qualitative and even misleading due to some exaggeration. The ideal tool should provide some (automatic) cosmetic qualities.
WP 530 Data mining tools
WP 540 Grand challenges |
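The "active voice" selection report described above (x% of the sample is F stars, which are y% of the F stars in the catalogue) can be illustrated with a few lines of code. This is only a sketch under simple assumptions: the catalogue is represented as a plain list of type labels, the selection as a list of row indices, and the function name `selection_report` is made up for the illustration. A real tool would compute the same two ratios on the server side for a billion-row table.

```python
from collections import Counter


def selection_report(catalogue_types, selected_indices):
    """Summarise a selection by object type.

    Returns a dict mapping each type present in the selection to a tuple
    (percent of the selection, percent of that type in the whole catalogue).
    """
    total_by_type = Counter(catalogue_types)
    selected_by_type = Counter(catalogue_types[i] for i in selected_indices)
    n_selected = len(selected_indices)

    report = {}
    # An empty selection yields an empty report, so no division by zero occurs.
    for star_type, count in selected_by_type.items():
        pct_of_selection = 100.0 * count / n_selected
        pct_of_catalogue_type = 100.0 * count / total_by_type[star_type]
        report[star_type] = (pct_of_selection, pct_of_catalogue_type)
    return report


if __name__ == "__main__":
    # Toy catalogue of spectral-type labels and a selection of three rows.
    catalogue = ["F", "F", "G", "F", "K", "G", "F", "M"]
    for star_type, (pct_sel, pct_cat) in selection_report(catalogue, [0, 1, 2]).items():
        print(f"{star_type}: {pct_sel:.1f}% of selection, "
              f"{pct_cat:.1f}% of all {star_type} stars in the catalogue")
```

Comparing the first percentage with the type's overall share of the catalogue is what would flag an unexpected selection bias, such as an excess of variable stars.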
Line: 1 to 1 | ||||||||
---|---|---|---|---|---|---|---|---|
500 - Tools for data exploration | ||||||||
Line: 11 to 11 | ||||||||
Furthermore, the work package also includes the development of some tools for outreach and academic activities. Although not explicitly included in the call, we consider that bringing astronomy closer to the general public and providing resources for teaching astronomy based on actual Gaia data are a worthy contribution to the dissemination of space mission data on a global scale. | ||||||||
Deleted: | ||||||||
< < |
<----> | |||||||
\ No newline at end of file |
Line: 1 to 1 | ||||||||
---|---|---|---|---|---|---|---|---|
500 - Tools for data exploration | ||||||||
Line: 8 to 8 | ||||||||
| ||||||||
Added: | ||||||||
> > |
| |||||||
Furthermore, the work package also includes the development of some tools for outreach and academic activities. Although not explicitly included in the call, we consider that bringing astronomy closer to the general public and providing resources for teaching astronomy based on actual Gaia data are a worthy contribution to the dissemination of space mission data on a global scale. |
Line: 1 to 1 | ||||||||
---|---|---|---|---|---|---|---|---|
Added: | ||||||||
> > |
500 - Tools for data exploration
A use of the Gaia archive based on simple queries (e.g. sky region queries) would only allow a basic use of its potential. To fully exploit a billion object dataset, containing a wide variety of data (astrometric, photometric, spectrophotometric, spectroscopic, …) more advanced and powerful data exploration tools will be needed. This work package is devoted to the development of such tools, in close coordination with WP 200 to ensure that they are tailored to the actual needs of the scientific user community. It will include:
Furthermore, the work package also includes the development of some tools for outreach and academic activities. Although not explicitly included in the call, we consider that bringing astronomy closer to the general public and providing resources for teaching astronomy based on actual Gaia data are a worthy contribution to the dissemination of space mission data on a global scale. <----> |