GENIUS Web>200Tailoringtotheendusercommunity (2015-06-23, LolaBalaguer)

WP2 - Tailoring to the end user community

Description

Unlocking the full potential of the Gaia catalogue and archive is not straightforward and will require an ambitious and innovative approach to data publication and access. A key aim of GENIUS is to ensure that the corresponding technical developments are driven by and focused on the scientific needs of the astronomical community that will use the Gaia catalogue. That is, the Gaia catalogue and data archive should be tailored to the needs not only of the scientific end user but also of the interested amateur or curious member of the general public.

Tailoring should be done by capturing the end user’s scientific requirements and turning those into specifications on the basis of which the Gaia data archive, catalogue and data access methods can be built. This issue has been recognized by the Gaia community and a requirements gathering process amongst the scientific users is currently underway, coordinated by the Gaia Archive Preparations group (GAP). This process is non-trivial because the scientific requirements are often vague. It is easy to state that we want to compare a multi-billion particle N-body simulation to the entire Gaia catalogue, but how will this be done in practice, and what requirements does that set on the way the Gaia data are published and made accessible? In this work package these top-level requirements will be analysed with the goal of turning them into detailed requirements. These requirements should be cast in a language that both the scientists and the archive developers understand.

The GAP requirements gathering process has revealed a number of advanced requirements (the Grand Challenges) that go well beyond the normal queries to data archives, and which require research in order to work them out in detail. Implementing these requirements will add very significant value to the Gaia data archive, while the expertise built up in this work package can be employed to enhance the value of other existing or future archives. The requirements for the following Grand Challenges will be researched in this work package:

- Confronting complex models with complex data archives (WP-230)
- Seamless data retrieval across archives and wavelength domains (WP-240)
- The living archive (WP-250)
- Re-processing of archived (raw) data (WP-260)

WP2 - Tailoring to the end user community [Months: 1-42]
Lead beneficiary: UL
Type of activity: RTD

T2.1 - Technical coordination [Months: 1-42]

This work package oversees the work conducted within WP-200. It includes progress tracking and reporting, ensuring that deliverables are ready on time, and taking action in case of delays in the work. Such action consists of re-assessing, if needed, the priorities of the efforts spent on the different work packages. The efforts in this work package will feed into developments in the other GENIUS work packages, so coordination with the respective work package leaders is also part of this WP. The technical coordination of WP-200 will be done by Brown at UL.

T2.2 - Analysis and working out of requirements gathered by GAP [Months: 1-24]

UL, FFCUL, UCAM, KU

Under the auspices of the GREAT network, GAP, and the Gaia science team, the astronomical community was given the opportunity to specify how they might wish to access the Gaia catalogue and data archive. This was done through usage examples in order to get an overview of what the future archive users may want. These data access scenarios (the requirements gathering process and the collected data access scenarios are summarized in [3], available online at http://www.read.esa.int/1link/livelink/open/3125400) need to be turned into precise specifications for the data archive, which will serve as input to the activities in WPs 300/400/500. This task will be undertaken in this work package. As mentioned above, the examples provided by the community also revealed a number of advanced usage scenarios requiring a complicated interaction with a substantial fraction of the entire data archive. These will be addressed specifically by WP-230 to WP-260.

In addition to satisfying science user requirements, the archive should also be ready to support outreach activities. So part of the work in this WP is to analyse outreach cases and formulate requirements for building outreach facilities into the Gaia archive. This package will be carried out by the personnel hired at UL. The group at FFCUL will contribute 2 staff months of effort to provide their expertise for the analysis of the user requirements related to visualization aspects. The group at KU will contribute 2 staff months to a collaborative requirements gathering and analysis exercise in the context of the Japanese Nano-JASMINE mission. In particular, the requirements on providing a combined Nano-JASMINE/Hipparcos catalogue (improved proper motions) will be investigated. The KU group will benefit from the GAP experience, and in turn we expect that lessons learned from the requirements analysis for Nano-JASMINE can also be applied to the Gaia case. The UCAM group will devote 2 staff months to organising the update of the requirements from the GREAT community. In addition, UCAM will contribute 2 staff months to the analysis of user requirements specific to ‘science alerts’.

T2.3 - Confronting complex models with complex catalogues [Months: 1-42]

Modern astronomical surveys offer the possibility of testing our understanding of the universe against vast data sets collected over the entire sky. In particular the Gaia catalogue will be highly constraining for models of the Milky Way or of the properties of stars. The models must explain the data collected across all stellar populations over a large fraction of the volume of our Galaxy. Testing stellar evolution models against single clusters or Galaxy models against star counts along a single line of sight will no longer be sufficient. These tests will have to be made against the entire catalogue in order to extract the maximum scientific return. Such an undertaking is very difficult because of the large amount of data involved, the large range in observational errors (due to the survey depth), the correlations between errors on the different quantities and between sources, and the often non-linear relation between the measured quantities and the natural model parameters (for instance parallax is measured rather than distance). It has therefore been argued in recent years (see e.g., [2, 7]) that the only truly robust way to deal with this challenge is to project models into the data space (i.e., use ‘forward modelling’) and thus predict the catalogue data. A good model will thus provide the correct ‘predicted catalogue’.

To facilitate (and encourage) such a forward modelling approach we want to provide the corresponding tools on the data archive side. The following concepts will be worked out and turned into detailed requirements:

- Provide tools to project models into the catalogue’s data space. For example, turn a Galaxy model into predicted astrometry, radial velocities, stellar population properties (ages, metallicities), or turn synthetic spectra from stellar models into predicted photometric measurements. The tools should encapsulate our knowledge of the instruments that produced the catalogue. This effort can build on the substantial instrument modelling expertise built up within Coordination Unit 2 of DPAC.
- Provide tools for comparing the predicted and the observed catalogue or data. The comparison will likely be done in a Bayesian framework so the following could be foreseen: a likelihood generator that is aware of the catalogue’s error properties, including correlations; tools for specifying priors; posterior likelihood optimizers. Users should also be able to contribute their own optimization tools.

The forward modelling facilities will also be very valuable in the context of the data validation approach taken in WP-530. This package will be carried out by the personnel hired at UL.
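
As an illustration of the forward modelling idea described in this task, the following minimal Python sketch projects two model quantities (distance and tangential velocity) into the data space (parallax and proper motion) and evaluates a Gaussian log-likelihood that accounts for the correlation between the two observed quantities. The function names, the choice of two observables, and the bivariate Gaussian error model are assumptions made for illustration only; they are not part of any GENIUS or Gaia archive interface.

```python
import math

# Conversion factor: v_tan [km/s] = 4.74047 * mu [mas/yr] * d [kpc]
KMS_PER_MAS_YR_KPC = 4.74047


def predict_observables(distance_kpc, v_tan_kms):
    """Project model quantities into data space (hypothetical helper).

    Returns the predicted parallax in mas and proper motion in mas/yr.
    """
    parallax_mas = 1.0 / distance_kpc
    pm_mas_yr = v_tan_kms / (KMS_PER_MAS_YR_KPC * distance_kpc)
    return parallax_mas, pm_mas_yr


def log_likelihood(observed, predicted, sigma, rho):
    """Bivariate Gaussian log-likelihood with correlation coefficient rho.

    observed, predicted and sigma are (parallax, proper motion) pairs.
    """
    dx = (observed[0] - predicted[0]) / sigma[0]
    dy = (observed[1] - predicted[1]) / sigma[1]
    one_minus_rho2 = 1.0 - rho * rho
    chi2 = (dx * dx - 2.0 * rho * dx * dy + dy * dy) / one_minus_rho2
    log_norm = math.log(2.0 * math.pi * sigma[0] * sigma[1]
                        * math.sqrt(one_minus_rho2))
    return -0.5 * chi2 - log_norm


# Compare two candidate model distances against one (made-up) observation.
observation = (0.52, 3.1)      # parallax [mas], proper motion [mas/yr]
uncertainty = (0.05, 0.1)
correlation = 0.3
for d_kpc in (2.0, 4.0):
    prediction = predict_observables(d_kpc, 28.44282)
    print(d_kpc, log_likelihood(observation, prediction,
                                uncertainty, correlation))
```

Note that the comparison is made on the measured parallax rather than on an inferred distance, which is the point of working in the data space.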

T2.4 - Seamless data retrieval across archives and wavelength domains [Months: 1-24]

INAF

Although the Gaia catalogue on its own will be a very powerful tool, it is the combination of this high accuracy archive (especially the astrometry) with other archives that will truly open up amazing possibilities for astronomical research. An example application would be to query the Gaia catalogue for sources brighter and fainter than the survey limit of Gaia, where behind the scenes the work is done to combine Gaia and other sky surveys. In this way our reach across the Galaxy can be extended by combining the greater depth of surveys like LSST, Pan-STARRS, SDSS, and Euclid with very accurately calibrated photometric distance indicators. The latter will be one of the Gaia results. Another example is the combination of accurate stellar distances and extinction measurements with data on the gas and dust in the Milky Way’s interstellar medium in order to build up a 3D picture of the ISM. In addition, data on the velocity of the gas will enable us to constrain the gravitational potential in which the gas moves and, through combination with the stellar phase space data, to constrain the Galaxy’s mass density much more tightly. Many other examples can be provided, but the point here is that the advanced inter-operation of archives does not simply mean ‘cross-matching’ but providing truly seamless data retrieval, leaving the user with the feeling of working with one single data archive. The data retrieval should work not only across data archives but also across wavelength domains, as illustrated with the ISM example above. This WP can possibly build on developments that have already taken place in the context of the Virtual Observatory, and the resulting requirements will feed into WP-330 and WP-440 and will also benefit the efforts planned for WP-540.
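
The example query above (sources brighter and fainter than the Gaia survey limit, served as if from one archive) could be sketched as follows. This is a minimal Python illustration under stated assumptions: catalogues are plain lists of dictionaries with `ra` and `dec` in degrees, the match is a brute-force positional test within a fixed radius, and the function names and the `origin` field are hypothetical. A real archive would use spatial indexing and take positional errors and proper motions into account.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2.0) ** 2
         + math.cos(dec1) * math.cos(dec2)
         * math.sin((ra2 - ra1) / 2.0) ** 2)
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h))))


def combined_query(gaia, deep, radius_arcsec=1.0):
    """Return one source list: every Gaia source, plus deep-survey
    sources that have no Gaia counterpart within the match radius."""
    radius_deg = radius_arcsec / 3600.0
    result = [dict(src, origin="gaia") for src in gaia]
    for d in deep:
        has_counterpart = any(
            angular_separation_deg(d["ra"], d["dec"], g["ra"], g["dec"])
            <= radius_deg
            for g in gaia
        )
        if not has_counterpart:
            result.append(dict(d, origin="deep"))
    return result


# Made-up sources: one bright star seen by both surveys, one faint
# source seen only by the deeper survey.
gaia_sources = [{"ra": 10.0, "dec": 20.0, "mag": 15.0}]
deep_sources = [{"ra": 10.0001, "dec": 20.0, "mag": 15.1},
                {"ra": 10.5, "dec": 20.0, "mag": 23.0}]
merged = combined_query(gaia_sources, deep_sources)
```

The user sees a single list and does not need to know which survey supplied each row, although the `origin` field keeps that information available.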

This package will be carried out by the person hired by INAF. The management of this WP and the coordination between INAF and UL will be done by Smart of INAF-OATo (2 staff months), while Spagna, also at INAF-OATo, will contribute his expertise on cross-matching (2 staff months).

T2.5 - The living archive [Months: 1-42]

A concept closely related to the previous item is that of making the Gaia data archive a ‘living entity’. By this we mean that it should be possible to incorporate new information into the archive. Examples are complementary ground-based spectroscopy, updated classifications or parametrizations of stars based on independent information, better distance estimates for faint stars (e.g., photometric distance indicators calibrated on stars with accurate parallaxes), etc. The seamless integration with archives from other large sky surveys forms a natural part of the living archive idea.

The question to investigate here is how to incorporate new information into the Gaia archive in a controlled manner. This means vetting the new information, tracing the history of the information related to a source as well as the history of source classifications and parametrizations, and making the new information available in a transparent manner. This package will be carried out by the personnel hired at UL.
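
One possible way to picture the vetting and history-tracing requirements is an append-only revision history per source attribute, as in the minimal Python sketch below. The class and method names are hypothetical and chosen for illustration; the sketch says nothing about how the Gaia archive will actually store such information.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Revision:
    """One proposed value for a source attribute (hypothetical record)."""
    value: str
    provenance: str   # where the information came from
    vetted: bool      # whether it has passed the vetting process


@dataclass
class SourceRecord:
    source_id: int
    # attribute name -> list of revisions, oldest first (never overwritten)
    history: dict = field(default_factory=dict)

    def propose(self, attribute, value, provenance, vetted=False):
        """Append new information without discarding earlier revisions."""
        revision = Revision(value, provenance, vetted)
        self.history.setdefault(attribute, []).append(revision)

    def current(self, attribute):
        """Return the most recent vetted value, or None if there is none."""
        for revision in reversed(self.history.get(attribute, [])):
            if revision.vetted:
                return revision.value
        return None


record = SourceRecord(source_id=1)
record.propose("classification", "single star", "catalogue release",
               vetted=True)
record.propose("classification", "binary", "ground-based spectroscopy")
```

Here the unvetted proposal is kept in the history but does not replace the published classification until it has been vetted.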

T2.6 - Re-processing of archived (raw) data [Months: 1-42]

The Hipparcos Catalogue publication included the so-called intermediate astrometric data. The intermediate data are residuals of the observables with respect to the primary astrometric solution and the derivatives of these observables with respect to the astrometric parameters. These data allow users to re-process the Hipparcos astrometric data, notably to improve the astrometry of binaries and very red giant stars. Re-processing of already published data is becoming increasingly popular (as illustrated by the reprocessing of SDSS multi-epoch data described in [9]) and can greatly extend the scientific value and lifetime of existing data archives. Examples of re-processing that could be foreseen for the Gaia data archive are: the re-processing of intermediate data for groups of stars in order to derive a common radial velocity or parallax, the re-processing of data for objects that are discovered or confirmed to be binaries following a data release, or the re-determination of astrophysical parameters for stars following future improvements in stellar atmosphere modelling.
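
To show why residuals and derivatives are sufficient for re-processing, the following minimal Python sketch computes a weighted least-squares correction to a single parameter (for example a common parallax) from such intermediate data. It is reduced to one parameter for brevity and assumes independent Gaussian errors; the function name and the input numbers are made up for illustration.

```python
def parameter_correction(residuals, derivatives, sigmas):
    """Weighted least-squares correction to one parameter.

    residuals   : observed minus predicted observable, per observation
    derivatives : d(observable)/d(parameter), per observation
    sigmas      : standard uncertainty of each observation

    Returns the correction to add to the reference parameter value and
    its formal standard uncertainty.
    """
    weighted_sum = sum(a * r / s ** 2
                       for r, a, s in zip(residuals, derivatives, sigmas))
    normal = sum(a * a / s ** 2 for a, s in zip(derivatives, sigmas))
    return weighted_sum / normal, normal ** -0.5


# Made-up intermediate data in which every residual is consistent with
# the reference parameter being too small by 0.5 (arbitrary units).
derivs = [1.0, -0.5, 0.8]
resids = [0.5 * a for a in derivs]
errors = [1.0, 2.0, 1.0]
correction, uncertainty = parameter_correction(resids, derivs, errors)
```

Pooling the intermediate data of several stars that share the same parameter in one such solution is how a common value for a group could be derived.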

On a more ambitious level, the study described in [18] built on improved insights into the attitude modelling for the Hipparcos spacecraft to perform a re-processing of the entire Hipparcos data set. The resulting new version of the Hipparcos catalogue features much reduced error correlations and improved astrometric accuracies (by up to a factor of 4) for the bright stars. In principle, re-processing all the raw data might also be warranted for Gaia at some point in the future.

The research questions underpinning the requirements specification in this case are:

- How do we archive the raw and intermediate data products for long term usability? This includes calibration methods and their parameters as well as the original processing software.
- How do we present, communicate, and facilitate the use of intermediate data or raw data?

This package will be carried out by the personnel hired at UL.

Participants

Manager: A. Brown (Leiden)
Partners:
- UL: Gráinne Costigan, Arkadiusz Hypki
- INAF: Richard Smart, Alessandro Spagna, Robert Butora
- FFCUL: André Moitinho, Alberto Krone-Martins
- UCAM: Nicholas Walton, Floor Van Leeuwen, Simon Hodgkin, Guy Rixon
- KU: Yoshiyuki Yamada, Naoteru Gouda, Ryoichi Nishi, Shunsuke Hozumi, Satoshi Yoshioka

The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7-SPACE-2013-1) under grant agreement n°606740.