Gaia EDR3 bulk catalogue available in a better compression format
The Gaia group at the University of Barcelona (IEEC – ICCUB), in cooperation with DAPCOM Data Services (a technological spin-off company of the UPC and the UB), has published an alternative copy of the bulk data files from Gaia EDR3, the mission's Early Data Release 3.
Gaia EDR3 was published yesterday, 3rd December 2020. Besides the online catalogue, bulk CSV files were also made available for download, an interesting option for exhaustive analyses. These files are officially offered in “csv.gz” format, that is, compressed with the widely known gzip compressor.
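For readers who want to work with the official “csv.gz” files directly, the following is a minimal sketch of streaming a gzip-compressed CSV file row by row using only the Python standard library, so the file never has to be decompressed to disk. The helper name `iter_rows`, the file name `sample.csv.gz` and the three sample rows are made up for illustration; the real GaiaSource files are far larger, have many more columns, and their exact layout should be checked against the official documentation.

```python
import csv
import gzip
import os
import tempfile


def iter_rows(path):
    """Yield each data row of a gzip-compressed CSV file as a dict.

    Rows are decompressed and parsed lazily, one at a time.
    """
    with gzip.open(path, mode="rt", newline="", encoding="utf-8") as handle:
        yield from csv.DictReader(handle)


# Build a tiny stand-in file (hypothetical values, not real Gaia data).
sample_path = os.path.join(tempfile.mkdtemp(), "sample.csv.gz")
with gzip.open(sample_path, mode="wt", newline="", encoding="utf-8") as handle:
    writer = csv.writer(handle)
    writer.writerow(["source_id", "ra", "dec"])
    writer.writerow(["1", "10.5", "-20.25"])
    writer.writerow(["2", "11.0", "-19.75"])

# Read it back; all values arrive as strings and must be converted as needed.
rows = list(iter_rows(sample_path))
print(len(rows), "rows; first ra =", float(rows[0]["ra"]))
```

For a real bulk file, one would iterate over `iter_rows(...)` directly instead of collecting the rows into a list, since the whole file is unlikely to fit comfortably in memory.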
DAPCOM provides FAPEC, professional data compression software offering high compression ratios at high speeds. One of its options is the compression of tabular (CSV-like) text files, such as those from the bulk Gaia EDR3. As a service to the worldwide astronomical community, and also as a demonstration of the capabilities of FAPEC, DAPCOM and the Gaia IEEC/ICCUB Group converted the GaiaSource files from the official Gaia EDR3 bulk CSV repository into the FAPEC format, reducing the total size from 613 GB in csv.gz to 495 GB, that is, 19% smaller than with gzip. Other data compressors such as bzip2, rar, Zstandard or 7-zip cannot reach this mark.
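The quoted 19% can be checked with a few lines of arithmetic. The sketch below assumes that 613 GB is the total size of the gzip-compressed files and 495 GB the total size of the FAPEC-compressed files, as the figures above suggest; the function and variable names are illustrative only.

```python
def size_reduction_percent(original_gb, compressed_gb):
    """Return how much smaller the second size is, as a percentage of the first."""
    return 100.0 * (original_gb - compressed_gb) / original_gb


gzip_total_gb = 613   # assumed: total size of the official csv.gz files
fapec_total_gb = 495  # assumed: total size of the csv.fapec files

saved_gb = gzip_total_gb - fapec_total_gb
reduction = size_reduction_percent(gzip_total_gb, fapec_total_gb)

print(f"{saved_gb} GB saved, {reduction:.1f}% smaller")  # 118 GB saved, 19.2% smaller
```

In other words, the FAPEC copy saves about 118 GB of download volume relative to the gzip copy.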
You can now download Gaia EDR3 in csv.fapec format here:
The additional tables available in the bulk Gaia EDR3 catalogue will also be converted and published in the coming days.
Free FAPEC decompression licenses can be obtained from the DAPCOM website. DAPCOM plans to make a new FAPEC release soon, including a freely downloadable decompressor with Python bindings.