This page is automatically generated via the readme.
Check out the usage section for further information, including how to install the project.
Check out the API if you're keen to go deep.
Note
This project is under active development.
BAS Download Toolbox
This is a Python library providing CLI commands that let users download common environmental datasets for use in data pipelines. We use it in our optimisation and machine learning pipelines at BAS, and it should be flexible enough to adapt to many different use cases.
Contact digitalinnovation <at> bas <dot> ac <dot> uk if you want further information.
Installation
pip install download-toolbox
Please refer to the contribution guidelines for more information.
Implementation
When installed, the library provides a series of CLI commands. Use the
--help switch on any command for initial information, or consult the documentation.
Basic principles
The library sets up downloaders that will go through the following steps, for a variety of different data sources:
- Set up a data store or, if it already exists, read its provenance config
- Naively optimise the requested download
- Download from the source in parallel
- Transform the dataset into convenient-to-use files, ready for processing
That last step is important, as it might result in a dataset that differs from the one at source. The tool is intended to record this in the provenance configuration, which is why the configuration might already exist at the first step: newly downloaded data stays consistent with what's there, and the differences from the source data are recorded for consistency (you should not be able to screw up existing datasets), posterity and reproducibility.
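The four steps can be sketched in a few lines of Python. This is only an illustration of the idea, not the library's implementation: the function names, the `provenance.json` file name, the `transform` setting and the stand-in `fetch` function are all assumptions made for the example.

```python
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def load_or_create_config(store: Path, defaults: dict) -> dict:
    """Step 1: create the data store, or read its existing provenance config."""
    store.mkdir(parents=True, exist_ok=True)
    config_path = store / "provenance.json"  # hypothetical file name
    if config_path.exists():
        return json.loads(config_path.read_text())
    config_path.write_text(json.dumps(defaults, indent=2))
    return dict(defaults)


def plan_requests(dates: list[str], store: Path) -> list[str]:
    """Step 2: naive optimisation, here just skipping dates already on disk."""
    return [d for d in sorted(set(dates)) if not (store / f"{d}.txt").exists()]


def fetch(date: str) -> tuple[str, str]:
    """Stand-in for a real network request to a data source."""
    return date, f"raw data for {date}"


def run(store: Path, dates: list[str], defaults: dict) -> dict:
    config = load_or_create_config(store, defaults)
    todo = plan_requests(dates, store)
    # Step 3: download in parallel.
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(fetch, todo))
    # Step 4: transform using the settings recorded in the config, so that
    # new files are consistent with the ones already in the store.
    for date, raw in results:
        text = raw.upper() if config["transform"] == "upper" else raw
        (store / f"{date}.txt").write_text(text)
    return config


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = Path(tmp) / "store"
        run(store, ["2020-01-01", "2020-01-02"], {"transform": "upper"})
        # The second run asks for a different transform, but the stored
        # provenance config takes precedence over the new defaults.
        print(run(store, ["2020-01-02", "2020-01-03"], {"transform": "none"}))
```

The point of the sketch is the ordering: because the config is read before anything is downloaded, a later run cannot silently apply a different transformation to a store that already holds data.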
Limitations
There are some major limitations to this as a general-purpose tool, which will hopefully be dealt with in time! They likely don't have issues tracking them yet.
- Works only for hemisphere-level downloading: north or south. The planned overhaul intends to ensure that identifiers are used, so that someone can specify "north" or "south" but equally specify "Norway" or "The Shops" and then provide a geolocation that would identify the dataset within the filesystem.
This is currently in development, but the following downloaders do work well:
- download_amsr2
- download_aws
- download_cmip
- download_era5
- download_mars
- download_oras5
- download_osisaf
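As a quick check after installation, the following sketch reports which of the commands listed above can be found on your PATH. It uses only the Python standard library; the helper name `installed_commands` is made up for this example and is not part of the toolbox.

```python
import shutil

# Command names taken from the list of working downloaders above.
COMMANDS = [
    "download_amsr2",
    "download_aws",
    "download_cmip",
    "download_era5",
    "download_mars",
    "download_oras5",
    "download_osisaf",
]


def installed_commands(names=COMMANDS):
    """Return the subset of names that resolve to an executable on PATH."""
    return [name for name in names if shutil.which(name) is not None]


if __name__ == "__main__":
    found = installed_commands()
    for name in COMMANDS:
        status = "found" if name in found else "not found"
        print(f"{name}: {status}")
```

Any command reported as found can then be run with the --help switch to see its options.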
Contributing
Please refer to the contribution guidelines for more information.
Credits
License
This is licensed under the MIT License.