Repositories
bgdata manages the packages through 3 layers of repositories:
- remote
- local
- caches
Remote
The remote represents a repository that serves as a source of data packages. Currently, it is an HTTP server that contains the compressed data packages and some tags.
When the user requests a package that is not present in the local repository, bgdata downloads it from the remote into the local.
In addition, bgdata keeps the tags in sync. This means that if a tag of a particular package is updated in the remote and the user requests that tag, they will get the latest version from the remote if the local tag was not up to date.
Note
bgdata can work in offline mode. In that case, packages are not downloaded and tags are not updated.
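The tag behaviour described above can be pictured with a small sketch. This is not bgdata's actual code: `resolve_tag`, the dictionary representation of tags, and the use of `None` to signal offline mode are all assumptions made for illustration.

```python
def resolve_tag(tag, local_tags, remote_tags=None):
    """Return the version that `tag` points to (hypothetical helper).

    local_tags  -- dict mapping tag name to version, as stored locally
    remote_tags -- same mapping read from the remote, or None when offline
    """
    if remote_tags is not None and tag in remote_tags:
        # Online: the remote wins, so the local tag is refreshed.
        local_tags[tag] = remote_tags[tag]
    # Offline, or tag unknown to the remote: use what is stored locally.
    return local_tags[tag]
```

With a remote available, a stale local tag is replaced before the version is returned; in offline mode the locally stored value is used unchanged.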
Local
The local repository is the one where the user can find the packages that have been requested.
While the remote is an HTTP server, the local should be a reachable path from the user’s machine.
The main difference from the remote repository, apart from being reachable as a path from the user's machine, is that packages are stored uncompressed.
The download process
The download process from the remote is done using the Python package homura. Thanks to it, downloads can be resumed. After download, bgdata extracts all the files if they were compressed.
Once the download and extraction are done, bgdata creates a file named .download with the date and time of that moment.
If this file is missing or has been deleted, bgdata assumes the download has failed and reattempts it.
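The marker-file check can be sketched as follows. The helper names are hypothetical and the ISO timestamp format is an assumption; the text only states that the file holds the date and time, not how bgdata formats it.

```python
from datetime import datetime
from pathlib import Path

MARKER = ".download"  # file name taken from the text above


def is_download_complete(package_dir):
    """A package counts as downloaded only if the marker file exists."""
    return (Path(package_dir) / MARKER).is_file()


def mark_download_complete(package_dir):
    """Write the marker once download and extraction have finished."""
    marker = Path(package_dir) / MARKER
    # Timestamp format is an assumption, not bgdata's documented format.
    marker.write_text(datetime.now().isoformat())
    return marker
```

Writing the marker as the very last step means an interrupted download or extraction leaves no marker behind, so the next request sees the package as incomplete and retries.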
Caches
A cache is an extension of the local repository. Like the local repository, it should be a reachable path from the user's machine. Moreover, bgdata supports multiple caches.
When the user requests a package, bgdata searches for it first in each cache and then in the local repository.
A cache can have different uses. As an example, we use the scratch space in the nodes of our cluster as a cache for the packages we use recurrently. For the others, we have a local repository reachable through the network file system.
Important
bgdata will not fail just because a cache is not present. This means that you can also use an external hard drive as a cache, and bgdata can still be used when the drive is not connected.
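The lookup order and the tolerance for missing caches can be illustrated with a minimal sketch. `find_package` is a hypothetical helper, and treating a package as a relative path under each repository root is an assumption about the on-disk layout.

```python
from pathlib import Path


def find_package(relative_path, caches, local):
    """Return the first existing copy of a package, or None.

    Caches are checked in order, then the local repository.
    """
    for root in [*caches, local]:
        candidate = Path(root) / relative_path
        # A cache that is not mounted simply fails this check and is skipped.
        if candidate.exists():
            return candidate
    return None
```

Because a missing root only makes `exists()` return `False`, an unplugged drive listed as a cache does not raise an error; the search moves on to the next location. A `None` result would be the point at which a download from the remote is needed.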