Configuring bgdata

bgdata ships with a default configuration file that looks like this:

version=2
local_repository = "~/.bgdata"
remote_repository = "http://bbglab.irbbarcelona.org/bgdata"

However, you can create your own configuration file to override these defaults.

Custom configuration

To create your own custom configuration, create a file named bgdatav2.conf and place it in the user configuration folder. bgdata locates that folder with the user_config_dir function of the appdirs package, called with bbglab as the only parameter.
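To see where that folder is on a given machine, the lookup can be reproduced in a few lines of Python. This is only a sketch: it assumes the third-party appdirs package is installed, and the fallback branch and the helper name custom_config_path are illustrative additions that are not part of bgdata.

```python
import os

try:
    # Third-party package (pip install appdirs)
    from appdirs import user_config_dir
except ImportError:
    # Fallback for this sketch only: approximates the usual Linux location.
    def user_config_dir(appname):
        base = os.environ.get("XDG_CONFIG_HOME", os.path.expanduser("~/.config"))
        return os.path.join(base, appname)


def custom_config_path():
    """Return the expected location of the custom bgdata configuration file."""
    return os.path.join(user_config_dir("bbglab"), "bgdatav2.conf")


print(custom_config_path())
```

The printed path differs between operating systems, so run it on the machine where bgdata will be used.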

That file should follow the same structure as the default, but you can change the sections to fit your own needs.


The local folder (where the data packages are stored) is indicated through local_repository.

# The default local folder where you want to store the data packages
local_repository = "~/.bgdata"

Note

You can use any reachable path.


The remote repository is a (public) URL where the data packages are stored. bgdata uses it to look for packages that are not in the local repository.

# The remote URL from which you want to download the data packages
remote_repository = "http://bbglab.irbbarcelona.org/bgdata"
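To check which values a configuration file resolves to, the top-level keys can be read with a short script. The sketch below uses the standard-library configparser as a stand-in (the parser bgdata itself uses is not described here), wraps the keys in a dummy section because configparser requires one, and assumes that the quotes are decorative and that "~" should expand to the home directory. The helper name read_top_level is hypothetical.

```python
import configparser
import os

CONFIG_TEXT = '''
version=2
local_repository = "~/.bgdata"
remote_repository = "http://bbglab.irbbarcelona.org/bgdata"
'''


def read_top_level(text):
    """Parse the sectionless keys of a bgdata-style config into a dict."""
    parser = configparser.ConfigParser(interpolation=None)
    # configparser requires a section, so wrap the top-level keys in a dummy one.
    parser.read_string("[top]\n" + text)
    # Drop the surrounding double quotes (assumed to be decorative).
    return {key: value.strip('"') for key, value in parser["top"].items()}


settings = read_top_level(CONFIG_TEXT)
local_repository = os.path.expanduser(settings["local_repository"])
print(settings["remote_repository"])
print(local_repository)
```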

If you need to access the remote repository through a proxy, you can configure it as follows:

# Optional proxy configuration
[proxy]
host = proxy.someurl.org
port = 8080

# If it's an authenticated proxy
user = myname
pass = mypasswd
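How bgdata consumes these values internally is not covered here. As a rough illustration, the four settings are commonly combined into a single proxy URL, which is the form that HTTP clients such as urllib.request.ProxyHandler accept. The function name proxy_url and the http scheme are assumptions made for this sketch.

```python
from urllib.parse import quote


def proxy_url(host, port, user=None, password=None, scheme="http"):
    """Build a proxy URL from the values of a [proxy] section."""
    credentials = ""
    if user is not None:
        # Percent-encode credentials so special characters do not break the URL.
        credentials = quote(user, safe="")
        if password is not None:
            credentials += ":" + quote(password, safe="")
        credentials += "@"
    return f"{scheme}://{credentials}{host}:{port}"


# Values taken from the proxy example in this section.
url = proxy_url("proxy.someurl.org", 8080, user="myname", password="mypasswd")
proxies = {"http": url, "https": url}
print(proxies)
```

For an unauthenticated proxy, leave out user and password and the URL contains only the host and port.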

Optionally, bgdata can be told not to look for newer versions of the packages in the remote repository and to use only what is available locally. To enable this option, add:

# If you want to force bgdata to work only locally
offline = True

With the cache_repositories option you can list additional repositories (similar to the local one) in which bgdata looks for files.

# Cache repositories
[cache_repositories]
# Name and path pairs
my hard drive = /mnt/user/hd

Note

Cache repositories have higher priority than the local repository, meaning that bgdata looks in them before checking the local one. In addition, they are searched from last to first.

As an example of usage, data packages that are used frequently on our cluster are saved in the scratch directory of each node. This way, bgdata takes the data from the scratch directory, which is faster than using the network file system.
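The lookup order described in the note above can be sketched as follows. This is an illustration of the stated priority rules, not bgdata's actual code: the function names, the "scratch" entry with its path, and the idea of a package being a relative path inside a repository are all assumptions.

```python
import os


def search_order(cache_repositories, local_repository):
    """Return the local directories in the order they would be checked.

    cache_repositories maps names to paths, in the order written in the file.
    """
    caches = [os.path.expanduser(path) for path in cache_repositories.values()]
    # Caches are searched last to first, all of them before the local repository.
    return list(reversed(caches)) + [os.path.expanduser(local_repository)]


def find_package(relative_path, cache_repositories, local_repository):
    """Return the first existing copy of relative_path, or None if absent locally."""
    for root in search_order(cache_repositories, local_repository):
        candidate = os.path.join(root, relative_path)
        if os.path.exists(candidate):
            return candidate
    # Not found locally: the remote repository would be the next place to look,
    # unless offline = True is set.
    return None


caches = {"my hard drive": "/mnt/user/hd", "scratch": "/scratch/bgdata"}
print(search_order(caches, "/home/user/.bgdata"))
# ['/scratch/bgdata', '/mnt/user/hd', '/home/user/.bgdata']
```

Because the last entry wins, the fastest storage (for example a node-local scratch directory) should be listed last in the cache_repositories section.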