Usage

bgdata is a Python package with a command line interface. This means that you can use bgdata as a Python library or from a terminal.

Getting packages

The most basic function of bgdata is to retrieve the path to a particular package. This is done through the get method.

The package is identified by a string with the format:

[<project>/]<dataset>/<version>[?<build>|<tag>]
  • project is optional. Default project is _
  • dataset and version are required
  • build or tag are optional. By default, bgdata requests the tag master.
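The format can be illustrated with a small parser. This is only a sketch: parse_package_id is a hypothetical helper, not part of bgdata, and it assumes the defaults listed above (project _ and tag master). Because a build and a tag occupy the same position in the string, the sketch returns that qualifier as-is without trying to tell them apart.

```python
def parse_package_id(package_id):
    """Split '[<project>/]<dataset>/<version>[?<build>|<tag>]' into its parts.

    Hypothetical helper for illustration only; bgdata does its own parsing.
    """
    # Everything after '?' is the build or tag qualifier (may be empty).
    path, _, qualifier = package_id.partition('?')
    parts = path.split('/')
    if len(parts) == 2:
        # No project given: fall back to the default project '_'.
        parts.insert(0, '_')
    if len(parts) != 3 or not all(parts):
        raise ValueError('Invalid package identifier: {!r}'.format(package_id))
    project, dataset, version = parts
    # An omitted qualifier means the default tag 'master'.
    return project, dataset, version, qualifier or 'master'
```

>>> parse_package_id('genomereference/hg19')
('_', 'genomereference', 'hg19', 'master')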

Note

Because master is the default tag, it is present in the remote repository and, unless you are in offline mode, bgdata keeps it synchronized.

As an example, we are going to ask for the master tag of the hg19 version of the genomereference dataset in the default project (_).

From the command line:

$ bgdata get _/genomereference/hg19?master
2018-03-19 10:56:08 bgdata.manager INFO -- "master" resolved as 20150724
2018-03-19 10:56:08 bgdata.command INFO -- Dataset downloaded
/home/user/.bgdata/_/genomereference/hg19-20150724

and from Python:

>>> import bgdata
>>> bgdata.get('_/genomereference/hg19?master')
'/home/user/.bgdata/_/genomereference/hg19-20150724'

Important

bgdata returns the path to the local or cache folder that contains the package. When there is only one file in the folder, or in some special cases, bgdata returns the path to that file instead of the folder path.
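Since the returned path may point to a folder or to a single file, code that consumes it may want to handle both cases. The following is a minimal sketch under that assumption; package_files is a hypothetical helper, not part of bgdata, and it only looks at the files directly inside the folder.

```python
from pathlib import Path


def package_files(package_path):
    """Return the files of a package, given a path such as the one
    returned by bgdata.get (hypothetical helper, not part of bgdata).

    Works whether the path points to a single file or to a folder.
    """
    path = Path(package_path)
    if path.is_file():
        # Single-file package: the path already is the file.
        return [path]
    # Folder case: collect the regular files directly inside it, sorted.
    return sorted(p for p in path.iterdir() if p.is_file())
```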

Searching for packages

The bgdata list command shows which data packages are in the local repository. In Python, the equivalent function is a generator that yields three elements for each package: a string that represents the package (in the same format as the input of the get method), the name of the repository where the package was found (local is the local repository; any other name is a cache), and the tags associated with that particular build.

In the command line:

$ bgdata list
_/genomereference/hg19?20150724      local   ['master']

From Python:

>>> for pkg, repo, tag in bgdata.list():
...     print('Package {} in {} is associated with tags: {}'.format(pkg, repo, tag))
...
Package _/genomereference/hg19?20150724 in local is associated with tags: ['master']
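When several caches are configured, it may help to group these tuples by repository. The sketch below assumes only the three-element shape described above; packages_by_repo is a hypothetical helper, and the sample list is copied from the output shown here so that the example runs without bgdata installed.

```python
from collections import defaultdict


def packages_by_repo(entries):
    """Group (package, repo, tags) tuples by repository name."""
    grouped = defaultdict(list)
    for package, repo, tags in entries:
        grouped[repo].append((package, tags))
    return dict(grouped)


# Sample data copied from the output above; in practice, pass bgdata.list().
sample = [('_/genomereference/hg19?20150724', 'local', ['master'])]
print(packages_by_repo(sample))
```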

To search for packages, use the search command. It lists everything available at the indicated level. For example, searching with an empty string lists all available projects:

$ bgdata search
_
cgi
intogen

If you search for a project, you get a list of datasets:

$ bgdata search _
genomereference
genomesignature

If you search for a dataset within a project, you get all possible versions:

$ bgdata search _/genomereference
hg19
hg18
hg38

Builds can be found by searching for a version of a dataset within a project:

$ bgdata search _/genomereference/hg19
20150724
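The four examples above follow one pattern: each component added to the query moves the search one level down. As a rough summary of that pattern (not of how bgdata implements it), the hypothetical function below maps a query to the kind of result it returns.

```python
# Result levels in the order shown in the examples above.
LEVELS = ('projects', 'datasets', 'versions', 'builds')


def search_result_level(query=''):
    """Return which kind of items a search with this query would list.

    Hypothetical helper that only counts the '/'-separated components.
    """
    query = query.strip('/')
    depth = len(query.split('/')) if query else 0
    if depth >= len(LEVELS):
        raise ValueError('Query is too deep: {!r}'.format(query))
    return LEVELS[depth]
```

>>> search_result_level('_/genomereference')
'versions'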

Information about the packages

The remote repository contains metadata about the packages. bgdata uses this information internally to know which projects are present, which datasets are in each project, and so on.

The info command retrieves that information with a simple query:

$ bgdata info _/genomereference/hg19
{'author': 'BBGLab',
 'created_on': '20150724',
 'description': 'Human Genome HG19',
 'license': 'Freely available for public use',
 'md5': '851d41ac755f4deba7b98851084927ab',
 'source': 'http://hgdownload.soe.ucsc.edu/goldenPath/hg19/chromosomes/'}

Logs

bgdata logs through the logging module in Python.

When bgdata is used as a Python library, it does not configure the logging module at all, so configuring the logging system is left to the end user. All the loggers used by bgdata are children of a single logger named bgdata, so you only need to configure that one.
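A minimal way to configure that logger with the standard library is sketched below. The INFO level and the message format are choices made for this example; the format merely imitates the log lines shown in the command line output earlier and is not something bgdata requires.

```python
import logging

# All bgdata loggers sit below the logger named 'bgdata',
# so configuring this one is enough.
logger = logging.getLogger('bgdata')
logger.setLevel(logging.INFO)

# StreamHandler writes to stderr by default.
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter(
    '%(asctime)s %(name)s %(levelname)s -- %(message)s',
    datefmt='%Y-%m-%d %H:%M:%S'))
logger.addHandler(handler)
```

With this in place, a child logger such as bgdata.manager inherits the level and sends its messages through the same handler.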

When using bgdata from the command line interface, two general flags, shared by all the subcommands, configure the logging system:

-v, --verbose Give more information
-q, --quiet Suppress all log messages but the ones on the stderr

The --quiet flag is useful in a bash script when you want to store the output of bgdata in a variable.