Usage
bgdata is a Python package with a command line interface. This means that you can use bgdata as a Python library or from a terminal.
Getting packages
The most basic function of bgdata is to retrieve the path to a particular package. This is done through the get method.
The package is identified by a string with the format:
[<project>/]<dataset>/<version>[?<build>|<tag>]
- project is optional. The default project is _.
- dataset and version are required.
- build and tag are optional. By default, bgdata requests the tag master.
Note
Because master is the default tag, it is present in the remote repository and, unless you are in offline mode, bgdata keeps it synchronized.
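To make the identifier format concrete, here is a minimal sketch that splits such a string into its parts and applies the documented defaults (project _ and tag master). It is an illustration only, not bgdata's own parser: the names PackageId and parse_package_id are hypothetical, and the sketch does not try to tell a build from a tag.

```python
from typing import NamedTuple


class PackageId(NamedTuple):
    """Hypothetical container for the parts of a package identifier."""
    project: str
    dataset: str
    version: str
    build_or_tag: str


def parse_package_id(identifier: str) -> PackageId:
    """Split '[<project>/]<dataset>/<version>[?<build>|<tag>]' into its parts."""
    # Everything after '?' is the build or tag; it may be absent.
    path, _, build_or_tag = identifier.partition('?')
    parts = path.split('/')
    if len(parts) == 2:
        # No project given: fall back to the documented default project '_'.
        parts.insert(0, '_')
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            'expected [<project>/]<dataset>/<version>, got {!r}'.format(identifier))
    # 'master' is the documented default tag.
    return PackageId(parts[0], parts[1], parts[2], build_or_tag or 'master')


print(parse_package_id('genomereference/hg19'))
print(parse_package_id('_/genomereference/hg19?20150724'))
```

Both calls describe the same dataset and version; the first relies on the defaults, while the second names the project and a build explicitly.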
As an example, we are going to ask for the master tag of the hg19 version of the genomereference dataset in the default project (_).
From the command line:
$ bgdata get _/genomereference/hg19?master
2018-03-19 10:56:08 bgdata.manager INFO -- "master" resolved as 20150724
2018-03-19 10:56:08 bgdata.command INFO -- Dataset downloaded
/home/user/.bgdata/_/genomereference/hg19-20150724
and from Python:
>>> import bgdata
>>> bgdata.get('_/genomereference/hg19?master')
'/home/user/.bgdata/_/genomereference/hg19-20150724'
Important
bgdata returns the path to the local or cache folder where the package is stored. When the folder contains only one file, or in some special cases, bgdata returns the path to that file instead of the folder path.
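Because the returned path may point to a folder or to a single file, calling code usually needs to handle both cases. The following is a minimal sketch of one way to do that with the standard library; package_files is a hypothetical helper, not part of bgdata, and the demo uses a temporary folder in place of a real package.

```python
import os
import tempfile


def package_files(path):
    """Return the files of a package, whether `path` is a folder or a single file."""
    if os.path.isdir(path):
        # Folder case: collect every file below it, in a stable order.
        found = []
        for root, _dirs, files in os.walk(path):
            found.extend(os.path.join(root, name) for name in files)
        return sorted(found)
    # Single-file case: the path itself is the only file.
    return [path]


# Demo on a temporary folder standing in for a downloaded package.
with tempfile.TemporaryDirectory() as folder:
    single = os.path.join(folder, 'chr1.txt')  # hypothetical file name
    open(single, 'w').close()
    print(package_files(folder) == [single])  # path to the folder
    print(package_files(single) == [single])  # path to the file itself
```

With bgdata installed, the same helper could be applied to the value returned by bgdata.get.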
Searching for packages
The bgdata list command shows which data packages are in the local repository. From Python, the list function (actually a generator) yields three elements for each package:
- a string that represents the package (like the input for the get method),
- the name of the repo where the package can be found (local represents the local repository; any other name is the name of a cache),
- the tags associated with that particular build.
From the command line:
$ bgdata list
_/genomereference/hg19?20150724 local ['master']
From Python:
>>> for pkg, repo, tag in bgdata.list():
...     print('Package {} in {} is associated with tags: {}'.format(pkg, repo, tag))
...
Package _/genomereference/hg19?20150724 in local is associated with tags: ['master']
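Since each entry carries a list of tags, a common follow-up is to keep only the packages that have a given tag. The sketch below shows one way to do this; packages_with_tag is a hypothetical helper, and the sample data simply mirrors the output shown above so that the example runs without bgdata installed.

```python
def packages_with_tag(entries, tag):
    """Yield (package, repo) for every entry whose tag list contains `tag`."""
    for package, repo, tags in entries:
        if tag in tags:
            yield package, repo


# Sample entries shaped like the output shown above; with bgdata installed
# you would pass bgdata.list() instead.
sample = [('_/genomereference/hg19?20150724', 'local', ['master'])]
for package, repo in packages_with_tag(sample, 'master'):
    print(package, repo)
```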
To search for packages, use the search command. It lists everything available at the indicated level. For example, searching with an empty string lists all available projects:
$ bgdata search
_
cgi
intogen
If you search for a project, you get a list of datasets:
$ bgdata search _
genomereference
genomesignature
If you search for a dataset within a project, you get all possible versions:
$ bgdata search _/genomereference
hg19
hg18
hg38
Builds can be found by searching for the version of the dataset within a project:
$ bgdata search _/genomereference/hg19
20150724
Information about the packages
The remote repository contains metadata about the packages. bgdata uses this information internally to know which projects are present, which datasets are in each project, and so on.
The info command retrieves that information with a simple query:
$ bgdata info _/genomereference/hg19
{'author': 'BBGLab',
 'created_on': '20150724',
 'description': 'Human Genome HG19',
 'license': 'Freely available for public use',
 'md5': '851d41ac755f4deba7b98851084927ab',
 'source': 'http://hgdownload.soe.ucsc.edu/goldenPath/hg19/chromosomes/'}
Logs
bgdata logs through Python's logging module.
When bgdata is used as a Python library, it does not configure the logging module at all, so configuring the logging system is left to the end user.
All the loggers used by bgdata are children of a single logger named bgdata, so you only need to configure that one.
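As a minimal sketch of such a configuration, the snippet below attaches a handler to the bgdata logger using only the standard logging module. The level and the message format are assumptions chosen to resemble the command line output shown earlier; they are not settings required by bgdata.

```python
import logging

# 'bgdata' is the parent logger named in the text; child loggers such as
# 'bgdata.manager' propagate their records to it.
logger = logging.getLogger('bgdata')
logger.setLevel(logging.INFO)  # assumed level; pick whatever suits your application

handler = logging.StreamHandler()  # writes to stderr by default
handler.setFormatter(logging.Formatter(
    '%(asctime)s %(name)s %(levelname)s -- %(message)s',
    datefmt='%Y-%m-%d %H:%M:%S'))
logger.addHandler(handler)

# To silence bgdata instead, raise the threshold above every standard level:
# logger.setLevel(logging.CRITICAL + 1)
```

Configuring only this logger leaves the logging setup of the rest of your application untouched.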
bgdata contains a set of subcommands, but two general flags configure the logging system from the command line interface:
-v, --verbose | Give more information |
-q, --quiet | Suppress all log messages except those written to stderr |
The --quiet flag can be useful in a bash script to store the output of bgdata in a variable.