The data packages¶
A data package is nothing more, and nothing less, than a set of files (or even a single file).
Identifying¶
bgdata identifies each data package with a 4-level structure:
- project
- dataset
- version
- build
project and dataset are the main identifiers of a data package.
In some cases, you might find that a package does not belong to a particular project. For such cases, we use _ as the project name. Some of the bgdata commands will automatically set the project to _ if you do not provide it.
The version is intended to distinguish between incompatible versions of the package, e.g. when you remove some data columns from your files.
The build is an identifier that distinguishes between compatible versions of the same package. Typically, we use the date on which we created the package as the build identifier. However, the build can be anything (as long as it does not start with an alphabetic character), so you might find other builds.
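The only constraint stated for a build is on its first character. A minimal Python sketch of that check follows; the function name `is_valid_build` is hypothetical and not part of bgdata, and reading "alpha character" as any alphabetic character is an assumption.

```python
def is_valid_build(build: str) -> bool:
    """Return True if `build` is non-empty and does not start with a letter.

    Hypothetical helper, not part of bgdata. It only encodes the rule
    stated above; bgdata itself may apply further restrictions.
    """
    return bool(build) and not build[0].isalpha()


print(is_valid_build("20150724"))  # date-style build identifier
print(is_valid_build("hg19"))      # starts with a letter
```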
For example, we use the human genome in many of our projects. There are several versions of the human genome available at http://hgdownload.cse.ucsc.edu/downloads.html#human . We downloaded our data of interest for the hg19 version and created packages using _ as project, genomereference as dataset, hg19 as version and 20150724 as build.
Then, we can request this package as:
bgdata get _/genomereference/hg19?20150724
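To make the 4-level structure concrete, the following Python sketch splits a query of the form shown above into its parts. It is only an illustration: `PackageQuery` and `parse_query` are hypothetical names, not bgdata's API, and accepting a two-part path with the project defaulting to `_` is an assumption based on the note about commands that fill in the project for you.

```python
from typing import NamedTuple, Optional


class PackageQuery(NamedTuple):
    project: str
    dataset: str
    version: str
    build: Optional[str]  # text after "?", or None when omitted


def parse_query(query: str) -> PackageQuery:
    """Split 'project/dataset/version?build' into its four levels.

    Hypothetical helper for illustration; not bgdata's own parser.
    """
    path, _sep, build = query.partition("?")
    parts = path.split("/")
    if len(parts) == 2:
        # Assumption: a missing project falls back to the "_" placeholder.
        parts = ["_"] + parts
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected project/dataset/version, got {query!r}")
    project, dataset, version = parts
    return PackageQuery(project, dataset, version, build or None)


print(parse_query("_/genomereference/hg19?20150724"))
```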
Tags¶
Remembering the build identifiers of all the packages can be painful, and you would need to change every query in your scripts to get newer versions. For this reason, bgdata supports the concept of tags.
A tag is a pointer to a particular build, and in several bgdata operations you can use a tag instead of a build. bgdata will resolve which build is associated with that tag and use that package.
The advantage of using a tag rather than a build is that, with the same query in your software, you get the most up-to-date version of a particular package simply by keeping the tag up to date.
E.g. following the example above, if we ask for the tag master we always get the most recent version:
bgdata get _/genomereference/hg19?master
provided that we keep our tag up to date.
In most cases, bgdata will use the tag master when you do not indicate the build or tag for a particular package.
Important
A tag works essentially as a pointer to a build for a particular project, dataset and version. This means that when asking for a tag you also need to indicate the other parameters.
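The tag behaviour described in this section can be sketched in Python as a lookup keyed by project, dataset and version. This is not bgdata's implementation: the `TAGS` table, its contents and the `resolve_build` function are hypothetical, and treating any identifier that starts with a letter as a tag is an assumption derived from the rule that builds do not start with an alphabetic character.

```python
from typing import Dict, Optional, Tuple

# Hypothetical tag table. Tags live under a (project, dataset, version)
# key, so the same tag name can point to different builds elsewhere.
# The values are illustrative only.
TAGS: Dict[Tuple[str, str, str], Dict[str, str]] = {
    ("_", "genomereference", "hg19"): {"master": "20150724"},
}


def resolve_build(project: str, dataset: str, version: str,
                  build_or_tag: Optional[str] = None) -> str:
    """Return the build for a package, resolving a tag if one was given."""
    # Fall back to the "master" tag when neither build nor tag is given.
    name = build_or_tag or "master"
    # Assumption: identifiers not starting with a letter are already builds.
    if not name[0].isalpha():
        return name
    key = (project, dataset, version)
    try:
        return TAGS[key][name]
    except KeyError:
        raise LookupError(f"no tag {name!r} for {'/'.join(key)}") from None


print(resolve_build("_", "genomereference", "hg19"))
print(resolve_build("_", "genomereference", "hg19", "20150724"))
```

Because the table is keyed by all three levels, asking for master without a project, dataset and version has no meaning in this sketch, which mirrors the note above.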