Cirro
Command Line and Python/R
Along with the Cirro web application, there is also an auxiliary interface that you can use to interact with your data. The cirro package can be used either from the command line (as a command-line interface, or CLI) or in a Python or R session. This package can be used to upload and download your datasets, and to read them into Jupyter Notebooks for additional analysis.
Quick Hint: Uploading and Downloading Large Files from the Command Line
# Install the CLI
pip install cirro
# Upload
cirro-cli upload -i
# or Download
cirro-cli download -i
Common Tasks
The Cirro client library can be useful for:
- Uploading or downloading large files (> 100MB) that would be slow over the web app
- Transferring files between Cirro and a remote computing cluster
- Automating data ingest or scheduling data analysis
Filetype Validation
When uploading a dataset, Cirro checks that the files being uploaded meet any requirements set by the selected dataset type. If you try to upload a file and get an error telling you that the files don't meet the dataset type requirements, read through the printout of the required files and make any adjustments. You can always include more files, but you must meet all requirements before uploading. Learn more about dataset type requirements in the documentation.
Data Integrity Validation
The integrity of all files uploaded or downloaded using the Cirro client library is ensured via MD5 checksum validation.
Any differences in file content between Cirro and the local system (down to a single byte difference) will result in an error being immediately reported to the user.
While this accounts for any issues arising from network errors, users with additional security requirements can enable SHA-256 hashing by following the documentation for the Cirro-client software repository.
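To make the checksum idea concrete, here is a minimal sketch of how a file digest can be computed and compared using only the Python standard library. It is an illustration of the general technique, not the Cirro client's own implementation; the function names `file_digest` and `files_match` are hypothetical.

```python
import hashlib


def file_digest(path, algorithm="md5", chunk_size=1024 * 1024):
    """Return the hex digest of a file, reading in chunks to bound memory use."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def files_match(path_a, path_b, algorithm="md5"):
    """True when both files produce the same digest under the chosen algorithm."""
    return file_digest(path_a, algorithm) == file_digest(path_b, algorithm)
```

Passing `algorithm="sha256"` switches the same routine to SHA-256. A single differing byte changes the digest, which is why a comparison like this detects corrupted transfers.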
Installation and Set Up
You can install cirro via PyPI using:
pip install cirro
Upon first use, the Cirro client will ask if you would like to save your login information and give you a link to authenticate. Open the link in your web browser and then select your institution and enter your username and password.
If you ever need to change your credentials after this point, you can clear your saved login information by removing the ~/.cirro/token.dat file from your system or by running cirro-cli configure and selecting "No" when it asks if you'd like to save your login information.
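If you prefer to clear the saved login from a script, the file removal described above can be sketched in Python as follows. The helper name `clear_saved_login` is hypothetical; the only detail taken from this page is the default token location, ~/.cirro/token.dat.

```python
from pathlib import Path


def clear_saved_login(token_path="~/.cirro/token.dat"):
    """Delete the saved token file if present; return True if a file was removed."""
    path = Path(token_path).expanduser()
    if path.exists():
        path.unlink()
        return True
    return False
```

The next time the client is used after the file is removed, it should prompt for authentication again, as on first use.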
Command Line Examples
Automated Uploading Datasets
To automate the process of dataset upload, cirro can be used in non-interactive mode. To use this method, run:
cirro-cli upload [OPTIONS]
with the following options:
[OPTIONS]:
  --name TEXT            Name of the dataset
  --description TEXT     Description of the dataset (optional)
  --project TEXT         Name or ID of the project
  --process TEXT         Name or ID of the ingest process (indicating the dataset type)
  --data-directory TEXT  Directory you wish to upload
  -i, --interactive      Gather arguments interactively
  --help                 Show this message and exit.
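For scheduled or scripted uploads, it can help to assemble the command from Python. The sketch below only builds the argument list from the options listed above and, by default, prints it rather than running it. The helper names are hypothetical, and the example values (including using "RNAseq (FASTQ)" as the process name) are assumptions for illustration.

```python
import shlex
import subprocess


def build_upload_command(name, project, process, data_directory, description=None):
    """Assemble the argv list for a non-interactive `cirro-cli upload` call."""
    command = [
        "cirro-cli", "upload",
        "--name", name,
        "--project", project,
        "--process", process,
        "--data-directory", str(data_directory),
    ]
    # --description is optional, so it is only added when provided.
    if description:
        command += ["--description", description]
    return command


def run_upload(dry_run=True, **options):
    """Print the command when dry_run is True; otherwise execute it."""
    command = build_upload_command(**options)
    if dry_run:
        print(shlex.join(command))
    else:
        # check=True raises CalledProcessError if cirro-cli exits with an error.
        subprocess.run(command, check=True)
    return command


if __name__ == "__main__":
    run_upload(
        name="Test Dataset",
        project="Test Project",
        process="RNAseq (FASTQ)",  # assumed process name, for illustration only
        data_directory="/shared/biodata/test",
        description="Test data for practice",
    )
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems with names that contain spaces or parentheses.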
Interactive Uploading Datasets
To start the interactive upload process, run:
cirro-cli upload --interactive
You will be prompted to enter the following information. Follow the directions in the command line and hit enter after each of the following:
- Select the project which the new dataset should be uploaded to using the arrow keys.
E.g. Test Project
- Enter the path to the directory on your filesystem which should be uploaded.
E.g. /shared/biodata/test
- Confirm the number and size of files to upload.
E.g. Yes
- Select the dataset type using the arrow keys.
E.g. RNAseq (FASTQ)
- Enter the dataset name.
E.g. Test Dataset
- Enter the dataset description (optional).
E.g. Test data for practice
After providing all of the required information, the files will be uploaded into the new dataset. Once the upload process is complete, the dataset will become visible in Cirro.
Automated Downloading Datasets
To automate the process of dataset download, cirro can be used in non-interactive mode. To use this method, run the command:
cirro-cli download [OPTIONS]
with the following options:
[OPTIONS]:
  --project TEXT         Name or ID of the project
  --dataset TEXT         ID of the dataset
  --data-directory TEXT  Directory to store the files
  -i, --interactive      Gather arguments interactively
  --help                 Show this message and exit.
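When several datasets need to be fetched, for example onto a computing cluster, the same options can be generated in a loop. This is a minimal sketch that gives each dataset its own subdirectory; the helper name and the dataset IDs shown are hypothetical placeholders, and running the commands is left to the caller.

```python
from pathlib import Path


def build_download_commands(project, dataset_ids, destination_root):
    """Return one `cirro-cli download` argv list per dataset ID."""
    root = Path(destination_root)
    return [
        [
            "cirro-cli", "download",
            "--project", project,
            "--dataset", dataset_id,
            # Each dataset gets its own folder so files do not collide.
            "--data-directory", str(root / dataset_id),
        ]
        for dataset_id in dataset_ids
    ]


if __name__ == "__main__":
    # Placeholder IDs; real IDs can be found with `cirro-cli list-datasets`.
    for command in build_download_commands(
        "Test Project", ["dataset-id-1", "dataset-id-2"], "downloads"
    ):
        print(command)
```

Each returned list can be passed to `subprocess.run(command, check=True)` once the values have been checked.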
Interactive Downloading Datasets
To interactively download a dataset (or part of a dataset) from Cirro to your local hard drive, run:
cirro-cli download --interactive
You will be prompted to enter the following information. Follow the directions in the command line and hit enter after each of the following:
- Select the project which contains the dataset of interest.
- Select the dataset to download. Here you can paste your full dataset name and ID and hit enter. You can also start typing your dataset name, then press TAB, and once you see your dataset, use the arrow keys to walk through the options and hit enter to select a dataset. Alternatively, pressing TAB immediately will provide a list of all available options, and you can arrow down to your selection and hit enter.
- Edit the local folder to use as the destination of the download if necessary.
- Select whether you want to download the entire dataset, only some of the files by choosing from a list of file names, or only some of the files by providing a glob naming pattern (e.g. *.fastq.gz).
  - If you choose to download some of the files by selecting them, use the up/down arrow keys to walk through the list and hit space to select a file and space again to deselect it. If you want to select all the files, you can hit a. If it is easier, you can also select the files you don't want to download and then hit i to switch all the selected and unselected files. When you are happy with your selection, hit enter.
  - If you choose to download some of the files using a pattern, type the glob naming pattern and hit enter. You will then see a list of the files you've selected, and you can choose y to continue downloading or n to go back and re-enter your naming pattern.
All files from the dataset will be downloaded to a folder named data/ in your selected destination.
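To get a feel for what a glob naming pattern such as *.fastq.gz selects before typing it into the prompt, you can try it against a list of file names in Python. This sketch uses the standard library's `fnmatch` module; Cirro's own matching rules may differ in details, and the helper name and file names here are made up for illustration.

```python
from fnmatch import fnmatchcase


def preview_glob_selection(file_names, pattern):
    """Return the file names that match a glob pattern (case-sensitive)."""
    # Note: with fnmatchcase, "*" also matches "/" so patterns apply to full paths.
    return [name for name in file_names if fnmatchcase(name, pattern)]


if __name__ == "__main__":
    files = ["sampleA_R1.fastq.gz", "sampleA_R2.fastq.gz", "samplesheet.csv"]
    print(preview_glob_selection(files, "*.fastq.gz"))
```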
Listing Datasets in a Project
To list all the datasets available in a project (often used before downloading files), you can run:
cirro-cli list-datasets [OPTIONS]
with the following options:
[OPTIONS]:
  --project TEXT     ID of the project
  -i, --interactive  Gather arguments interactively
  --help             Show this message and exit.
Scripting Languages
In addition to the command-line interface described above, the Cirro client can be used from commonly used languages like Python and R. This allows the user to (a) use Cirro as part of a more complex set of operations and (b) read data objects from Cirro directly into memory (e.g. as data frames) without having to download any files to disk.
Python Examples
See the following set of Python Jupyter Notebooks that contain examples on the following topics:
Topic | Jupyter Notebook |
---|---|
Installing and authenticating | Introduction |
Uploading data | Uploading a dataset |
Downloading data | Downloading a dataset |
Calling data and reading into tables | Interacting with a dataset |
Run analysis pipeline | Analyzing a dataset |
Managing reference data | Using references |
R Examples
See the following set of R Jupyter Notebooks that contain examples on the following topics:
Topic | Jupyter Notebook |
---|---|
Downloading a dataset in R | Using R |