Getting Started with the Command Line and Python/R
Along with the Cirro web application, there is also an auxiliary interface that you can use to interact with your data.
The cirro package can be used either from the command line (as a command-line interface, or CLI) or in a Python or R session.
You can use it to upload and download datasets, or to read them into Jupyter Notebooks for additional analysis.
Common Tasks
The Cirro client library can be useful for:
- Uploading or downloading large files (> 100 MB) that would be slow to transfer through the web app
- Transferring files between Cirro and a remote computing cluster
- Automating data ingest or scheduling data analysis
Installation and Set Up
Before using the Cirro CLI, make sure you have installed the latest version of Python (for macOS, Unix, or Windows) and pip on your computer. Once you have done so, you can install cirro from PyPI using:
pip install cirro
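To confirm that the installation succeeded, one option is to look up the installed version from Python. The sketch below uses only the standard library; the helper name installed_version is illustrative and is not part of the cirro package.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("cirro")
    if found is None:
        print("cirro is not installed; run: pip install cirro")
    else:
        print(f"cirro {found} is installed")
```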
Upon first use, the Cirro client will ask which Cirro instance to use and whether you would like to save your login information. It will then give you a link to authenticate through your web browser.
If you ever need to change your credentials or Cirro instance after this point, you can clear your saved login information by removing the ~/.cirro/token.dat file from your system or by running cirro configure and selecting "No" when it asks if you'd like to save your login information.
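If you prefer to remove the saved login file from a script rather than by hand, a minimal sketch could look like the following. It only deletes the ~/.cirro/token.dat file named above; the function name clear_saved_login is hypothetical and is not provided by the cirro package.

```python
from pathlib import Path

# Default location of the saved login file, as named in the text above.
DEFAULT_TOKEN_PATH = Path.home() / ".cirro" / "token.dat"


def clear_saved_login(token_path: Path = DEFAULT_TOKEN_PATH) -> bool:
    """Delete the saved login file. Return True if a file was removed."""
    if token_path.is_file():
        token_path.unlink()
        return True
    return False


# Example usage:
#
#     if clear_saved_login():
#         print("Saved login removed; you will be asked to log in again.")
```

Returning a boolean instead of raising when the file is absent makes the helper safe to call repeatedly, for example at the start of an automated job.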
Command-Line Interface
After installing the cirro package, you can easily interact with your data from the command prompt using our command-line interface (CLI). Check out some common use cases with our command line examples.
Software Development Kit
In addition to the command-line interface, Cirro also provides a software development kit (SDK) for commonly used languages like Python and R. This allows you to (a) use Cirro as part of a more complex set of operations and (b) read data objects from Cirro directly into memory (e.g., as data frames) without having to download any files to disk.
Getting Started
By default, the SDK uses the connection information that you set up during the initial CLI configuration.
If you haven't configured the CLI yet, please do that first, or provide the base_url parameter when instantiating the DataPortal class.
from cirro import DataPortal
portal = DataPortal(base_url='app.cirro.bio')
That's it! You're now ready to start accessing the Data Portal directly.
For more information on all classes and methods available in the SDK, visit our external SDK documentation.
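As a rough illustration of how a DataPortal object might be used once it has been created, the sketch below looks up a dataset by name and returns its file listing. The method names get_project_by_name, get_dataset_by_name, and list_files are assumptions here and should be checked against the SDK documentation; the helper list_dataset_files and the project and dataset names are purely illustrative.

```python
def list_dataset_files(portal, project_name: str, dataset_name: str):
    """Return the file listing for one dataset, looked up by name.

    `portal` is expected to be a cirro.DataPortal instance. The three method
    names used below are assumptions to verify against the SDK documentation.
    """
    project = portal.get_project_by_name(project_name)
    dataset = project.get_dataset_by_name(dataset_name)
    return dataset.list_files()


# Example usage (requires the cirro package and a configured login):
#
#     from cirro import DataPortal
#     portal = DataPortal()
#     files = list_dataset_files(portal, "My Project", "My Dataset")
```

Passing the portal in as an argument, rather than creating it inside the function, keeps the helper easy to reuse with a portal that was built with a custom base_url.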
Python Examples
The following Python Jupyter Notebooks contain examples on these topics:
Topic | Jupyter Notebook |
---|---|
Uploading data | Uploading a dataset |
Downloading data | Downloading a dataset |
Calling data and reading into tables | Interacting with files |
Running an analysis pipeline | Analyzing a dataset |
Managing reference data | Using references |
Pipeline integration | Integrating pipelines |
Advanced usage | Advanced usage |
R Examples
The following R Jupyter Notebooks contain examples on these topics:
Topic | Jupyter Notebook |
---|---|
Downloading a dataset in R | Using R |
Filetype Validation
When you upload a dataset, Cirro checks that the files being uploaded meet any requirements set by the selected dataset type. If you try to upload a file and get an error telling you that the files don't meet the dataset type requirements, read through the printout of the required files and make any adjustments. You can always include more files, but you must meet all requirements before uploading. Learn more about dataset type requirements in the documentation.
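To make the idea concrete, here is a minimal sketch of this kind of pre-upload check. It is not Cirro's actual validation logic: it assumes, purely for illustration, that each requirement can be written as a filename glob pattern, and the function name and example patterns are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Dict, Iterable, List


def missing_requirements(
    file_names: Iterable[str], required_patterns: Dict[str, str]
) -> List[str]:
    """Return the descriptions of requirements that no file satisfies.

    `required_patterns` maps a human-readable description to a glob pattern.
    Extra files are allowed; only unmet requirements are reported.
    """
    names = list(file_names)
    return [
        description
        for description, pattern in required_patterns.items()
        if not any(fnmatchcase(name, pattern) for name in names)
    ]


# Hypothetical requirements for a paired-end sequencing dataset type.
requirements = {
    "read 1 FASTQ": "*_R1.fastq.gz",
    "read 2 FASTQ": "*_R2.fastq.gz",
}

# notes.txt is an extra file and is ignored; the read 2 file is missing.
unmet = missing_requirements(["sample_R1.fastq.gz", "notes.txt"], requirements)
print(unmet)
```

An empty result means every requirement is met, which mirrors the rule that extra files are fine as long as nothing required is missing.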
Data Integrity Validation
The Cirro client library ensures the integrity of all uploaded and downloaded files through CRC checksum validation, performed by the standard AWS SDK library. Any difference in file content between Cirro and the local system, down to a single byte, results in an error that is immediately reported to the user.
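The sketch below illustrates the general principle with Python's standard library. It is not the code path used by the Cirro client or the AWS SDK, and the exact CRC variant they use is not stated here; CRC-32 from zlib is used only as an example, and both function names are illustrative.

```python
import zlib


def crc32_of_file(path: str, chunk_size: int = 1024 * 1024) -> int:
    """Compute the CRC-32 of a file, reading it in chunks to limit memory use."""
    checksum = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            # Feed the running checksum back in so chunks are combined.
            checksum = zlib.crc32(chunk, checksum)
    return checksum


def verify_checksum(path: str, expected: int) -> None:
    """Raise ValueError if the file's CRC-32 does not match `expected`."""
    actual = crc32_of_file(path)
    if actual != expected:
        raise ValueError(
            f"Checksum mismatch for {path}: "
            f"expected {expected:#010x}, got {actual:#010x}"
        )
```

A 32-bit CRC detects any error burst of up to 32 bits, so changing a single byte always changes the checksum and triggers the error.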