Downloading a dataset

from cirro import DataPortal

portal = DataPortal()

You can list all of the projects available to you and select a particular project by name.

print(f"There are {len(portal.list_projects()):,} projects available")
# print(portal.list_projects()) # run this line to see all the projects

project = portal.get_project_by_name("Test Project")
print(f"Selected the project '{project.name}' (ID: {project.id})")
print(f"This project contains {len(project.list_datasets()):,} datasets to choose from")
There are 3 projects available
Selected the project 'Test Project' (ID: 9a31492a-e679-43ce-9f06-d84213c8f7f7)
This project contains 104 datasets to choose from

Select a single dataset from that project.

# Datasets can be selected by name or by ID
dataset = project.get_dataset_by_id("bcda3e84-1abe-4d08-86b0-690ea7e1cdad")
# dataset = project.get_dataset_by_name("Test of mageck-count")
print(dataset)
Name: Test of mageck-count (updated headnode code 9/22/2022) (3)
Id: bcda3e84-1abe-4d08-86b0-690ea7e1cdad
Description: Test of mageck-count (updated headnode code 9/22/2022)
Status: COMPLETED
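
Since a dataset can be looked up by either name or ID, a script that accepts both may want to decide which lookup to call. The following is a minimal sketch, assuming dataset IDs are always UUID-formatted strings like the one shown above. The helper `looks_like_id` is hypothetical and is not part of the cirro library.

```python
import uuid


def looks_like_id(value: str) -> bool:
    """Return True if the string parses as a UUID (hypothetical helper)."""
    try:
        uuid.UUID(value)
    except ValueError:
        return False
    return True


print(looks_like_id("bcda3e84-1abe-4d08-86b0-690ea7e1cdad"))
print(looks_like_id("Test of mageck-count"))

# With the `project` object selected earlier, the choice could then be made as:
# dataset = (
#     project.get_dataset_by_id(key)
#     if looks_like_id(key)
#     else project.get_dataset_by_name(key)
# )
```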

Download all of the files from that dataset to a temporary folder.

dataset.download_files("/tmp")
Downloading file MO_Brunello_1.fastq (898.44 KB) | 100.0%|█████████████████████████ | 1.46MB/s
Downloading file MO_Brunello_2.fastq (898.44 KB) | 100.0%|█████████████████████████ | 1.83MB/s
Downloading file MO_Brunello_gDNA_1.fastq (898.44 KB) | 100.0%|█████████████████████████ | 2.16MB/s
Downloading file MO_Brunello_gDNA_2.fastq (898.44 KB) | 100.0%|█████████████████████████ | 1.39MB/s
Downloading file multiqc_report.html (1.12 MB) | 100.0%|█████████████████████████ | 1.35MB/s
Downloading file MO_Brunello_1.json (72.07 KB) | 100.0%|█████████████████████████ | 285kB/s
Downloading file MO_Brunello_1_fastqc.html (804.22 KB) | 100.0%|█████████████████████████ | 1.15MB/s
Downloading file MO_Brunello_2.json (72.07 KB) | 100.0%|█████████████████████████ | 349kB/s
Downloading file MO_Brunello_2_fastqc.html (824.26 KB) | 100.0%|█████████████████████████ | 1.19MB/s
Downloading file MO_Brunello_gDNA_1.json (72.53 KB) | 100.0%|█████████████████████████ | 319kB/s
Downloading file MO_Brunello_gDNA_1_fastqc.html (824.76 KB) | 100.0%|█████████████████████████ | 2.10MB/s
Downloading file MO_Brunello_gDNA_2.json (71.84 KB) | 100.0%|█████████████████████████ | 289kB/s
Downloading file MO_Brunello_gDNA_2_fastqc.html (815.26 KB) | 100.0%|█████████████████████████ | 1.95MB/s
Downloading file MO_Brunello_1.count.txt (1.55 MB) | 100.0%|█████████████████████████ | 3.62MB/s
Downloading file MO_Brunello_1.count_normalized.txt (1.56 MB) | 100.0%|█████████████████████████ | 3.09MB/s
Downloading file MO_Brunello_1.countsummary.txt (237.00 B) | 100.0%|█████████████████████████ | 1.42kB/s
Downloading file MO_Brunello_2.count.txt (1.55 MB) | 100.0%|█████████████████████████ | 3.61MB/s
Downloading file MO_Brunello_2.count_normalized.txt (1.56 MB) | 100.0%|█████████████████████████ | 2.72MB/s
Downloading file MO_Brunello_2.countsummary.txt (237.00 B) | 100.0%|█████████████████████████ | 2.28kB/s
Downloading file MO_Brunello_gDNA_1.count.txt (1.55 MB) | 100.0%|█████████████████████████ | 2.82MB/s
Downloading file MO_Brunello_gDNA_1.count_normalized.txt (1.56 MB) | 100.0%|█████████████████████████ | 2.57MB/s
Downloading file MO_Brunello_gDNA_1.countsummary.txt (247.00 B) | 100.0%|█████████████████████████ | 2.57kB/s
Downloading file MO_Brunello_gDNA_2.count.txt (1.55 MB) | 100.0%|█████████████████████████ | 3.40MB/s
Downloading file MO_Brunello_gDNA_2.count_normalized.txt (1.56 MB) | 100.0%|█████████████████████████ | 1.52MB/s
Downloading file MO_Brunello_gDNA_2.countsummary.txt (246.00 B) | 100.0%|█████████████████████████ | 2.33kB/s
Downloading file counts.txt (1.99 MB) | 100.0%|█████████████████████████ | 3.48MB/s
Downloading file sample_names.txt (65.00 B) | 100.0%|█████████████████████████ | 662B/s
Downloading file summary.txt (366.00 B) | 100.0%|█████████████████████████ | 2.41kB/s
Downloading file MO_Brunello_1.log (2.39 KB) | 100.0%|█████████████████████████ | 11.1kB/s
Downloading file MO_Brunello_2.log (2.39 KB) | 100.0%|█████████████████████████ | 16.1kB/s
Downloading file MO_Brunello_gDNA_1.log (2.43 KB) | 100.0%|█████████████████████████ | 23.2kB/s
Downloading file MO_Brunello_gDNA_2.log (2.43 KB) | 100.0%|█████████████████████████ | 19.4kB/s
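
The hard-coded "/tmp" path only exists on Unix-like systems, and repeated downloads into it can pile up or overwrite one another. As a sketch of one alternative, Python's standard `tempfile` module can create a fresh, uniquely named folder on any platform. The prefix `cirro_download_` is an arbitrary choice for illustration.

```python
import os
import tempfile

# Create a new, uniquely named directory under the system's temp location
download_dir = tempfile.mkdtemp(prefix="cirro_download_")
print(f"Files will be saved to {download_dir}")

# With the `dataset` object selected above, the download call would then be:
# dataset.download_files(download_dir)
```

Note that `tempfile.mkdtemp` does not delete the folder automatically, so it should be removed once the files are no longer needed.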

Alternatively, you can inspect the list of files and filter it down to only what is needed.

files = dataset.list_files()
print(files)
data/cutadapt/trim/fastq/MO_Brunello_1.fastq (920000 bytes)

data/cutadapt/trim/fastq/MO_Brunello_2.fastq (920000 bytes)

data/cutadapt/trim/fastq/MO_Brunello_gDNA_1.fastq (920000 bytes)

data/cutadapt/trim/fastq/MO_Brunello_gDNA_2.fastq (920000 bytes)

data/fastqc/multiqc_report.html (1173155 bytes)

data/fastqc/MO_Brunello_1/MO_Brunello_1.json (73803 bytes)

data/fastqc/MO_Brunello_1/MO_Brunello_1_fastqc.html (823526 bytes)

data/fastqc/MO_Brunello_2/MO_Brunello_2.json (73797 bytes)

data/fastqc/MO_Brunello_2/MO_Brunello_2_fastqc.html (844044 bytes)

data/fastqc/MO_Brunello_gDNA_1/MO_Brunello_gDNA_1.json (74268 bytes)

data/fastqc/MO_Brunello_gDNA_1/MO_Brunello_gDNA_1_fastqc.html (844554 bytes)

data/fastqc/MO_Brunello_gDNA_2/MO_Brunello_gDNA_2.json (73563 bytes)

data/fastqc/MO_Brunello_gDNA_2/MO_Brunello_gDNA_2_fastqc.html (834827 bytes)

data/mageck/count/MO_Brunello_1.count.txt (1625955 bytes)

data/mageck/count/MO_Brunello_1.count_normalized.txt (1638475 bytes)

data/mageck/count/MO_Brunello_1.countsummary.txt (237 bytes)

data/mageck/count/MO_Brunello_2.count.txt (1625955 bytes)

data/mageck/count/MO_Brunello_2.count_normalized.txt (1638372 bytes)

data/mageck/count/MO_Brunello_2.countsummary.txt (237 bytes)

data/mageck/count/MO_Brunello_gDNA_1.count.txt (1625960 bytes)

data/mageck/count/MO_Brunello_gDNA_1.count_normalized.txt (1638522 bytes)

data/mageck/count/MO_Brunello_gDNA_1.countsummary.txt (247 bytes)

data/mageck/count/MO_Brunello_gDNA_2.count.txt (1625960 bytes)

data/mageck/count/MO_Brunello_gDNA_2.count_normalized.txt (1638905 bytes)

data/mageck/count/MO_Brunello_gDNA_2.countsummary.txt (246 bytes)

data/mageck/count/combined/counts.txt (2090653 bytes)

data/mageck/count/combined/sample_names.txt (65 bytes)

data/mageck/count/combined/summary.txt (366 bytes)

data/mageck/count/log/MO_Brunello_1.log (2449 bytes)

data/mageck/count/log/MO_Brunello_2.log (2449 bytes)

data/mageck/count/log/MO_Brunello_gDNA_1.log (2489 bytes)

data/mageck/count/log/MO_Brunello_gDNA_2.log (2488 bytes)
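
The byte counts in this listing line up with the human-readable sizes printed in the download progress output if the units are taken as powers of 1024 (for example, 920000 bytes is reported as 898.44 KB). The sketch below reproduces that conversion. The function `format_size` is a hypothetical helper written for illustration and is not necessarily how the library formats sizes internally.

```python
def format_size(num_bytes: int) -> str:
    """Format a byte count using 1024-based units with two decimal places."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB"):
        # Stop at the first unit where the value drops below 1024,
        # or at the largest unit this sketch knows about
        if size < 1024 or unit == "GB":
            return f"{size:.2f} {unit}"
        size /= 1024


# Byte counts taken from the file listing above
for n in (920000, 1173155, 237):
    print(n, "->", format_size(n))
```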
norm_counts = files.filter_by_pattern("*.count_normalized.txt")
print(norm_counts)
data/mageck/count/MO_Brunello_1.count_normalized.txt (1638475 bytes)

data/mageck/count/MO_Brunello_2.count_normalized.txt (1638372 bytes)

data/mageck/count/MO_Brunello_gDNA_1.count_normalized.txt (1638522 bytes)

data/mageck/count/MO_Brunello_gDNA_2.count_normalized.txt (1638905 bytes)
norm_counts.download("/tmp")
Downloading file MO_Brunello_1.count_normalized.txt (1.56 MB) | 100.0%|█████████████████████████ | 1.86MB/s
Downloading file MO_Brunello_2.count_normalized.txt (1.56 MB) | 100.0%|█████████████████████████ | 3.78MB/s
Downloading file MO_Brunello_gDNA_1.count_normalized.txt (1.56 MB) | 100.0%|█████████████████████████ | 2.86MB/s
Downloading file MO_Brunello_gDNA_2.count_normalized.txt (1.56 MB) | 100.0%|█████████████████████████ | 3.27MB/s
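
To preview which paths a pattern would select before downloading anything, the same kind of wildcard matching can be tried locally with Python's standard `fnmatch` module. This is only a sketch: it assumes the pattern is glob-style and is applied to the full file path, which is consistent with the output above but is not confirmed to be how `filter_by_pattern` is implemented. The paths are copied from the listing shown earlier.

```python
from fnmatch import fnmatchcase

# A few paths copied from the dataset listing
paths = [
    "data/mageck/count/MO_Brunello_1.count.txt",
    "data/mageck/count/MO_Brunello_1.count_normalized.txt",
    "data/mageck/count/MO_Brunello_1.countsummary.txt",
    "data/mageck/count/MO_Brunello_gDNA_1.count_normalized.txt",
    "data/mageck/count/combined/counts.txt",
]

pattern = "*.count_normalized.txt"

# In fnmatch, "*" matches any run of characters, including "/"
matches = [p for p in paths if fnmatchcase(p, pattern)]

for p in matches:
    print(p)
```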