Interacting with files

from cirro import DataPortal

portal = DataPortal()

To find a file, first select the project and the dataset that contain it, then use filter_by_pattern to search the dataset's files for names matching a pattern.

# Get the project which contains the dataset
project = portal.get_project_by_name('Test Project')

# Get the set of datasets within that project
all_datasets = project.list_datasets()
print(f"The project {project.name} contains {len(all_datasets):,} datasets")

# Get the dataset of interest based on its name
dataset = all_datasets.get_by_name('Test of mageck-count')

# Get the complete list of files in that dataset
files = dataset.list_files()
print(f"Dataset {dataset.name} contains {len(files):,} files")

# Filter to just the files named counts.txt (the wildcard matches whatever folder path precedes the file name)
counts = files.filter_by_pattern("*/counts.txt")

print(f"Selected the file: {counts.description()}")
The project Test Project contains 104 datasets
Dataset Test of mageck-count contains 32 files
Selected the file: data/mageck/count/combined/counts.txt (2090653 bytes)
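
The wildcard pattern above can be explored without a Cirro connection. The following is a minimal sketch that uses Python's standard-library fnmatch module as a stand-in for glob-style matching; whether filter_by_pattern follows exactly the same rules is an assumption, and every path except the first is hypothetical.

```python
from fnmatch import fnmatchcase

# Only the first path appears in the output above; the others are hypothetical
paths = [
    "data/mageck/count/combined/counts.txt",
    "data/mageck/count/combined/counts.log",
    "data/summary.txt",
]

# "*" matches any run of characters, including "/" separators
pattern = "*/counts.txt"

# Keep only the paths that match the pattern
matches = [p for p in paths if fnmatchcase(p, pattern)]
print(matches)
```

Under fnmatch rules, only the first path is selected, because the pattern requires the path to end in /counts.txt.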

Load the contents of that file into a DataFrame, keeping in mind that it is tab-delimited rather than the default comma-delimited.

df = counts[0].read_csv(sep="\t")
df.head()
        sgRNA  Gene  MO_Brunello_gDNA_2  MO_Brunello_1  MO_Brunello_2  MO_Brunello_gDNA_1
0      A1BG_0  A1BG                   0              0              0                   0
1      A1BG_1  A1BG                   0              0              0                   2
2      A1BG_2  A1BG                   0              0              0                   0
3      A1BG_3  A1BG                   0              0              2                   0
4  A1CF_36946  A1CF                   0              0              0                   0
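
To see why the separator matters, the sketch below parses a small in-memory sample with pandas directly. It assumes that the sep argument used above behaves as it does in pandas.read_csv; the sample string is a hypothetical stand-in for counts.txt built from the first two rows shown above.

```python
import io

import pandas as pd

# Hypothetical stand-in for counts.txt: the header and first two rows, tab-separated
sample = (
    "sgRNA\tGene\tMO_Brunello_gDNA_2\tMO_Brunello_1\tMO_Brunello_2\tMO_Brunello_gDNA_1\n"
    "A1BG_0\tA1BG\t0\t0\t0\t0\n"
    "A1BG_1\tA1BG\t0\t0\t0\t2\n"
)

# With the default comma separator, each line is read as a single column
as_comma = pd.read_csv(io.StringIO(sample))

# With sep="\t", the six columns are split as intended
as_tab = pd.read_csv(io.StringIO(sample), sep="\t")

print(as_comma.shape)
print(as_tab.shape)
```

The first call yields a single column containing whole lines, while the second yields the six expected columns.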