Appendix¶
Project folder structure guidance¶
These guidance notes provide an overview of the folder structure used in this handbook.
- All projects live in an S3 bucket
- The directory structure is always <bucket-name>/projects/<project_name>
- In the <project_name> folder there are subdirectories
  - The first subdirectory is workspace
  - Other subdirectories are batches of data
    - The batches of data are labeled by date and include images and illum folders
      - In the images folder there are different plates storing raw image data
      - The illum folder is identical to the images folder in terms of structure
      - illum is an output of the first stage of the CellProfiler pipeline; it stores the illumination correction functions used to adjust the plates in images
- workspace also has subdirectories
  - analysis - includes subfolders mirroring the batch nesting
    - Within each batch folder, the CellProfiler results are stored in a folder named after the plate_id
    - Within each plate folder there is an analysis folder
    - Inside this analysis folder, each site of each well has its own folder (e.g. A01-1)
      - A and 01 refer to the row and column of the plate; 1 refers to the site within the well
      - If the grouping was done by well instead of by site, this would be A01, without the -1 suffix
      - Note that this analysis folder is customizable
    - There are typically 384 (number of wells) x 9 (number of sites per well) = 3456 subfolders
      - 384-well plate
      - 9 different pictures (sites) per well
    - Within the site folder (e.g. A01-1) there are five csv files
      - Cells.csv
        - Each row holds the measurements of one cell
      - Cytoplasm.csv
        - Another object file, similar to Cells.csv
      - Nuclei.csv
        - Another object file, similar to Cells.csv
      - These three object files can be concatenated by column
        - Objects.csv
      - Experiment.csv
        - Stores metadata for the CellProfiler run, including the CellProfiler pipeline itself
      - Image.csv
  - backend - also includes batch nesting
    - batch nesting
      - plate nesting - stores summaries of each plate (all .csv files also have .gct formats, for input into Morpheus)
        - <plate_id>.sqlite - inner join of all objects in a well, then stacked (so all data for every well of a single plate)
        - <plate_id>.csv - per-well means for each well on the plate
        - <plate_id>.augmented.csv - same as .csv except that it includes the metadata
        - <plate_id>._normalized.csv - a z-scored version of the augmented file
        - <plate_id>._normalized_variable_selected.csv - the normalized file after feature selection across all the plates in the batch
          - Three feature selection steps
            - Variance threshold
            - Correlation threshold (decorrelate feature set)
            - Replicate correlation filter (>0.6)
  - parameters - same structure as backend but with metadata results (e.g. the features selected in variable selection)
  - software
    - This is where the project's GitHub repository lives.
    - The scripts in the handbook assume that the repository has the same name as the project folder. If you rename it, pay careful attention to paths when executing the commands in the handbook.
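The well and site folder naming described in the list above (A01-1 when grouped by site, A01 when grouped by well) can be illustrated with a small Python sketch. This is not part of the handbook's tooling: the names WellSite, parse_well_site and expected_site_folders are hypothetical, and the sketch assumes a 384-well plate laid out as 16 rows (A to P) by 24 columns with sites numbered from 1.

```python
import re
import string
from typing import List, NamedTuple, Optional


class WellSite(NamedTuple):
    """Parsed pieces of an analysis subfolder name (hypothetical helper type)."""
    row: str
    column: int
    site: Optional[int]  # None when results were grouped by well, not by site


# Matches "A01-1" (grouped by site) or "A01" (grouped by well).
_FOLDER_RE = re.compile(r"^([A-Z])(\d{2})(?:-(\d+))?$")


def parse_well_site(name: str) -> WellSite:
    """Split a folder name such as 'A01-1' into row, column and site."""
    match = _FOLDER_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised well/site folder name: {name!r}")
    row, column, site = match.groups()
    return WellSite(row, int(column), int(site) if site is not None else None)


def expected_site_folders(n_rows: int = 16, n_columns: int = 24,
                          sites_per_well: int = 9) -> List[str]:
    """List the folder names expected for one plate.

    Assumption: 16 x 24 = 384 wells and 9 sites per well, numbered from 1.
    """
    rows = string.ascii_uppercase[:n_rows]
    return [
        f"{row}{column:02d}-{site}"
        for row in rows
        for column in range(1, n_columns + 1)
        for site in range(1, sites_per_well + 1)
    ]


folders = expected_site_folders()
print(len(folders))               # 384 wells x 9 sites
print(parse_well_site("A01-1"))   # grouped by site
print(parse_well_site("A01"))     # grouped by well
```

Comparing such an expected list against the folders actually present under a plate's analysis folder is one way to spot missing sites, provided the plate format and site count match your experiment.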
Directory structure¶
├── 2016_04_01_a549_48hr_batch1
│   ├── illum
│   │   └── SQ00015167
│   │       ├── SQ00015167_IllumAGP.mat
│   │       ├── SQ00015167_IllumDNA.mat
│   │       ├── SQ00015167_IllumER.mat
│   │       ├── SQ00015167_IllumMito.mat
│   │       ├── SQ00015167_IllumRNA.mat
│   │       └── SQ00015167.stderr
│   └── images
│       └── SQ00015167__2016-04-21T03_34_00-Measurement1
│           ├── Assaylayout
│           ├── FFC_Profile
│           └── Images
│               ├── r01c01f01p01-ch1sk1fk1fl1.tiff
│               ├── r01c01f01p01-ch2sk1fk1fl1.tiff
│               ├── r01c01f01p01-ch3sk1fk1fl1.tiff
│               ├── r01c01f01p01-ch4sk1fk1fl1.tiff
│               └── r01c01f01p01-ch5sk1fk1fl1.tiff
└── workspace
    ├── audit
    │   └── 2016_04_01_a549_48hr_batch1
    ├── analysis
    │   └── 2016_04_01_a549_48hr_batch1
    │       └── SQ00015167
    │           └── analysis
    │               └── A01-1
    │                   ├── Cells.csv
    │                   ├── Cytoplasm.csv
    │                   ├── Experiment.csv
    │                   ├── Image.csv
    │                   ├── Nuclei.csv
    │                   └── outlines
    │                       └── SQ00015167
    │                           ├── A01_s1--cell_outlines.png
    │                           └── A01_s1--nuclei_outlines.png
    ├── backend
    │   └── 2016_04_01_a549_48hr_batch1
    │       └── SQ00015167
    │           ├── SQ00015167.csv
    │           └── SQ00015167.sqlite
    ├── images
    │   └── 2016_04_01_a549_48hr_batch1 -> /home/ubuntu/bucket/projects/2015_10_05_DrugRepurposing_AravindSubramanian_GolubLab_Broad/2016_04_01_a549_48hr_batch1/images/
    ├── load_data_csv
    │   └── 2016_04_01_a549_48hr_batch1
    │       └── SQ00015167
    │           ├── load_data.csv
    │           └── load_data_with_illum.csv
    ├── log
    │   ├── create_csv_from_xml
    │   └── collate
    ├── metadata
    │   └── 2016_04_01_a549_48hr_batch1
    │       ├── barcode_platemap.csv
    │       └── platemap
    │           └── C-7161-01-LM6-006.txt
    ├── pipelines
    ├── status
    └── software
        ├── Distributed-CellProfiler
        └── pe2loaddata
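The tree above follows a small number of path patterns, which the following Python sketch spells out for one batch and one plate. It is an illustration only, not a handbook script: the function name project_paths and the bucket and project names (my-bucket, my_project) are hypothetical placeholders, while the batch and plate identifiers are taken from the example tree.

```python
from pathlib import PurePosixPath
from typing import Dict


def project_paths(bucket: str, project: str, batch: str,
                  plate: str) -> Dict[str, PurePosixPath]:
    """Build the main per-plate locations of the layout shown in the tree.

    Only string manipulation is done here; nothing is read from S3 or disk.
    """
    project_root = PurePosixPath(bucket) / "projects" / project
    workspace = project_root / "workspace"
    backend = workspace / "backend" / batch / plate
    return {
        # Raw data and illumination functions live next to workspace.
        "images": project_root / batch / "images",
        "illum": project_root / batch / "illum" / plate,
        # Per-site CellProfiler output (one folder per site, e.g. A01-1).
        "analysis": workspace / "analysis" / batch / plate / "analysis",
        # Per-plate summaries.
        "backend_sqlite": backend / f"{plate}.sqlite",
        "backend_csv": backend / f"{plate}.csv",
        # Input files listing the images to process.
        "load_data": workspace / "load_data_csv" / batch / plate / "load_data.csv",
    }


# Placeholder bucket and project names; batch and plate come from the tree.
paths = project_paths(
    bucket="my-bucket",
    project="my_project",
    batch="2016_04_01_a549_48hr_batch1",
    plate="SQ00015167",
)
for name, path in paths.items():
    print(f"{name}: {path}")
```

PurePosixPath is used so that the same forward-slash paths are produced on any operating system, which suits keys in an S3 bucket as well as a mounted Linux file system.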