.. Pyllelic documentation master file, created by
   sphinx-quickstart on Sat Feb 20 10:57:56 2021.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Welcome to Pyllelic's documentation!
====================================

`pyllelic <https://github.com/Paradoxdruid/pyllelic>`__: a tool for detection of allele-specific methylation variation in bisulfite DNA sequencing files.

.. toctree::
   :maxdepth: 2
   :caption: Contents:
   :hidden:

   index

Quickstart
~~~~~~~~~~

Run an interactive sample pyllelic environment in your web browser using mybinder.org.

Dependencies and Installation
=============================

Source code for pyllelic is available on `GitHub <https://github.com/Paradoxdruid/pyllelic>`__.

Using Conda (preferred)
~~~~~~~~~~~~~~~~~~~~~~~

Create a new conda environment using Python 3.8:

Easiest:

.. code:: bash

   # Get environment.yml file from this repo
   # (URL quoted so the shell does not try to expand the "?")
   curl -L "https://github.com/Paradoxdruid/pyllelic/blob/master/environment.yml?raw=true" > env.yml

   # Create and activate conda environment
   conda env create --file=env.yml
   conda activate pyllelic

Or, with more explicit step-by-step instructions:

.. code:: bash

   conda create --name pyllelic python=3.8
   conda activate pyllelic
   conda config --env --add channels conda-forge
   conda config --env --add channels bioconda
   conda config --env --add channels paradoxdruid
   conda install pyllelic

   # Optional but usual use case:
   conda install notebook jupyter_contrib_nbextensions ipywidgets

Docker container
~~~~~~~~~~~~~~~~

.. code:: bash

   docker pull ghcr.io/paradoxdruid/pyllelic:latest

PyPI installation
~~~~~~~~~~~~~~~~~

Installing from PyPI requires independent installation of the samtools, bowtie2, and
bismark packages.

.. code:: bash

   # PyPI
   python3 -m pip install pyllelic

   # or GitHub
   python3 -m pip install git+https://github.com/Paradoxdruid/pyllelic.git

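Because a pip install does not bring in the external tools, it can help to confirm they are reachable before running an analysis. The following is a minimal sketch using only the Python standard library. The helper name ``find_missing_tools`` is hypothetical and not part of pyllelic, and the executable names are assumed to match the package names listed above.

```python
import shutil


def find_missing_tools(tools):
    """Return the names in `tools` that have no matching executable on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


# Executable names are assumed to match the package names (an assumption).
REQUIRED_TOOLS = ["samtools", "bowtie2", "bismark"]

if __name__ == "__main__":
    missing = find_missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All external tools were found on PATH.")
```

If a tool is reported as missing, install it separately or use the conda or Docker routes described earlier.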
Quickstart
==========

Example exploratory use in a Jupyter notebook
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Set up files:

.. code:: python

   from pyllelic import process
   from pathlib import Path

   # Retrieve promoter genomic sequence of region to analyze
   process.retrieve_seq("tert_genome.txt", chrom="chr5", start=1293000, end=1296000)

   # Download a reference genome and bisulfite sequencing data
   # Genome data from, e.g. http://hgdownload.soe.ucsc.edu/goldenPath/hg19
   # Fastq data from, e.g. http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeHaibMethylRrbs/
   genome = Path("/{your_directory}/{genome_file_directory}")
   fastq = Path("/{your_directory}/{your_fastq_file.fastq.gz}")

   # Use bismark tool to prepare bisulfite genome and align fastq to bam file
   process.prepare_genome(genome)  # can optionally give path to bowtie2 if not in PATH
   process.bismark(genome, fastq)

   # Sort and index the resultant bam file
   bamfile = Path("/{your_directory}/{bam_filename}.bam")
   process.sort_bam(bamfile)
   process.index_bam(bamfile.parent / f"{bamfile.stem}_sorted.bam")

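The argument passed to ``process.index_bam`` above is built with ``pathlib``. As a quick illustration of what that expression evaluates to, here is a standalone sketch. The directory and file name are placeholders, and it only assumes, as the snippet above implies, that the sorted file sits next to the input with a ``_sorted`` suffix.

```python
from pathlib import PurePosixPath

# Placeholder path, for illustration only.
bamfile = PurePosixPath("/data/sample.bam")

# Same expression as above: keep the directory, append "_sorted" to the stem.
sorted_bam = bamfile.parent / f"{bamfile.stem}_sorted.bam"

print(sorted_bam)  # /data/sample_sorted.bam
```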
Run pyllelic:

.. code:: python

   from pyllelic import pyllelic

   config = pyllelic.configure(  # Specify file and directory locations
       base_path="/home/jovyan/assets/",
       prom_file="tert_genome.txt",
       prom_start="1293200",
       prom_end="1296000",
       chrom="5",
       offset=1293000,  # start position of retrieved promoter sequence
       # viz_backend="plotly",
       # fname_pattern=r"^[a-zA-Z]+_([a-zA-Z0-9]+)_.+bam$",
       # test_dir="test",
       # results_dir="results",
   )

   files_set = pyllelic.make_list_of_bam_files(config)  # finds bam files

   # Run pyllelic; may take some time depending on number of bam files
   data = pyllelic.pyllelic(config=config, files_set=files_set)

   positions = data.positions
   cell_types = data.cell_types
   means_df = data.means  # mean methylation of reads
   modes_df = data.modes  # mode methylation of reads
   diff_df = data.diffs  # difference mean - mode of reads
   individual_data = data.individual_data  # read methylation values

   data.save("output.xlsx")  # save methylation results
   data.save_pickle("my_run.pickle")  # save data object for later analysis
   data.write_means_modes_diffs(filename="Run1_")  # write output data files

   data.histogram("CELL_LINE", "POSITION")  # visualize data for a point
   data.heatmap(min_values=1)  # methylation level heatmap
   data.reads_graph()  # individual methylated / unmethylated reads graph
   data.quma_results["CELL_LINE"]  # see summary data for a cell line

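The commented-out ``fname_pattern`` option in the configuration above is a regular expression with one capture group. The sketch below uses only the standard ``re`` module to show what that pattern matches. The file names are made up, the helper ``captured_label`` is hypothetical, and reading the captured group as a cell-line label is an assumption rather than something stated here.

```python
import re

# Pattern copied from the commented-out fname_pattern option above.
FNAME_PATTERN = re.compile(r"^[a-zA-Z]+_([a-zA-Z0-9]+)_.+bam$")


def captured_label(filename):
    """Return the first capture group, or None if the name does not match."""
    match = FNAME_PATTERN.match(filename)
    return match.group(1) if match else None


# Hypothetical file names, for illustration only.
print(captured_label("fh_SW1710_TERT.bam"))  # SW1710
print(captured_label("sample.bam"))          # None
```

File names that lack the two underscore-separated prefixes do not match, so a custom pattern may be needed if your bam files follow a different naming scheme.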
--------------
Function Reference
==================

.. toctree::
   :maxdepth: 2

   pyllelic

Authors
=======

This software is developed as academic software by Dr. Andrew J.
Bonham at the Metropolitan State University of Denver. It is licensed
under the GPL v3.0.

This software incorporates implementation from
QUMA, licensed under the GPL v3.0.