seq-to-pheno

Code for longevity: put it in the longevity project

Code for depmap: put it in the depmap project

Fetching embeddings from sequences: get_embeddings.py

Datasets

add datasets here

Get

Get the first rows of the filtered ortholog dataset:

curl -X GET \
     "https://datasets-server.huggingface.co/first-rows?dataset=seq-to-pheno%2Ffiltered_orthologs&config=default&split=train"

Get the first 100 rows of the mapped ortholog dataset (the request sends the Hugging Face token stored in HF_TOKEN):

curl -X GET \
     -H "Authorization: Bearer $HF_TOKEN" \
     "https://datasets-server.huggingface.co/rows?dataset=seq-to-pheno%2Fmapped_orthologs&config=default&split=train&offset=0&length=100"
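The same rows request can also be made from Python. The following is a minimal sketch using only the standard library, not part of this repository: the helper names build_rows_url and fetch_rows are hypothetical, and it assumes the token, if one is needed, is available in the HF_TOKEN environment variable as in the curl call.

```python
import json
import os
import urllib.request
from urllib.parse import urlencode

ROWS_ENDPOINT = "https://datasets-server.huggingface.co/rows"


def build_rows_url(dataset, config="default", split="train", offset=0, length=100):
    """Build the /rows URL (hypothetical helper).

    urlencode percent-encodes the "/" in the dataset name as %2F,
    matching the URL used in the curl command.
    """
    query = urlencode(
        {
            "dataset": dataset,
            "config": config,
            "split": split,
            "offset": offset,
            "length": length,
        }
    )
    return f"{ROWS_ENDPOINT}?{query}"


def fetch_rows(dataset, **kwargs):
    """Request one page of rows and return the parsed JSON (hypothetical helper)."""
    request = urllib.request.Request(build_rows_url(dataset, **kwargs))
    token = os.environ.get("HF_TOKEN")  # assumption: token is set in the environment
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling fetch_rows("seq-to-pheno/mapped_orthologs") performs the network request; build_rows_url can be used on its own to check the URL first.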

Use

Load the TCGA dataset with the datasets library:

from datasets import load_dataset

ds = load_dataset("seq-to-pheno/TCGA-Cancer-Variant-and-Clinical-Data")

Load the same dataset through its Croissant metadata:

from mlcroissant import Dataset

ds = Dataset(jsonld="https://huggingface.co/api/datasets/seq-to-pheno/TCGA-Cancer-Variant-and-Clinical-Data/croissant")
records = ds.records("default")

Read the protein sequence metadata directly with pandas:

import pandas as pd

df = pd.read_csv("hf://datasets/seq-to-pheno/TCGA-Cancer-Variant-and-Clinical-Data/protein_sequences_metadata.tsv", sep="\t")

Load the mapped ortholog dataset:

from datasets import load_dataset

mapped = load_dataset("seq-to-pheno/mapped_orthologs")

Load the filtered ortholog dataset:

from datasets import load_dataset

filtered = load_dataset("seq-to-pheno/filtered_orthologs")

Re-Create the filtered Ortholog Dataset:

python ./scripts/filtered_dataset.py --folder /downloads --template_path /seq_to_pheno/hug/zoonomia_dataset_repo_template/README.md --token hf_xxx --max_length 1000 --max_orthologs 20 --publish

Re-Create the Fasta Zoonotica Dataset:

To extract sequences for a specific gene and publish:

python extract_and_publish_protein_sequences.py --input_folder data/zoonomia/ --input_file protein_sequence_df.tsv --output_folder data/zoonomia/ --output_file TP53_protein_sequences.fasta --gene TP53 --publish --repo_name filtered-zoonomia-tp53 --hf_token hf_your_token

To extract all sequences and publish:

python extract_and_publish_protein_sequences.py --input_folder data/zoonomia/ --input_file protein_sequence_df.tsv --output_folder data/zoonomia/ --output_file all_protein_sequences.fasta --publish --repo_name filtered-zoonomia-all --hf_token hf_your_token
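Once a FASTA file has been written, it can be checked quickly from Python. The sketch below is illustrative only: it assumes the script writes standard FASTA (a ">" header line followed by one or more sequence lines), read_fasta is a hypothetical helper, and the example headers and sequences are made up rather than taken from the dataset.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


# Tiny in-memory example; headers and sequences are invented for illustration.
example = """>species_a
MEEPQ
SDPSV
>species_b
MEEPQSD
"""
records = dict(read_fasta(example.splitlines()))
```

To read a real output file, pass an open file handle instead, for example open("data/zoonomia/TP53_protein_sequences.fasta"), since iterating over a file yields its lines.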
