Producing dual assemblies when no trio or Hi-C data is available #157

Open
smkumaill opened this issue Jun 16, 2023 · 7 comments

@smkumaill

When no trio data is available, is it possible to produce hifiasm-style pseudo-haplotype-resolved assemblies?

I was not able to produce the pseudo-haplotypes from the final assembly. Do you have any suggestions as to what tool or method could be used to produce these pseudo-haplotypes (if it is possible at all)?

@skoren
Member

skoren commented Jun 16, 2023

There is currently no unphased output from verkko. However, the combination of HiFi and ONT data produces much longer phase blocks than HiFi alone (e.g. megabases vs. kilobases on human). The size of the blocks will vary depending on the heterozygosity of the sample. I would suggest looking at the output graph to see how much connectivity remains in your sample due to large homozygous regions. If there is almost none, you can do a simple purge (using purge_dups) to get a single haplotype from the final assembly. If not, the only current option would be to provide paths manually through the graph.
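One rough way to check how much connectivity remains is to count the connected components of the output graph. The sketch below is not part of verkko; it assumes the graph is available as GFA text with `S` (segment) and `L` (link) records, and the function name `gfa_components` and the toy node names are made up for illustration. A component that still contains several large nodes suggests the haplotypes are joined through shared homozygous sequence.

```python
from collections import defaultdict


def gfa_components(gfa_text):
    """Group GFA segments into connected components (orientation ignored)."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    for line in gfa_text.splitlines():
        fields = line.split("\t")
        if fields[0] == "S":
            parent.setdefault(fields[1], fields[1])
        elif fields[0] == "L":
            a, b = fields[1], fields[3]
            parent.setdefault(a, a)
            parent.setdefault(b, b)
            union(a, b)

    components = defaultdict(set)
    for node in parent:
        components[find(node)].add(node)
    # Largest component first.
    return sorted(components.values(), key=len, reverse=True)


# Toy graph (hypothetical): one homozygous node linked to two haplotype
# nodes, plus one node with no links at all.
TOY_GFA = "\n".join([
    "H\tVN:Z:1.0",
    "S\thom1\t*\tLN:i:500000",
    "S\thapA\t*\tLN:i:2000000",
    "S\thapB\t*\tLN:i:2100000",
    "S\tsolo\t*\tLN:i:3000000",
    "L\thom1\t+\thapA\t+\t0M",
    "L\thom1\t+\thapB\t+\t0M",
])

components = gfa_components(TOY_GFA)
for comp in components:
    print(len(comp), sorted(comp))
```

In the toy graph, `hom1`, `hapA` and `hapB` end up in one component and `solo` in another. Component counts are only a proxy; inspecting the graph visually is still the more reliable check.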

@skoren skoren added the enhancement New feature or request label Jun 23, 2023
@skoren
Member

skoren commented Jun 23, 2023

We've discussed producing a primary/alt style output so I've added the enhancement tag and will keep this open as part of future development.

@kentaurse

@skoren
With a solid background in genome assembly and data analysis, I can support the development of dual assemblies for scenarios without trio or Hi-C data. I’m experienced in leveraging HiFi and ONT data to optimize phase blocks and address heterozygosity challenges, ensuring accurate pseudo-haplotype resolved assemblies. Additionally, I am adept at employing tools like purge_dups to refine single haplotypes and analyzing connectivity in assembly graphs. I’m excited about the opportunity to contribute to enhancing Verkko's capabilities, making it versatile and effective in diverse genomic assembly scenarios.

@skoren
Member

skoren commented Nov 7, 2024

Hi @kentaurse, thanks for offering to help. Given the availability of the diploid graph structure I don't think you'd need to add the dependency on purge_dups.

The current Hi-C pipeline has homology detection (based on mashmap3) plus separation into heterozygous and homozygous nodes. Essentially, you'd need to add something that uses a random partitioning instead of the Hi-C signal to make walks. If you're willing to work on this we'd be happy to review/incorporate a pull request. Since HG002 has the v1.1 truth dataset, it'd be nice if you could test whatever you end up developing on that sample. Certainly you can also reach out with further questions about the current code/flow.
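To make the random-partitioning idea concrete, here is a minimal sketch of just the assignment step. It is not verkko's code: the function `random_partition`, its inputs, and the node names are hypothetical, and it assumes homology detection has already produced pairs of heterozygous nodes plus a set of homozygous nodes. Building actual walks through the graph from these sets is left out.

```python
import random


def random_partition(homolog_pairs, homozygous_nodes, seed=0):
    """Assign each homologous node pair to two pseudo-haplotypes at random.

    homolog_pairs: iterable of (node_a, node_b) heterozygous pairs
                   (assumed to come from an upstream homology step).
    homozygous_nodes: nodes shared by both haplotypes.
    """
    rng = random.Random(seed)  # fixed seed keeps the result reproducible
    hap1 = set(homozygous_nodes)
    hap2 = set(homozygous_nodes)
    for node_a, node_b in homolog_pairs:
        # A coin flip stands in for the Hi-C signal.
        if rng.random() < 0.5:
            node_a, node_b = node_b, node_a
        hap1.add(node_a)
        hap2.add(node_b)
    return hap1, hap2


# Hypothetical node names for illustration only.
pairs = [("bubble1_a", "bubble1_b"), ("bubble2_a", "bubble2_b")]
shared = {"hom1", "hom2"}
hap1, hap2 = random_partition(pairs, shared, seed=42)
print(sorted(hap1))
print(sorted(hap2))
```

Because each pair is assigned independently, the two outputs are pseudo-haplotypes: each is complete, but the phase between different pairs is arbitrary.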

@kentaurse

kentaurse commented Nov 7, 2024

@skoren
Thank you for considering my support on this project. I’d be glad to proceed with developing a partitioning method to enhance the current Hi-C pipeline. Before moving forward, could we discuss the terms for compensation, particularly the structure and schedule of payments? I look forward to collaborating and appreciate any additional insights you can share on the workflow.

@skoren
Member

skoren commented Nov 7, 2024

Sorry, as an open-source, public-domain project, we don't have funds to support external developers.

@kentaurse

Thank you for your answer.
