
maximum number of genes in cluster for K1 #54

Open
jpinero opened this issue Dec 13, 2021 · 3 comments


jpinero commented Dec 13, 2021

Does it make sense to add a new parameter that allows changing the maximum cluster size in the DSD method (K1)? Is there any rationale (or reference) behind setting the maximum number of genes in a cluster to 100?
Thanks!

Contributor

mattiat commented Dec 15, 2021

Hello,
what you are looking for is the --nclusters parameter. Please check "OPTIONAL PARAMETERS" in the README file or the command line help function.
Best Mattia

Author

jpinero commented Dec 17, 2021

Thanks for your answer! I had indeed checked the "OPTIONAL PARAMETERS" section in the README file, and after reading the parameters for K1 my intuition was that the --nclusters parameter was not what I was looking for. After changing --nclusters to 200, I get somewhat larger clusters, but the maximum number of genes in a cluster is still 100.
What I would like to have are clusters that have more than 100 genes, and from the documentation, I think that I cannot manipulate this number, right?
cheers
janet

Contributor

jjc2718 commented Dec 17, 2021

@jpinero that's correct, the --nclusters parameter controls the number of centroids used in the spectral clustering portion of the algorithm, so it sounds like it's not what you want.

There isn't currently an easy command-line way to change the maximum cluster size (the original challenge required a limit of 100 genes and we never needed to modify it), but you can change it by editing the source code here:

If you edit your version of the code to increase that number, you should get larger clusters. Alternatively, you can completely skip the recursive clustering step (which will give you no limit on cluster size) by 1) commenting out this line here

python2 ./clustering/split_clusters.py $DSD_FILE ./data/cluster_results/network_clusters.txt -n $NODELIST_FILE > ./data/cluster_results/network_clusters_split.txt

and 2) editing this line to point to ./data/cluster_results/network_clusters.txt instead of ./data/cluster_results/network_clusters_split.txt

cp ./data/cluster_results/network_clusters_split.txt ./data/final_clusters/clusters.txt
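
For illustration, here is a rough sketch of what those two lines might look like after both edits. It runs in a throwaway directory with a made-up stand-in for network_clusters.txt (the file contents and the temporary directory are assumptions, not part of the real pipeline), so it only demonstrates the shape of the change, not the actual run script.

```shell
#!/bin/sh
# Sketch only: reproduces the two edits in a scratch directory so it is safe to run.
set -eu

workdir=$(mktemp -d)
cd "$workdir"
mkdir -p data/cluster_results data/final_clusters

# Hypothetical stand-in for the spectral clustering output (contents are invented).
printf '1\t1.0\tgeneA\tgeneB\tgeneC\n' > data/cluster_results/network_clusters.txt

# Edit 1: the recursive splitting step is commented out, so no *_split.txt file is written.
# python2 ./clustering/split_clusters.py $DSD_FILE ./data/cluster_results/network_clusters.txt -n $NODELIST_FILE > ./data/cluster_results/network_clusters_split.txt

# Edit 2: copy the unsplit clusters to the final location instead of the split ones.
cp ./data/cluster_results/network_clusters.txt ./data/final_clusters/clusters.txt
```

With both edits in place, the final clusters.txt is simply the unsplit clustering, so no size limit is applied.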

You may have to rerun the install script using the directions in the README to get your local changes to have an effect. Sorry this is a bit complicated - let me know if you run into any difficulties, and I can try to help.

I agree that it would be good to have this option as a command-line parameter as well. I'll try to add one in the next week or two, if I can find the time.
