Allow user to graphically filter edges by multiplicity #232
Comments
Some pertinent examples of related functionality in practice:
It looks like the "bar chart" approach with raw data might be suitable for this: each bar extending upward from the x-axis represents one edge, and the bar's height on the y-axis gives that edge's multiplicity. Below is a screenshot of such a chart (for the edge weights of the 2nd cc of the velvet E. coli assembly graph mentioned above), generated using LibreOffice Calc (it's an unpolished version -- we'd add further labels, etc. for the implementation version).

That being said, I'm considering other approaches to this. One alternative is a frequency bar chart (where the x-axis indicates multiplicity and the y-axis indicates the number of edges with that multiplicity), where the x-axis might have "bins" of multiplicity (say, increments of 10?). Here's a rough version of that, generated for the same data as above with edge multiplicity "bins" of width 10.
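To make the frequency-chart idea concrete, here is a minimal sketch of fixed-width binning in plain JavaScript. The function name `fixedWidthHistogram`, its parameters, and the sample weights are all made up for illustration; this is not the code used to produce the charts above.

```javascript
// Count edges per fixed-width multiplicity bin.
// `weights` (array of edge multiplicities) and `binWidth` are hypothetical inputs.
function fixedWidthHistogram(weights, binWidth) {
  if (!(binWidth > 0)) {
    throw new RangeError("binWidth must be positive");
  }
  const counts = new Map();
  for (const w of weights) {
    // Bin k covers the half-open range [k * binWidth, (k + 1) * binWidth).
    const k = Math.floor(w / binWidth);
    counts.set(k, (counts.get(k) || 0) + 1);
  }
  // Return only the non-empty bins, in ascending order of multiplicity.
  return [...counts.keys()]
    .sort((a, b) => a - b)
    .map((k) => ({
      lo: k * binWidth,
      hi: (k + 1) * binWidth,
      count: counts.get(k),
    }));
}

// Made-up multiplicities, binned in increments of 10.
const exampleWeights = [3, 7, 12, 15, 18, 40];
const histogram = fixedWidthHistogram(exampleWeights, 10);
console.log(histogram);
```

Empty bins are skipped in this sketch; a real chart would probably want them drawn as zero-height bars so the x-axis stays evenly spaced.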
Will eventually be included in the functionality described in #232.
Should consider allowing the user to pick the number of subdivisions of the edges, instead of picking the "bin sizes." For example, instead of creating bins of size 10 (where each bin counts the edges in the cc whose weight falls within that bin's range of 10 edge weights), we should let the user pick how many groups -- we could call them bins, I guess -- they want the cc's edges split into for the chart. So, using the 2nd cc of the Velvet E. coli assembly graph as a benchmark, if the cc contains 32 edges and the user requests 10 bins, then my code should divide the cc's edges into 10 groups of roughly 3 edges each.

This might take some effort to handle certain cases well (for example, what to do when a cc contains a bunch of edges with the exact same weight?). I'd imagine that we can just characterize each bin as a collection of edges, with the defined properties of min and max "child" edge weight. Since each bin would have roughly the same size (sizes might differ slightly when the number of edges in the cc is not evenly divisible by the requested number of bins), we could plot the y-axis as the average edge weight in each bin.

That being said, I'm not entirely sure how to assign edges to bins. Given e edges and b requested bins, a simple approach of taking f = floor(e / b) and creating b - 1 bins of size f and a final bin of size f + (e - (f * b)) works alright in some cases (for example, for the cc above it gives 9 bins of size 3 and a final bin of size 5, which is fine). However, it can in theory produce a very large last bin, which would be undesirable. For example, if a cc has 101 edges and the user requests 21 bins, then the naive approach would create 20 bins of size 4 and a 21st bin of size 4 + (101 - (4 * 21)) = 21. So the "simple approach" clearly isn't a good solution in general, and I should figure something else out.

(We could also just approximate the number of bins -- in the e = 101, b = 21 example, it would probably be ok to just construct 19 bins of size 5 and 1 bin of size 6.)
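One possible way around the oversized last bin is to spread the remainder across bins instead of dumping it all into the final one, so that bin sizes never differ by more than one. The sketch below is only an illustration of that idea (the name `balancedBinSizes` is hypothetical), and it does not address the separate question of what to do with runs of edges that share the same weight.

```javascript
// Split e edges into b bins whose sizes differ by at most one.
// This is a suggested alternative, not the approach described in the thread.
function balancedBinSizes(e, b) {
  if (!Number.isInteger(e) || !Number.isInteger(b) || e < 0 || b < 1) {
    throw new RangeError("need integers e >= 0 and b >= 1");
  }
  const base = Math.floor(e / b); // every bin gets at least this many edges
  const extra = e % b;            // this many bins get one additional edge
  const sizes = [];
  for (let i = 0; i < b; i++) {
    sizes.push(i < extra ? base + 1 : base);
  }
  return sizes;
}

// The e = 101, b = 21 case from above: 17 bins of size 5 and 4 bins of size 4.
const sizes = balancedBinSizes(101, 21);
console.log(sizes);
```

With this scheme the user gets exactly the number of bins requested, and for the 32-edge cc with 10 bins it yields two bins of size 4 and eight of size 3.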
In any case, once a bin construction algorithm is ready, it should be simple to calculate each bin (and its min, max, and average) and populate the bar chart accordingly. I also think that the min/max of bins can be presented as a tooltip for each bar (see the above d3.js example of using tooltips in bar charts).
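As a rough sketch of that per-bin calculation, assuming the bin sizes have already been chosen somehow: sort the weights, slice them into consecutive groups, and record each group's min, max, and average. The function `summarizeBins`, its inputs, and the sample data are hypothetical.

```javascript
// Summarize consecutive groups of sorted edge weights.
// `weights` and `sizes` are hypothetical inputs; sizes must sum to weights.length.
function summarizeBins(weights, sizes) {
  const sorted = [...weights].sort((a, b) => a - b);
  const total = sizes.reduce((s, n) => s + n, 0);
  if (total !== sorted.length) {
    throw new RangeError("bin sizes must sum to the number of weights");
  }
  const bins = [];
  let start = 0;
  for (const size of sizes) {
    const group = sorted.slice(start, start + size);
    start += size;
    if (group.length === 0) continue; // nothing to summarize for an empty bin
    const sum = group.reduce((s, w) => s + w, 0);
    bins.push({
      min: group[0],                  // could be shown in the bar's tooltip
      max: group[group.length - 1],   // likewise
      average: sum / group.length,    // bar height on the y-axis
      count: group.length,
    });
  }
  return bins;
}

// Made-up weights split into bins of 3, 2, and 2 edges.
const binSummaries = summarizeBins([9, 1, 4, 4, 20, 6, 2], [3, 2, 2]);
console.log(binSummaries);
```

Note that in this example the weight 4 appears in two adjacent bins, which is exactly the tied-weights case that still needs a decision.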
This prevents a bug where a new chart would just get drawn over the old one.
Via changing the domain from [0, 1] to [0, max * 1.1]. Setting [max * 1.1] as the top of the domain also should mean that all the values in the histogram are contained within the span of the x-axis (unlike before, where the max was getting pushed to the right of the x-axis).
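The effect of the padded domain can be seen with a plain-JS stand-in for a linear scale. This is only a sketch: the function `paddedLinearScale`, the sample values, and the 440-pixel axis width are assumptions, and the actual chart code presumably passes the domain to a d3 scale instead.

```javascript
// Map data values to pixel positions using a domain padded to [0, max * 1.1].
// Plain-JS illustration; names and numbers here are hypothetical.
function paddedLinearScale(values, rangeStart, rangeEnd) {
  if (values.length === 0) {
    throw new RangeError("need at least one value");
  }
  const max = Math.max(...values);
  if (!(max > 0)) {
    throw new RangeError("max value must be positive");
  }
  const domain = [0, max * 1.1];
  // Linear interpolation from the domain onto [rangeStart, rangeEnd].
  const scale = (v) =>
    rangeStart + ((v - domain[0]) / (domain[1] - domain[0])) * (rangeEnd - rangeStart);
  return { domain, scale };
}

const { domain, scale } = paddedLinearScale([2, 50, 100], 0, 440);
// The largest value now lands strictly inside the axis instead of at or past its end.
console.log(domain, scale(100));
```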
Other changes in this commit:
- Changed up the JS to not enable/disable the cullEdges stuff (since that will only be available if the filter edges stuff is available -- therefore, no need to worry about enabling/disabling sub-controls within the edge filtering dialog).
- Encapsulated the new bin ct. UI element and the cullEdges UI element in Bootstrap cols and a row, making the edge filtering dialog's formatting look a lot nicer than before.
- Moved the edge weight histogram drawing stuff into its own function to facilitate ease of redrawing the histogram.

Next up for #232 is improving the general UI of this (e.g. using enter to redraw the histogram), maybe flipping the UI elements so the buttons are on the right of the text inputs (would match the search bar), making the chart look prettier (fixing occasional bar spacing issues, removing floating-point tick values, making sure that the max edge weight is still included in other bins (?), etc.), and adding graphical filtering functionality.
This leaves more space for the y-axis label when the frequencies of edge multiplicities get rather large.
Something worth considering -- the current method of scaling every edge to within the range [...] Not sure how to counteract this; will have to read more about this.
This issue was moved to marbl/MetagenomeScope#61
Sort of similarly to the charting mechanism for node lengths described in #99, we would use d3.js here (or another JS library) to generate a plot of edge weight against frequency in a given connected component of the graph. (I guess we could just show this in a dialog we'd pop up, similarly to the current "assembly information" dialog.)
Similarly to the "contributions" page for each GitHub repository (example here -- in particular, the green-colored chart near the top of the page), the user should be able to select a region of the chart and only show those edges. Unselected edges would be hidden. (This feature would subsume the "hide edges by weight" feature that I mostly implemented back in March 2017.) I'd imagine the chart-selecting functionality would involve a few main parts, notably:
min and max or something -- not too difficult)

I remember there were a lot of problems re: making edge weights work ok with collapsing/uncollapsing, and those problems ultimately necessitated temporarily disabling the edge hiding feature while I worked on more important stuff. It might be easiest, then, to not actually delete unselected edges but to just make them styled as invisible. That should work ok with collapsing, I think?
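A minimal sketch of the "style as invisible instead of deleting" idea, in plain JavaScript. The edge objects, their `weight` and `hidden` fields, and the function `applyWeightFilter` are all hypothetical; in the real viewer the flag would have to be translated into whatever styling mechanism the graph library offers.

```javascript
// Mark edges outside the selected [min, max] weight range as hidden,
// without removing them from the graph. All names here are hypothetical.
function applyWeightFilter(edges, min, max) {
  let hiddenCount = 0;
  for (const edge of edges) {
    // Edges stay in the collection; only their visibility flag changes.
    edge.hidden = edge.weight < min || edge.weight > max;
    if (edge.hidden) hiddenCount++;
  }
  return hiddenCount;
}

// Made-up edges; the chart selection is assumed to cover weights 5 through 20.
const edges = [
  { id: "a", weight: 2 },
  { id: "b", weight: 5 },
  { id: "c", weight: 17 },
  { id: "d", weight: 42 },
];
const hiddenCount = applyWeightFilter(edges, 5, 20);
console.log(hiddenCount, edges.filter((e) => !e.hidden).map((e) => e.id));
```

Because no edge is ever deleted, re-running the filter with a wider range brings hidden edges back without any bookkeeping, which is the property that should help with collapsing/uncollapsing.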