Empirical protein models like WAG+F with estimated frequencies are not supported #132

bredelings · 2018-07-31T19:12:25Z

Now that #130 has been fixed, fnWAG() uses the fixed frequencies estimated in the WAG paper, and produces a fixed Q matrix with no parameters.

However, what people usually do is estimate the frequencies while fixing the symmetric exchangeabilities. This would be easy to code; the only question is what syntax we want and what to name things.

Option 1: fnWAG() keeps its current behavior, but fnWAG(pi) uses the frequencies in pi. We could then place a Dirichlet distribution (or some other prior) on pi.
Option 2: technically, WAG+F is a GTR model with exchangeabilities supplied by WAG and frequencies pi being estimated. If the GTR model could be changed to take a symmetric matrix, then fnWAG() could simply return the symmetric exchangeability matrix, and WAG+F would be something like fnGTR(fnWAG(),pi).
Option 3 (which seems to work so far): define fnWAG(pi) to always take a frequency vector, and add a fnWAG_freq() that returns the fixed frequencies from the WAG paper. Users would then write fnWAG(pi) to estimate frequencies pi, and fnWAG(fnWAG_freq()) to use the fixed frequencies.
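For reference, the second approach (WAG+F as a GTR model) corresponds to the standard GTR construction below. This assumes the usual convention of scaling Q to one expected substitution per unit time; RevBayes's internal normalization is not confirmed here. The 190 WAG exchangeabilities $s_{ij} = s_{ji}$ stay fixed, and only the 20 frequencies $\pi$ differ between WAG and WAG+F:

```latex
Q_{ij} = \mu \, s_{ij} \, \pi_j \quad (i \neq j), \qquad
Q_{ii} = -\sum_{j \neq i} Q_{ij}, \qquad
\mu = \Big( \sum_{i} \sum_{j \neq i} \pi_i \, s_{ij} \, \pi_j \Big)^{-1}
```

Because $s$ is symmetric, $\pi_i Q_{ij} = \pi_j Q_{ji}$, so $\pi$ is the stationary distribution of $Q$, and this choice of $\mu$ gives $-\sum_i \pi_i Q_{ii} = 1$. Plain WAG plugs the published frequencies into $\pi$, while WAG+F treats $\pi$ as a parameter (e.g. with a Dirichlet prior).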

Since estimating frequencies is more common than using the fixed frequencies, I would recommend something like approach 3. If RevBayes functions support default values for parameters, we could make fnWAG_freq() the default value of pi in fnWAG(pi), which would be pretty nice.

Thoughts?

P.S. Here is a case where someone wants to estimate the amino-acid frequencies, although not with the WAG model - https://groups.google.com/forum/#!topic/revbayes-users/cmhwuYklecg

bredelings · 2018-07-31T19:12:55Z

@mlandis @hoehna @jembrown

jembrown · 2018-08-02T19:04:44Z

The ability to estimate frequencies seems like a good idea, but I don’t have a strong opinion about the syntax. I’m guessing others might, though! Cheers, Jeremy

mlandis · 2018-08-02T19:11:13Z

(I emailed this to Ben, but I guess GitHub didn't add it to the thread)

Personally, I like Option 1 the best, since it wouldn't require new users to learn "special" functions to design their models. If we want to allow all empirical rate matrices to accept pi/er parameters, that might require some deeper redesign/reorganization of the empirical rate matrix family. So I'd vote to hold off on that for now.

A variant on Option 2 would be to add a helper function that supplies various empirical rate matrix values, e.g.

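```
# makeEmpiricalMatrixValues() is a proposed helper; it does not exist in RevBayes yet
bf_WAG <- makeEmpiricalMatrixValues(model="WAG", parameter="frequencies")
er_WAG <- makeEmpiricalMatrixValues(model="WAG", parameter="rates")

# Classic WAG: fixed exchangeabilities and fixed frequencies
Q_const <- fnGTR( exchangeRates=er_WAG, baseFrequencies=bf_WAG )

# WAG+F with a flat (uniform) Dirichlet prior on the frequencies
bf_flat ~ dnDirichlet( rep(1,20) )
Q_flat := fnGTR( exchangeRates=er_WAG, baseFrequencies=bf_flat )

# WAG+F with a Dirichlet prior whose mean equals the WAG frequencies
bf_emp ~ dnDirichlet( simplex(bf_WAG) )
Q_emp := fnGTR( exchangeRates=er_WAG, baseFrequencies=bf_emp )
```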

What do you think?

bredelings · 2018-08-04T20:12:22Z

Hi Michael, I didn't see your e-mail, just the github post.

Anyway, yes, it does seem like Option 1 is nicest and easiest to guess or learn. Does RevBayes allow different functions to have the same name but different numbers of arguments? Alternatively, does RevBayes allow functions to have default values for parameters (Option 3)? If either is true, then I think I see how to implement this.

Your variant on Option 2 is interesting. I like the option of using bf_WAG but putting a prior on it.
