- Fixes to `follow_link()` and `back()` to correctly manage session history.
- If you're using xml2 1.0.0, `html_node()` will now return a "missing node".
- Parse rowspans and colspans by repeating the spanned cell's value from left to right (for colspan) and top to bottom (for rowspan) (#111).
- Updated a few examples and demos where the website structure has changed.
- Made compatible with both xml2 0.1.2 and 1.0.0.
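The rowspan and colspan filling described above can be sketched as follows. The table markup is invented for illustration, and the sketch assumes a recent rvest release; older releases may need `fill = TRUE` for a table that mixes both kinds of span.

```r
library(rvest)

# Hypothetical table: "fruit" spans two rows, "total" spans two columns.
page <- read_html(paste0(
  "<table>",
  "<tr><th>group</th><th>item</th><th>price</th></tr>",
  "<tr><td rowspan='2'>fruit</td><td>apple</td><td>1</td></tr>",
  "<tr><td>pear</td><td>2</td></tr>",
  "<tr><td colspan='2'>total</td><td>3</td></tr>",
  "</table>"
))

# Spanned cells are repeated so every row ends up with three columns.
tbl <- html_table(html_node(page, "table"))
tbl
```

The "fruit" value should appear in both body rows it spans, and "total" should fill both the group and item columns of the last row.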
- Fix invalid link for SSA example.
- Parse `<options>` that don't have a value attribute (#85).
- Remove all remaining uses of `html()` in favor of `read_html()` (@jimhester, #113).
- rvest has been rewritten to take advantage of the new xml2 package. xml2 provides a fresh binding to libxml2, avoiding many of the work-arounds previously needed for the XML package. rvest now depends on xml2, so all the xml functions are available, and rvest adds a thin wrapper for html.
- A number of functions have changed names. The old versions still work, but are deprecated and will be removed in rvest 0.4.0.
  - `html_tag()` -> `html_name()`
  - `html()` -> `read_html()`
- `html_node()` now throws an error if there are no matches, and a warning if there's more than one match. I think this should make it more likely to fail clearly when the structure of the page changes.
- `xml_structure()` has been moved to xml2. New `html_structure()` (also in xml2) highlights id and class attributes (#78).
- `submit_form()` now works with forms that use GET (#66).
- `submit_request()` (and hence `submit_form()`) is now case-insensitive, and so will find `<input type=SUBMIT>` as well as `<input type="submit">`.
- `submit_request()` (and hence `submit_form()`) recognizes forms with `<input type="image">` as a valid form submission button per http://www.w3.org/TR/html-markup/input.image.html
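For the `html_structure()` entry above, a small sketch of how it might be called is shown here. The markup and its id and class values are hypothetical, and the exact printed layout depends on the installed xml2 version.

```r
library(xml2)

# Hypothetical fragment with id and class attributes to highlight.
page <- read_html("<div id='main' class='content'><p class='lead'>Hi</p></div>")

# Prints an indented outline of the tags, annotated with ids and classes.
html_structure(page)
```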
- `html()` and `xml()` pass `...` on to `httr::GET()` so you can more finely control the request (#48).
- Add xml support: parse with `xml()`, then work with it using `xml_node()`, `xml_attr()`, `xml_attrs()`, `xml_text()` and `xml_tag()` (#24).
- `xml_structure()`: new function that displays the structure (i.e. tag and attribute names) of an xml/html object (#10).
- `follow_link()` now accepts css and xpath selectors (#38, #41, #42).
- `html()` does a better job of dealing with encodings (passing the problem on to `XML::parseHTML()`) instead of trying to do it itself (#25, #50).
- `html_attr()` returns default value when input is NULL (#49).
- Add missing `html_node()` method for session.
- `html_nodes()` now returns an empty list if no elements are found (#31).
- `submit_form()` converts relative paths to absolute URLs (#52). It also deals better with 0-length inputs (#29).
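The `html_nodes()` change above means a selector with no matches can be handled with a plain length check. This is a sketch on made-up markup; in current releases the empty result is an empty node set rather than a literal list, but its length is zero either way.

```r
library(rvest)

# Hypothetical page that contains no <table> elements.
page <- read_html("<html><body><p>No tables here</p></body></html>")

# A selector with no matches yields an empty result instead of an error.
tables <- html_nodes(page, "table")

if (length(tables) == 0) {
  message("No tables found; skipping table extraction.")
}
```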