Commit

Remove support for direct use of null-terminated strings with the parser APIs.

Fixes #175.
Fixes #190.
tzlaine committed Oct 3, 2024
1 parent 274e4e3 commit 23017af
Showing 19 changed files with 458 additions and 519 deletions.
12 changes: 5 additions & 7 deletions doc/tables.qbk
@@ -34,19 +34,17 @@ itself be used as a parser; it must be called. In the table below:

* `a` is a semantic action;

* `r` is an object whose type models `parsable_range_like`; and
* `r` is an object whose type models `parsable_range`; and

* `p`, `p1`, `p2`, ... are parsers.

* `escapes` is a _symbols_t_ object, where `T` is `char` or `char32_t`.

[note The definition of `parsable_range_like` is:
[note The definition of `parsable_range` is:

[parsable_range_like_concept]
[parsable_range_concept]

It is intended to be a range-like thing; a null-terminated sequence of
characters is considered range-like, given that a pointer `T *` to a
null-terminated string is isomorphic with `subrange<T *, _null_sent_>`.]
]

[note Some of the parsers in this table consume no input. All parsers consume
the input they match unless otherwise stated in the table below.]
@@ -371,7 +369,7 @@ Here are all the operator overloaded for parsers. In the tables below:

* `a` is a semantic action;

* `r` is an object whose type models `parsable_range_like` (see _concepts_);
* `r` is an object whose type models `parsable_range` (see _concepts_);
and

* `p`, `p1`, `p2`, ... are parsers.
@@ -1807,8 +1807,7 @@ common:
* They each return a value contextually convertible to `bool`.

* They each take at least a range to parse and a parser. The "range to parse"
may be an iterator/sentinel pair or an single range-like object. Note that
"range-like" includes null-terminated string pointers.
may be an iterator/sentinel pair or an single range object.

* They each require forward iterability of the range to parse.

@@ -1823,7 +1822,7 @@ common:
the last location within the input that `p` matched. The *whole* input was
matched if and only if `first == last` after the call to _p_.

* When you call any of the range-like overloads of _p_, for example `_p_np_(r,
* When you call any of the range overloads of _p_, for example `_p_np_(r,
p, _ws_)`, _p_ only indicates success if *all* of `r` was matched by `p`.

[note `wchar_t` is an accepted value type for the input. Please note that
@@ -1834,7 +1833,7 @@ this is interpreted as UTF-16 on MSVC, and UTF-32 everywhere else.]
There are eight overloads of _p_ and _pp_ combined, because there are three
either/or options in how you call them.

[heading Iterator/sentinel versus range-like]
[heading Iterator/sentinel versus range]

You can call _pp_ with an iterator and sentinel that delimit a range of
character values. For example:
@@ -1868,32 +1867,11 @@ allows calls like `_p_np_("str", p)` to work naturally.
auto result_2 = bp::parse(U"str", p, bp::ws);

char const * str_3 = "str";
auto result_3 = bp::parse(str_3 | boost::parser::as_utf16, p, bp::ws);

You can also call _p_ with a pointer to a null-terminated string of character
values. _p_ considers pointers to null-terminated strings to be ranges,
since, for any pointer `T *` to a null-terminated string, `T *` is isomorphic
with `subrange<T *, _null_sent_>`.

namespace bp = boost::parser;
auto const p = /* some parser ... */;

char const * str_1 = /* ... */ ;
auto result_1 = bp::parse(str_1, p, bp::ws);
char8_t const * str_2 = /* ... */ ;
auto result_2 = bp::parse(str_2, p, bp::ws);
char16_t const * str_3 = /* ... */ ;
auto result_3 = bp::parse(str_3, p, bp::ws);
char32_t const * str_4 = /* ... */ ;
auto result_4 = bp::parse(str_4, p, bp::ws);

int const array[] = { 's', 't', 'r', 0 };
int const * array_ptr = array;
auto result_5 = bp::parse(array_ptr, p, bp::ws);
auto result_3 = bp::parse(bp::null_term(str_3) | bp::as_utf16, p, bp::ws);

Since there is no way to indicate that `p` matches the input, but only a
prefix of the input was matched, the range-like (non-iterator/sentinel)
overloads of _p_ indicate failure if the entire input is not matched.
prefix of the input was matched, the range (non-iterator/sentinel) overloads
of _p_ indicate failure if the entire input is not matched.

[heading With or without an attribute out-parameter]

@@ -3049,10 +3027,9 @@ code paths, as they are written generically. The only difference is that the
Unicode code path parses the input as a range of code points, and the
non-Unicode path does not. In effect, this means that, in the Unicode code
path, when you call `_p_np_(r, p)` for some input range `r` and some parser
`p`, the parse happens as if you called `_p_np_(r | boost::parser::as_utf32, p)`
instead. (Of course, it does not matter if `r` is a null-terminated pointer,
a proper range, or an iterator/sentinel pair; those all work fine with
`boost::parser::as_utf32`.)
`p`, the parse happens as if you called `_p_np_(r | boost::parser::as_utf32,
p)` instead. (Of course, it does not matter if `r` is a proper range, or an
iterator/sentinel pair; those both work fine with `boost::parser::as_utf32`.)

Matching "characters" within _Parser_'s parsers is assumed to be a code point
match. In the Unicode path there is a code point from the input that is
@@ -3187,8 +3164,7 @@ the parser.

The other adaptors `as_utf8` and `as_utf16` are also provided for
completeness, if you want to use them. They each can transcode any sequence
of character types. A null-terminated string is considered a sequence of
character type.
of character types.

[important The `as_utfN` adaptors are optional, so they don't come with
`parser.hpp`. To get access to them, `#include
@@ -58,6 +58,30 @@ namespace boost::parser::detail { namespace stl_interfaces {
constexpr bool is_invocable_v =
is_detected_v<invocable_expr, F, Args...>;

// This ensures that captures don't decay from arrays to pointers.
// The decay is fine to do for NTBSs, but not arrays like {'a', 'b'}.
// This is done here since it's too late to see that we were passed an
// array where we need it, much later. Consider a call to replace()
// for instance -- we'd want to know in the replace_impl function that
// we were passed an array, but by then it's too late. We are
// *thoroughly* unlikely to be passed anything but an array of
// characters, so I'm not checking here that the array is not ints or
// whatever before chopping off the null terminator.
template<size_t N, typename CharT>
auto array_to_range(CharT (&arr)[N])
{
auto const first = std::begin(arr);
auto last = std::end(arr);
if (N && !arr[N - 1])
--last;
return BOOST_PARSER_SUBRANGE(first, last);
}
template<typename T>
decltype(auto) array_to_range(T && x)
{
return (T &&)x;
}

template<typename Func, typename... CapturedArgs>
struct bind_back_t
{
@@ -69,7 +93,8 @@ namespace boost::parser::detail { namespace stl_interfaces {

template<typename F, typename... Args>
explicit constexpr bind_back_t(int, F && f, Args &&... args) :
f_((F &&) f), bound_args_((Args &&) args...)
f_((F &&) f),
bound_args_((Args &&) args...)
{
static_assert(sizeof...(Args) == sizeof...(CapturedArgs), "");
}
@@ -125,8 +150,9 @@ namespace boost::parser::detail { namespace stl_interfaces {
template<typename Func, typename... Args>
constexpr auto bind_back(Func && f, Args &&... args)
{
return detail::bind_back_result<Func, Args...>(
0, (Func &&) f, (Args &&) args...);
return detail::bind_back_result<
Func, detail::remove_cvref_t<decltype(detail::array_to_range(std::declval<Args>()))>...>(
0, (Func &&) f, detail::array_to_range((Args &&) args)...);
}

#if BOOST_PARSER_DEFINE_CUSTOM_RANGE_ADAPTOR_CLOSURE || \
@@ -51,10 +51,6 @@ namespace boost::parser::detail { namespace text { BOOST_PARSER_DETAIL_TEXT_NAME
utf8_code_unit<T> || utf16_code_unit<T> || utf32_code_unit<T>;


template<typename T, format F>
concept code_unit_pointer =
std::is_pointer_v<T> && code_unit<std::iter_value_t<T>, F>;

template<typename T, format F>
concept code_unit_range = std::ranges::input_range<T> &&
code_unit<std::ranges::range_value_t<T>, F>;
@@ -66,17 +62,13 @@ namespace boost::parser::detail { namespace text { BOOST_PARSER_DETAIL_TEXT_NAME
template<typename T>
concept utf8_iter = code_unit_iter<T, format::utf8>;
template<typename T>
concept utf8_pointer = code_unit_pointer<T, format::utf8>;
template<typename T>
concept utf8_range = code_unit_range<T, format::utf8>;
template<typename T>
concept contiguous_utf8_range = contiguous_code_unit_range<T, format::utf8>;

template<typename T>
concept utf16_iter = code_unit_iter<T, format::utf16>;
template<typename T>
concept utf16_pointer = code_unit_pointer<T, format::utf16>;
template<typename T>
concept utf16_range = code_unit_range<T, format::utf16>;
template<typename T>
concept contiguous_utf16_range =
@@ -85,8 +77,6 @@ namespace boost::parser::detail { namespace text { BOOST_PARSER_DETAIL_TEXT_NAME
template<typename T>
concept utf32_iter = code_unit_iter<T, format::utf32>;
template<typename T>
concept utf32_pointer = code_unit_pointer<T, format::utf32>;
template<typename T>
concept utf32_range = code_unit_range<T, format::utf32>;
template<typename T>
concept contiguous_utf32_range =
@@ -102,9 +92,6 @@ namespace boost::parser::detail { namespace text { BOOST_PARSER_DETAIL_TEXT_NAME
template<typename T>
concept utf_iter = utf8_iter<T> || utf16_iter<T> || utf32_iter<T>;
template<typename T>
concept utf_pointer =
utf8_pointer<T> || utf16_pointer<T> || utf32_pointer<T>;
template<typename T>
concept utf_range = utf8_range<T> || utf16_range<T> || utf32_range<T>;


@@ -182,23 +169,6 @@ namespace boost::parser::detail { namespace text { BOOST_PARSER_DETAIL_TEXT_NAME
{ t(msg) } -> std::same_as<char32_t>;
// clang-format on
};

template<typename T>
// clang-format off
concept utf_range_like =
utf_range<std::remove_reference_t<T>> ||
utf_pointer<std::remove_reference_t<T>>;
// clang-format on

template<typename T>
concept utf8_range_like = utf8_code_unit<std::iter_value_t<T>> ||
utf8_pointer<std::remove_reference_t<T>>;
template<typename T>
concept utf16_range_like = utf16_code_unit<std::iter_value_t<T>> ||
utf16_pointer<std::remove_reference_t<T>>;
template<typename T>
concept utf32_range_like = utf32_code_unit<std::iter_value_t<T>> ||
utf32_pointer<std::remove_reference_t<T>>;
//]

// Clang 13 defines __cpp_lib_concepts but not std::indirectly copyable.
@@ -19,34 +19,6 @@

namespace boost::parser::detail { namespace text {

template<typename Range>
struct utf_range_like_iterator
{
using type = decltype(std::declval<Range>().begin());
};

template<typename T>
struct utf_range_like_iterator<T *>
{
using type = T *;
};

template<std::size_t N, typename T>
struct utf_range_like_iterator<T[N]>
{
using type = T *;
};

template<std::size_t N, typename T>
struct utf_range_like_iterator<T (&)[N]>
{
using type = T *;
};

template<typename Range>
using utf_range_like_iterator_t =
typename utf_range_like_iterator<Range>::type;

/** An alias for `in_out_result` returned by algorithms that perform a
transcoding copy. */
template<typename Iter, typename OutIter>
@@ -652,7 +624,7 @@ namespace boost::parser::detail { namespace text { BOOST_PARSER_DETAIL_TEXT_NAME
}

template<typename Range, typename OutIter>
transcode_result<utf_range_like_iterator_t<Range>, OutIter>
transcode_result<detail::iterator_t<Range>, OutIter>
transcode_to_utf8(Range && r, OutIter out)
{
return dtl::transcode_to_8_dispatch<false, Range, OutIter>::call(
@@ -670,7 +642,7 @@ namespace boost::parser::detail { namespace text { BOOST_PARSER_DETAIL_TEXT_NAME
}

template<typename Range, typename OutIter>
transcode_result<utf_range_like_iterator_t<Range>, OutIter>
transcode_result<detail::iterator_t<Range>, OutIter>
transcode_to_utf16(Range && r, OutIter out)
{
return dtl::transcode_to_16_dispatch<false, Range, OutIter>::call(
@@ -688,7 +660,7 @@ namespace boost::parser::detail { namespace text { BOOST_PARSER_DETAIL_TEXT_NAME
}

template<typename Range, typename OutIter>
transcode_result<utf_range_like_iterator_t<Range>, OutIter>
transcode_result<detail::iterator_t<Range>, OutIter>
transcode_to_utf32(Range && r, OutIter out)
{
return dtl::transcode_to_32_dispatch<false, Range, OutIter>::call(
@@ -719,16 +691,12 @@ namespace boost::parser::detail { namespace text { BOOST_PARSER_DETAIL_TEXT_NAME
}

template<typename R, std::output_iterator<uint32_t> O>
requires(utf16_range_like<R> || utf32_range_like<R>)
requires(utf16_range<R> || utf32_range<R>)
transcode_result<dtl::uc_result_iterator<R>, O> transcode_to_utf8(
R && r, O out)
{
if constexpr (std::is_pointer_v<std::remove_reference_t<R>>) {
return text::transcode_to_utf8(r, null_sentinel, out);
} else {
return text::transcode_to_utf8(
std::ranges::begin(r), std::ranges::end(r), out);
}
return text::transcode_to_utf8(
std::ranges::begin(r), std::ranges::end(r), out);
}


@@ -750,16 +718,12 @@ namespace boost::parser::detail { namespace text { BOOST_PARSER_DETAIL_TEXT_NAME
}

template<typename R, std::output_iterator<uint32_t> O>
requires(utf8_range_like<R> || utf32_range_like<R>)
requires(utf8_range<R> || utf32_range<R>)
transcode_result<dtl::uc_result_iterator<R>, O> transcode_to_utf16(
R && r, O out)
{
if constexpr (std::is_pointer_v<std::remove_reference_t<R>>) {
return text::transcode_to_utf16(r, null_sentinel, out);
} else {
return text::transcode_to_utf16(
std::ranges::begin(r), std::ranges::end(r), out);
}
return text::transcode_to_utf16(
std::ranges::begin(r), std::ranges::end(r), out);
}


@@ -781,16 +745,12 @@ namespace boost::parser::detail { namespace text { BOOST_PARSER_DETAIL_TEXT_NAME
}

template<typename R, std::output_iterator<uint32_t> O>
requires(utf8_range_like<R> || utf16_range_like<R>)
requires(utf8_range<R> || utf16_range<R>)
transcode_result<dtl::uc_result_iterator<R>, O> transcode_to_utf32(
R && r, O out)
{
if constexpr (std::is_pointer_v<std::remove_reference_t<R>>) {
return text::transcode_to_utf32(r, null_sentinel, out);
} else {
return text::transcode_to_utf32(
std::ranges::begin(r), std::ranges::end(r), out);
}
return text::transcode_to_utf32(
std::ranges::begin(r), std::ranges::end(r), out);
}

}}}