Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

readepi() signature and package scope #45

Closed
Bisaloo opened this issue Jul 31, 2023 · 4 comments
Closed

readepi() signature and package scope #45

Bisaloo opened this issue Jul 31, 2023 · 4 comments
Assignees
Labels
help wanted Extra attention is needed

Comments

@Bisaloo
Copy link
Member

Bisaloo commented Jul 31, 2023

The idea of having a common wrapper to import many different types of data with a single function call has been pioneered by the rio package and has garnered positive feedback from the community. However, this is a difficult approach to implement and in many cases, it may lead to more confusion. I suggest two key principles to make readepi as clear and as usable as possible:

  • provide two levels of complexity. It would be useful if users can choose if they prefer to use the generic wrapper or a specific function to import their data. This approach is for example implemented in the lightr package, which imports spectrometric data from a variety of formats: https://docs.ropensci.org/lightr/reference/index.html, and links to design discussions we had with @TimTaylor and @jpavlich.

  • uniformise low-level parser signatures (at the moment, the same arguments in different functions have different names and different orders) and give the general readepi() wrapper a similar "feel" no matter the data source you choose. To ensure uniformity in signatures, a good approach can be to rely as much as possible on the inheritsParams roxygen2 tag
    . This provides an incentive and a reminder to use consistent argument names. To give a similar "feel" to readepi() for all data sources, a good approach is to minimize source-specific arguments. If these arguments are absolutely needed, they should probably not be part of the function signature but be passed as .... This however requires clear documentation of each of the low-level parsers, which as per CRAN policy, will probably force you to export the low-level parsers (which provide an extra argument for the previous point).

Looking at the current README, it seems it will be very difficult to reconcile the "reading from file" and "reading for server" approach in the single function while keeping a similar feel. This might suggest that you need to either have two high-level wrappers or reduce the scope of this package. I believe the "one function to read from many different formats" scope is already well covered by rio. IMO, adding a thin wrapper around rio creates more confusion and maintenance complexity for a minimal gain over what rio already proposes. I think there is value in thinking of readepi as a companion to rio. rio reads from many different formats & readepi reads from many different health information systems. This clear distinction of scope strengthen the case for a readepi package while minimizing overlap with solid, well-established alternatives.

@Karim-Mane Karim-Mane self-assigned this Aug 16, 2023
@Karim-Mane Karim-Mane added the help wanted Extra attention is needed label Aug 16, 2023
@Karim-Mane
Copy link
Member

Karim-Mane commented Aug 18, 2023

This is a good point.

We aim at providing to users a tool like devtools where if you don't dig into it, you would not realise that it is internally calling packages like roxygen2, usethis, etc. We understand that an experienced user would prefer to call directly the required functions from their source packages such as {rio}, {redcapR}, {fingertipsR}, etc.

However, we are also anticipating that most users, specially in the developing countries, do not have that much skills/experience to use several packages at the same time. We believe that such users will find it useful to have a "1 stop shop" with a function that will allow them to import data from several sources without knowing the mechanism under the hood. Also note that {readepi} can read several file types that are not covered by {rio}.

To tackle the issue of design in {readepi}, we will replace the credentials_file and file_path arguments by from. The value of this from argument can be a file path (when reading from a file) or a URL (when reading from APIs).
This way, the credentials_file argument will be provided within the ....

What do you think about this?

@Bisaloo
Copy link
Member Author

Bisaloo commented Aug 31, 2023

I agree with the idea of having a "1 stop shop" but with limits. IMO, readepi is perfect being a 1 stop shop for importing data from remote health information systems or databases, wrapping redcapR, fingertipsR, etc.:

  • it doesn't overlap with rio (by wrapping its functionality), which is already a 1 stop shop for reading local data. It is very important for the success of our project that we follow open source etiquette.
  • it simplifies the function signature, providing a much more uniform and homogeneous interface to read from different sources
  • it clarifies the package niche. Creating a clear distinction between local and remote data is a service to the user IMO as it increases their comprehension of the data they're working with
  • it reduces maintenance load

I understand your point about users having limited R knowledge. However, I'm not convinced cramming everything in a single package makes things easier since the package becomes harder to use. The proper way to deal with this is through teaching.

The comparison with devtools is interesting but not completely accurate:

  • rio is already a wrapper with a unified interface. By including its functionality here, we're wrapping a wrapper 😵
  • all packages wrapped in devtools are maintained by the same team, which facilitates maintenance and dealing with breaking changes. Crucially, it removes any concerns about "stealing citations" by wrapping an external package, which can be a very sensitive issue.

Also note that {readepi} can read several file types that are not covered by {rio}.

The best course of action would be to contribute this to rio. It's always a good thing to contribute to the open-source ecosystem beyond the specific packages we're developing from scratch.

@TimTaylor
Copy link

The best course of action would be to contribute this to rio. It's always a good thing to contribute to the open-source ecosystem beyond the specific packages we're developing from scratch.

For reference, rio has now returned to it's original maintainer so worth tracking their plans. See their recent pinned issue and accompanying blog post.

@Karim-Mane
Copy link
Member

Good point - I have removed the read_from_files-related functions and test files.

Will also check to see whether how best to contribute to {rio}.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

3 participants