Skip to content

Latest commit

 

History

History
86 lines (64 loc) · 10.4 KB

70.sustainability.md

File metadata and controls

86 lines (64 loc) · 10.4 KB

Sustainability {#sec:sustainability}

When discussing software projects, "Sustainability" is a complex, expansive topic that typically means different things to different people. A number of different definitions have been proposed -- ranging from pragmatic descriptions of responsivity to bug reports or changing hardware, to more idealistic descriptions of active development and ever-increasing functionality. For the purposes of describing yt's approach to sustainability, we will use these as path markers but not restrictions.

yt is supported through both peer-production and grant-funded development. At times, this can pose challenges; the requirements imposed by grant-funded development naturally concentrates decision making, but within the yt community we have (so far) navigated this through deep engagement in community processes and interaction. However, at the risk of belaboring a point that has been well-explored elsewhere, grant-funded development has traditionally focused on "new" or "innovative" development, rather than maintenance of a software project.

A tension exists, however, between support of an existing project and the support of new projects in an ecosystem. By supporting an existing project, resources can tend to become concentrated; conversely, if a project supports a broader research agenda, that resource concentration can result in greater effort-multipliers for individuals who utilize the project. We're aware of this tension in yt; in fact, while yt has been grant-supported, most of the grant development has gone to a very small number of groups. This grant funding has been provided through the National Science Foundation, the Gordon and Betty Moore Foundation, the Department of Energy, the Chan Zuckerberg Initiative and other sources. [@doi:10.6084/m9.figshare.2061465.v1, [@doi:10.6084/m9.figshare.909413.v1], [@doi:10.5281/zenodo.4158589]. Grants have supported the development of new features, including specific functionality for analysis routines and support for non-astronomical domains.

Into each of these grants has been explicit support for community building, constituted by the development of documentation, videos, and tutorials, as well as mentoring of new contributors and shepherding the growth of the project through code review and issue management. While this does provide support for individuals who can provide dedicated, thoughtful attention to code review and bug reports, it is, quite frankly, insufficient without the engagement of non-funded community members who contribute their time and energy.

There are a number of avenues of development for yt, each of which draws different degrees of interest, urgency, and breadth of engagement. A few of these are worth highlighting.

Platform Functionality

Improving and developing the functionality of yt as a whole is a particular focus for investment of time and resources. Typically, this is conducted via one of two avenues; the first is through explicit, funded development done for specific use cases. The second mode of functionality development is when community members (either those who are already participants in the community or those who seek to) identify a feature that meets their particular use case, and work to develop it.

Community contributions usually fall into one of two categories -- new functionality (such as supporting new datasets, applying new operations to datasets) or by scaling or improving the performance of yt for a use case or in general. The former is more common than the latter, although both have occurred. Expanding the featureset of yt tends to be more attractive than optimization for a large swath of the userbase; in some sense, optimization opens new avenues for scientific exploration (by taking formerly out-of-reach options and placing them within the realm of reasonable execution time), but adding new features certainly does.

A few examples of new functionality provided by community members include the volume rendering of octree datasets, support for new code frontends (such as AREPO), and many of the analysis modules. New functionality provided by funded development has included long-term improvements such as the "demeshening" of particle datasets and support for non-astrophysical domains. A few notable improvements to the efficiency and scalability of yt include multi-level parallelism operations and the initial implementation of dask as an array backend.

In general, we have found it difficult to move through large upheavals of code without large-scale effort on the part of the community. This has resulted in a reluctance to investigate particularly invasive changes, and can provide a distortion of the cost/benefit analysis, leaning toward risk aversion. The YTEP process mitigates this somewhat, but also provides additional opportunities for a 'veto' that can dampen enthusiasm and impede development. At the same time, the safe guards provided by this can help to ensure stability for the community, particularly those not actively engaged in development.

Project Maintenance

In addition to development of new features, the correction of problems or updates to old behavior are also critical to the sustained usage and applicability of a project. In this category of "bug fixes" we certainly include faults or problems with yt itself, but it's also important to note that in many cases, we can also use this category to describe improvements in behavior that bring the results greater accuracy or precision.

It is tough to estimate the cost of bug fixes, in terms of labor or in terms of community member attrition, but @doi:10.1007/s11219-021-09564-z suggests that large, established projects on github may spend up to 40% of their time on bugfixes rather than new features. While we have not been able to verify this, examining the cadence of bug fix releases versus new feature releases suggests that this is indeed the case for yt as well.

Maintenance and bug fixes are critical to developing yt as a community-focused code. The damage that can be done by ignoring a bug report, dismissing a suggestion, or outright rejecting proposed fixes is real, and can have measurable impact on community cohesion and growth. As such, in the yt development process we strive to be as accommodating and welcoming as possible. This requires balancing being welcoming and patient against the very real costs that come with responding to bug fixes and feature requests; often this means providing helpful insight that may already be covered by the documentation, or that may reflect known bugs. This is a challenge, and one that benefits from having dedicated maintainers, or at least dedicated maintenance time from the developer community. Furthermore, we have found that regular, collaborative co-working sessions can ease this burden, although they occasionally veer into too light of code review or constant reiteration of "standing" issues.

Ecosystem Maintenance

yt exists inside a larger ecosystem of packages and infrastructure, composed of the Python language, the Scientific Python community, as well as the computational science communities (predominantly astrophysics). Each of these communities, much like yt itself, has its own set of priorities, needs, and development schedules. Changes in systems that yt depends on may require changes in yt; changes in what other systems expect from yt may likewise require changes in yt.

One of the more notable shifts in recent years has been in how Python libraries and applications are distributed. At the time of the first yt method paper, yt was distributed typically in source form with its entire dependency stack to provide a uniform experience on supercomputer installations. This was, at the time, "necessary" because Python was not as widely used in the computational science and astrophysics communities as it is now, and the versions of various libraries were often far out of date. In the intervening time, however, a number of shifts have occurred, including the advent of conda (and conda-forge), wheels, wide-scale adoption of Python in the scientific community, and considerable standardization and improvement of package metadata and installation methods. Ensuring yt maintains compatibility with the modern package ecosystem has taken, at times, a fairly-large degree of effort on the part of the developers making the changes and the developers who intend to continue developing.

In addition to the changes in yt necessitated by changes to packaging or updates in APIs or behavior, there are also those changes that are necessary to conform to community norms. For example, utilizing code formatting tools, automated "linting" and other automated methods of applying "quality control" to incoming changesets. Or, somewhat more invasively, inserting "type hints" that can provide automated analysis and inline suggestions within integrated development environments. These developments may not provide as much value to long-time developers, but can be important for newcomers who are less familiar with the library or coding conventions. The balance with these changes can be difficult to strike, to minimize disruption to existing developers and users while still expanding the accessibility of the library to others.

Enjoyment Maximization

Another aspect of sustainability is that for many people, developing software can be fun. This is certainly true for many members of the yt community -- sometimes in ways that align with the goals and requirements of academic research, and sometimes in ways that do not directly advance those goals. For instance, the initial development of the software volume renderer was not aligned with immediate research goals, but it has gone on to support synthetic observations and to foster much broader research and education opportunities.

While development for the joy of developing can be an exciting and helpful process, it is also something that can require a delicate balance of attention and engagement from the rest of the community. Accepting changes into the yt codebase requires careful review, and dedication of resources (physical and intellectual) that may already be allocated elsewhere.

Furthermore, not everyone derives pleasure from the same types of development. Some individuals love to build new things and scaffold out plugins or visualization types; others have great satisfaction from optimization or documentation. Accepting and supporting these different types of development is critical for a community built on respect, trust and gratitude.