Author Note: This report was prepared by the authors as part of a series of projects related to ecological modeling in the US Army Corps of Engineers. However, this document represents only the opinions of the authors and not the USACE or any other organization. The document is shared in the spirit of open science and not as a product of any agency activities.

Abstract

Transparency and accessibility are key parts of maintaining the integrity of reproducible science. Open Work is a philosophical approach for increasing the repeatability, quality, and shareability of a project, which is generally characterized by components of license, access, and format. Open science operationalizes Open Work by embracing best practices in the context of research and science-informed decision-making. Although some practices have been adopted, many aspects of Open Work and open science remain underutilized in ecosystem restoration. Within the realm of ecological modeling, open science practices can be used to streamline model development, benefiting model developers, model users, and other technical stakeholders. The goals of this paper are to: (1) describe the foundations of open science and its importance to USACE, (2) direct readers to resources for learning open science methods, and (3) demonstrate the value of open science methods in the context of USACE ecological modeling. Open science practices have the potential to improve the quality of USACE studies by increasing transparency, accessibility, and reproducibility, ultimately saving time and resources.

Introduction

Broadly speaking, ecological models “represent a system for the purposes of reproducing, simplifying, analyzing, or understanding it” (EC-1105-2-412). More practically, ecological models are conceptual and numerical tools used to predict ecological benefits and impacts of management actions. Models are developed through a five-step process of conceptualization, quantification, evaluation, application, and communication (Grant and Swannack 2008, Swannack et al. 2012, McKay et al. 2019). Transparent development allows for ecological models to be effectively integrated into management applications and embraced by stakeholders (van den Belt 2004, Swannack et al. 2012, Herman et al. 2019).

Transparency, replicability, and openness have emerged as fundamental issues across all aspects of ecological science, which can be redressed by sharing underlying data or removing access fees of publicly funded science (Powers and Hampton 2019). For instance, a White House task force was formed in May 2021 to “address short-term, high-priority actions to strengthen integrity and also lay [sic] the groundwork for longer-term coordination of Federal agency scientific efforts.” (Scientific Integrity Task Force | The White House). The concepts of openness and transparency are proposed in this report as an enabler of scientific integrity with a specific emphasis on “open science” policies and practices.

Open Science practices may prove to be beneficial in the context of USACE project planning, decision-making and ecological modeling. Although the lexicon of open science may be new, the concepts build from long-term federal initiatives focused on upholding scientific integrity and increasing transparency in agency practices and decisions (Holdren 2010). The objectives of this technical note are to: (1) describe the foundations of open science and its importance to ecological modeling, (2) direct readers to resources for learning open science methods, and (3) demonstrate the value of open science methods in the context of USACE ecosystem restoration planning.

What is Open Science?

Open Science is characterized by transparent data collection processes, products, code, and metadata (Powers and Hampton 2019). The Center for Open Science has laid out goals of open science for the parties involved throughout the collaborative research pipeline: scholarly service providers, institutions, funders, researchers, reviewers, librarians, and consumers. Open Work is defined by Open Knowledge and the Open Definition Advisory Council in terms of three key principles: open license, open access, and open format. Table 1 illustrates how each principle might apply in a set of hypothetical scenarios.

Table 1. Hypothetical examples of open science practices to achieve the goals of Open Work.

| Hypothetical Scenario | Open License | Open Access | Open Format |
|---|---|---|---|
| A USACE planning study is applying collaborative model development methods with a set of technical stakeholders to inform agency decision-making on a large, regional ecosystem restoration project. | High Degree of Open Licensing. The model can be used by anyone. Avoid forfeiting all copyright claims; attribution, virality, and liability were chosen with the model use and authors’ discretion in mind. | High Degree of Open Access. All model resources and data are readily available. Users may download model code, webapps, and documentation at no cost through online repositories (e.g., the ERDC Knowledge Core). Model code development occurs through shared repositories (e.g., GitHub). | High Degree of Open Format. Documents are filed concisely with consistent and identifiable naming conventions. All documents and code follow widely accepted style guides, as applicable. All dependencies are free. |
| A USACE planning study is developing a rapidly applicable ecological model to inform a small-scale restoration project through the Continuing Authorities Program. | High Degree of Open Licensing. The model can be used by anyone. Avoid forfeiting all copyright claims; attribution, virality, and liability were chosen with the model use and authors’ discretion in mind. | Medium Degree of Open Access. Most model resources are readily available. Users may download model code at no cost through online repositories and GitHub. Model code development occurs through GitHub. | Medium Degree of Open Format. Documents are filed concisely with consistent and identifiable naming conventions. All documents and code follow community style guides. Some products require paid subscriptions, which are free through most institutions. |
| The USACE is collecting long-term ecological monitoring data at a series of sites supporting a major, regional ecosystem restoration initiative. | High Degree of Open Licensing. Licensing will be dependent on the repository or publisher of choice. | Medium Degree of Open Access. Most model resources are readily available. Users may download the dataset through online repositories. No GitHub due to publishing conflict. | Medium Degree of Open Format. Data and metadata are machine readable and curated. Data can be downloaded as multiple file types. |
| A military installation is developing a set of ecological models to inform endangered species management. | High Degree of Open Licensing. The study may contain sensitive information. Refer to the Technology Transfer Office. Follow guidelines for Controlled Unclassified Information (CUI). | Medium Degree of Open Access. Some model resources are openly available, but most have restricted access. Model resources are stored on secure servers and locally as CUI. | Medium Degree of Open Format. CUI documents are strictly formatted and labeled. The style and conventions of non-CUI documents are tailored to a handful of individuals who have access to the suite. |
| A USACE planning study is assessing the environmental impacts associated with navigation actions at a sensitive port facility. | High Degree of Open Licensing. The study may contain sensitive information. Refer to the Technology Transfer Office. Follow CUI guidelines. | Medium Degree of Open Access. No detailed model documents are available on the internet to avoid release of sensitive information. | Medium Degree of Open Format. Supporting documents follow strict guidelines; however, the model code is not well commented. |

Implementation of open science requires commitment and resources that in turn pay dividends in the long term (e.g., for long duration projects, for multiple projects in series, or for the broader body of knowledge). Tools and methods used to practice open science vary with project scale and specific project needs, and open science may be inappropriate in some instances (e.g., national defense, intellectual property). Accessibility in the framework of open science does not mean accessibility to everyone. Rather, it is focused on a target audience such as relevant ERDC scientists or USACE project team members, because presenting and creating technical tools that are fully accessible and understandable to all audiences is often infeasible.

Benefits of Open Science to Ecological Modeling

Open science approaches add value to USACE ecological modeling activities by benefiting different participants in the modeling process, identifying leverage points where open science can be applied, and increasing availability and accessibility to improve the communication and sharing of model outcomes.

Reaching the Target Audience

Open Work forces model development teams to consider which target audiences are important to reach (i.e., who should have access). Some roles associated with ecological model development may be described as follows:

  • Model developers are those conceptualizing and programming numerical tools for USACE applications. Developers benefit from open science by having their models become more accessible and thereby increasing the likelihood of adoption. More importantly, developers can benefit from the input of others on model structure, incorporate different bodies of knowledge, and provide tools that better meet user needs (i.e., through co-development and participatory processes, Herman et al. 2019).
  • Model users are those applying numerical models to inform decisions. Users are sometimes the same people as developers, but they may also be a separate group that applies the tools without being involved in development. Model users need details about model theory, assumptions, and use to avoid misapplication of models in decision processes.
  • Technical professionals use model outcomes to make informed decisions but are rarely involved in numerical analyses. These audiences include those within the USACE making choices about project alternatives (e.g., project managers or District Commanders), as well as external audiences providing regulatory input. This audience needs to understand tool development, model outcomes, and the assumptions and limitations of tools, all of which affect their trust in the models, buy-in to the tools, and appropriate interpretation of results.
  • Stakeholders, for the purpose of this paper, encompass a wide variety of groups with varying levels of interest in model results, which can include funding entities, academics, non-profit partners, and the general public. This diverse group typically is most interested in models as they relate to a particular project or geography, rather than the technical tools themselves. Deep technical understanding may not be necessary for this group. Rather, this group is concerned with whether model purposes are easily conveyed, whether concepts can be grasped, and whether appropriate processes were followed.

Increasing Transparency in Model Development

Ecological model development typically proceeds through a series of five general steps of conceptualization, quantification, evaluation, application, and communication (Grant and Swannack 2008, Swannack et al. 2012, McKay et al. 2019). Collaborative and participatory models have been highlighted as mechanisms for increasing the transparency of each of these modeling phases (Langsdale et al. 2013, Herman et al. 2019). Here, we briefly outline key opportunities for developers to apply open science methods aligned with each modeling phase (Table 2).

Table 2. Entry points for open science in each phase of ecological model development.

| Modeling Phase | Entry Points for Open Science |
|---|---|
| Conceptualization | Input from diverse technical professionals and stakeholders that guides conceptual models. Development of multiple conceptual models to guide competing or complementary ecological models. Transparent documentation of a conceptual model that guides numerical tools development but that is more accessible than modeling code. |
| Quantification | Flexibility in selection of modeling software or programming language to maximize accessibility by key audiences. Integrated download of input data with the model. Free access for code repository. Version controlled systems allowing for collaboration. |
| Evaluation | Clear, identifiable methods available through documentation. Well commented code to guide users through model mechanics. Transparent model testing procedures and/or publication of testing outcomes (McKay et al. 2022). |
| Application | Data sharing of input and output. Data storage in transferrable formats with appropriate archival. Clear identification of model versioning and/or application history (e.g., model user, run dates, etc.). |
| Communication | Publishing of easily identifiable contact information. Post-processing of outcomes for synthesis by other audiences. Carefully crafted figures for data visualization (McKay 2019). |
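The Application entry point in Table 2 calls for clear identification of model versioning and application history (e.g., model user, run dates). One way this might look in practice is a plain-text run log kept alongside the model. The sketch below is illustrative only: the function names, field names, and the one-JSON-record-per-line layout are assumptions made for this example, not an established USACE convention.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def make_run_record(model_name, model_version, user, input_files, run_time=None):
    """Build one application-history entry for a single model run.

    All field names here are hypothetical; adapt them to project needs.
    """
    if run_time is None:
        run_time = datetime.now(timezone.utc)
    return {
        "model": model_name,
        "version": model_version,
        "user": user,
        "run_date": run_time.isoformat(),
        "inputs": sorted(input_files),
    }


def append_run_record(log_path, record):
    """Append the record as one JSON line so the log stays plain text."""
    with Path(log_path).open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")


def read_run_log(log_path):
    """Read every run record back in the order it was written."""
    with Path(log_path).open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

Because the log is an open, line-oriented text format, it can be archived with the model outputs and read without any proprietary software.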

Making Models More Available and Accessible

Open science methods provide a suite of techniques for overcoming common communication challenges, such as the perception that models are “black boxes” whose internal workings are obscure (Herman et al. 2019). Specifically, an Open Work approach emphasizes the importance of both availability and accessibility of models and model outcomes.

Open science advocates often focus on barriers to accessibility. Model availability is related to the capacity to obtain modeling resources, such as input data, model code, testing outputs, reports, or journal papers. For instance, data and model code should be posted in publicly available and durably archived repositories for long-term use (Powers and Hampton 2019). A second common issue hindering availability relates to costs for obtaining model resources or documentation (e.g., subscription requirements or download fees), which can be overcome through open access publication, sharing through public research networks (e.g., Dryad), release through institutional libraries (e.g., the ERDC Knowledge Core), or posting on public websites (e.g., USACE District sites sharing feasibility studies).

Availability of model resources may, however, be insufficient to overcome the opaqueness of the mechanics and assumptions involved with ecological model computation. Therefore, the second component of effective model communication is the accessibility of model resources for different audiences. Accessibility may be hindered by several factors: the expertise of developers vs. users (e.g., familiarity with a coding language), whether input or output data is in an open format (e.g., proprietary vs. standard data format), or language requirements of a publication outlet (e.g., journals in English, Spanish, or Mandarin). Accessibility issues can be overcome through thoughtful development approaches such as the development of a “point and click” application for users unfamiliar with coding (see detailed discussion below). Although universal access may be difficult to attain, modelers should try to ensure that audiences with appropriate technical expertise have access to ecological modeling products (e.g., Table 1).

Open Work Tools for Ecological Modeling

Here, we explore tools that facilitate open science in ecological modeling. These tools are categorized by purpose and usefulness with respect to implementation throughout the project pipeline. This section is a starting point for incorporating Open Work methods into modeling workflows.

Idea Sharing

Open Work emphasizes the importance of transparency in scientific process, and ecological modeling is commonly cited as having challenges associated with opaque development processes or the underlying structure of models (Herman et al. 2019). A growing family of methods is emerging for increasing the transparency of ecological models, some of which include:

  • Conceptual models: The ecological modeling community values the use of conceptual models, both as a means to tool development and for increasing transparency (Swannack et al. 2012). Conceptual models describe the functional interrelationships of ecosystem components (Fischenich 2008) and provide a mechanism for model developers to communicate with users, technical professionals, and stakeholders about the underlying model mechanics. These “mental pictures” vary in detail and breadth based on the system of interest and the associated model. An effective conceptual model can convey ideas clearly, thereby increasing transparency.
  • Participatory Workshops: A range of participatory methods have been used to engage multiple audiences in the development of ecological models (van den Belt 2004, Voinov and Bousquet 2010, Herman et al. 2019). Often these methods rely on workshops where developers, users, and technical professionals co-produce conceptual models of an ecosystem and discuss the general framework for numerical modeling. This approach allows for a wide breadth of participants to be empowered through hands-on, interactive model development within the workshop structure. These methods reduce barriers to entry on modeling and encourage better communication and trust among partners.
  • Facilitation tools: These tools are used to share, develop, and organize ideas among developers and other audiences. While some set-ups may struggle to facilitate multiple, simultaneous interactions, there are methods to enable larger groups to share ideas in a virtual or hybrid setting. Examples include Miro, Poll Everywhere, and Menti.

Data Sharing

In many scientific or research contexts, data sharing is a major focus of open science initiatives. In general, data sharing refers to the set of techniques for bundling data and metadata, and archiving them in private or public repositories (Michener 2015). In the context of ecological modeling, data sharing can include the archival of input data, output data, or both. Making data accessible increases transparency for model users and supports those using the data outside of the given model application. Confidentiality and data ownership are primary concerns that must be considered with respect to data sharing (e.g., security issues, Tribal data sovereignty). Therefore, the platforms used for data archival often vary depending on these concerns, intended audiences, and funding source stipulations. For instance, sensitive project data could be shared internally via a USACE server or shared drive, whereas publicly available information could be posted to a website, deposited in a scientific repository, or submitted to one of a growing number of journals devoted to data sharing (e.g., Nature Scientific Data, Data in Brief).
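As a minimal sketch of what "bundling data and metadata" could mean for a small dataset, the example below writes a table to CSV (an open format) and stores a JSON metadata sidecar next to it, including a checksum so later users can confirm the archived file is unchanged. The function names, the sidecar file naming, and the metadata fields are assumptions made for illustration; real repositories typically prescribe their own metadata standards.

```python
import csv
import hashlib
import json
from pathlib import Path


def write_data_bundle(out_dir, name, rows, metadata):
    """Write rows to <name>.csv plus a <name>.metadata.json sidecar.

    `rows` is a non-empty list of dicts sharing the same keys.
    `metadata` holds descriptive fields chosen by the project (hypothetical).
    """
    if not rows:
        raise ValueError("rows must contain at least one record")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Data file: plain CSV so no proprietary software is needed to read it.
    data_path = out_dir / f"{name}.csv"
    fieldnames = list(rows[0].keys())
    with data_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

    # Metadata sidecar: project fields plus facts derived from the data file.
    sidecar = dict(metadata)
    sidecar["data_file"] = data_path.name
    sidecar["columns"] = fieldnames
    sidecar["row_count"] = len(rows)
    sidecar["sha256"] = hashlib.sha256(data_path.read_bytes()).hexdigest()
    meta_path = out_dir / f"{name}.metadata.json"
    meta_path.write_text(json.dumps(sidecar, indent=2), encoding="utf-8")
    return data_path, meta_path


def verify_data_bundle(data_path, meta_path):
    """Return True if the data file still matches its recorded checksum."""
    sidecar = json.loads(Path(meta_path).read_text(encoding="utf-8"))
    actual = hashlib.sha256(Path(data_path).read_bytes()).hexdigest()
    return actual == sidecar["sha256"]
```

The same two files can be placed on an internal shared drive or deposited in a public repository, depending on the confidentiality and ownership concerns noted above.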

Open License Tools

As mentioned, Open Licensing is one of three primary pillars in the Open Work philosophy. This approach manifests in two primary formats for ecological modeling: use of open license tools and development of open license ecological models.

Ecological models can be developed in many platforms, ranging from spreadsheets and statistical software to coding languages and proprietary software. Tools may be selected on this spectrum to maximize accessibility for different audiences. On one end of the spectrum, a variety of free and open-source software (FOSS) exists for model building, such as the Python or R programming languages. However, these coding languages may not be the standard tools in an area of practice, and accessibility could (paradoxically) be increased through closed-source software or proprietary ecological modeling platforms. Notably, many FOSS tools have large user communities that build additional utilities and provide large scale technical support in online forums (e.g., Stack Overflow). For instance, Markdown is a versatile, free plain-text markup language, originally released as a text-to-HTML tool. Extensions such as R Markdown allow users to construct models and document them in the same environment, as well as export to common formats like webpages, PDFs, or Microsoft Word documents. Similarly, Jupyter (and the multi-user JupyterHub) provides browser-based notebooks that combine text, code, and output in a single document and run code through language-specific kernels (e.g., C++, Python, R). These open license tools provide a readily transferable mechanism for code sharing, with low barriers to entry with respect to software purchase or costly training programs.

In addition to the astute selection of tools for use during model development, the ecological model decision making process should address the issue of licensing and shareability. For instance, an ecological model could be licensed in a restrictive format to minimize potential adaptation or misuse due to a nuanced aspect of the code. Conversely, a model could be licensed minimally to protect the developer against liability. Table 3 provides an abbreviated overview of open licensing options, although developers should explore more extensive options as appropriate for their context.

Table 3. Overview of common code licenses as they pertain to ecological modeling.

| Common Code Licenses | General Description |
|---|---|
| Viral | Rules out any claims of warranty |
| Copyleft | When modifying and sharing work, developers must provide attribution. The most common viral Open Source licenses. Protects against liability. (GPL, AGPL, LGPL) |
| Permissive | Not viral but still offers attribution and protection against liability. Possibility for more detailed patents and trademark rules. (MIT, BSD, Apache) |
| Creative Commons | The “a la carte” of licensing. Can choose base license which rules out warranty and then attach modules as wanted such as: BY (requires the author to be stated), SA (any derivatives must be “shared alike”, i.e., virality), NC (commercial use forbidden), ND (derivatives are forbidden, i.e., not open source) |
| Public Domain/CC0/License | Forfeit all copyright claims. No warranty waiver. |
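The “a la carte” structure of Creative Commons licenses in Table 3 can be sketched as a small function that assembles a license name from the chosen modules. This is an illustration rather than licensing guidance. It assumes the current Creative Commons suite, in which every license includes BY, and it rejects the SA and ND combination because SA governs derivatives while ND forbids them. The function name and its arguments are hypothetical.

```python
def cc_license_name(non_commercial=False, share_alike=False,
                    no_derivatives=False, version="4.0"):
    """Compose a Creative Commons license name from optional modules.

    Assumption: attribution (BY) is always included, as in the current
    six-license Creative Commons suite.
    """
    if share_alike and no_derivatives:
        # SA sets terms for derivatives; ND forbids derivatives entirely.
        raise ValueError("SA and ND cannot be combined")

    modules = ["BY"]
    if non_commercial:
        modules.append("NC")
    if share_alike:
        modules.append("SA")
    if no_derivatives:
        modules.append("ND")

    code = "-".join(modules)
    return f"CC {code} {version}"


# Example choices a model developer might weigh:
open_and_viral = cc_license_name(share_alike=True)            # CC BY-SA 4.0
no_commercial_use = cc_license_name(non_commercial=True)      # CC BY-NC 4.0
```

Creative Commons licenses are generally aimed at documents and data; for model code, the copyleft and permissive families in Table 3 are the more typical choices.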

Code Sharing and Version Control

Thus far, open science tools have been discussed relative to model developers communicating with other audiences. However, model developers often need open science methods to communicate with each other during model development. Two major techniques that developers use in this regard are packaging of code into specialized libraries and version control software that facilitates development and archival.

Generally speaking, a package can be thought of as a fundamental unit of code that can include functions, data, documentation, and tests (Wickham and Bryan 2019). Packages provide an efficient, transportable, and reproducible mechanism for code sharing and publication. For instance, the R Statistical Software has over 19,000 user-developed packages with functionality ranging from statistical methods to river engineering tools. For many ecological models, code could be developed either as a standalone package or as a function within an existing package available from a repository.
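To make the four package components concrete, the sketch below places a function, a small piece of bundled data, documentation, and a test in a single Python module. In a real package these would usually live in separate files. The index shown (a weighted geometric mean of suitability scores) and every name in the block are hypothetical stand-ins chosen for illustration, not a model described in this report.

```python
"""Sketch of the pieces a small model package bundles together."""
import math

# Data: hypothetical default weights that would ship with the package.
DEFAULT_WEIGHTS = {"depth": 1.0, "velocity": 1.0, "cover": 1.0}


def habitat_index(scores, weights=None):
    """Return the weighted geometric mean of suitability scores.

    Documentation lives next to the code: each score must lie in [0, 1],
    weights are assumed positive, and any score of 0 makes the index 0.
    """
    weights = DEFAULT_WEIGHTS if weights is None else weights
    if not scores:
        raise ValueError("at least one score is required")
    for name, score in scores.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score for {name!r} must be between 0 and 1")
    if any(score == 0.0 for score in scores.values()):
        return 0.0

    total_weight = sum(weights[name] for name in scores)
    log_sum = sum(weights[name] * math.log(score)
                  for name, score in scores.items())
    return math.exp(log_sum / total_weight)


def test_habitat_index():
    """Tests: checks any user can rerun to confirm expected behavior."""
    assert habitat_index({"depth": 1.0, "velocity": 1.0}) == 1.0
    assert habitat_index({"depth": 0.0, "velocity": 0.9}) == 0.0
    assert math.isclose(habitat_index({"depth": 0.25, "velocity": 1.0}), 0.5)
```

Shipping the test with the function lets model users and reviewers rerun the same checks the developer used, which supports the transparent testing practices listed in Table 2.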

Version control is crucial in any coding environment, especially for collaborative coding projects. When co-developing code, a myriad of issues may arise within the development period. For example, when two developers work on separate but related sections of code, their copies diverge with every edit. These two branches of code will need to be merged, which can take significant time to sort out “by hand”. In some cases, the new code on each branch could conflict, rendering one or both branches unusable. To aid in reconciling parallel code development, online repositories provide version control capabilities such as branching and merging. GitHub is one of the most widely used all-purpose cloud-based repository hosts, and it uses Git as the underlying version control system. GitHub can also be used strictly as a public or private online repository, without the versioning functions. Existing Git platforms such as ERDC GitLabs, both private and public, provide an established alternative for model and code sharing within ERDC.
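The idea of merging two diverged branches, and of a conflict arising when both change the same code, can be illustrated with a toy three-way merge. This is a deliberately simplified sketch and not Git's actual algorithm: it assumes all three versions have the same number of lines (edits only, no insertions or deletions), and the function name is hypothetical.

```python
def three_way_merge(base, ours, theirs):
    """Merge two edited copies of `base`, comparing line by line.

    Returns (merged_lines, conflict_indices). Simplifying assumption:
    all three versions contain the same number of lines.
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this toy merge only handles equal-length versions")

    merged = []
    conflicts = []
    for index, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:
            merged.append(o)          # both branches agree
        elif o == b:
            merged.append(t)          # only the other branch edited this line
        elif t == b:
            merged.append(o)          # only our branch edited this line
        else:
            merged.append(b)          # both edited differently: keep base
            conflicts.append(index)   # and flag the line for manual review
    return merged, conflicts


base = ["growth_rate = 0.10", "capacity = 500", "years = 20"]
ours = ["growth_rate = 0.12", "capacity = 500", "years = 20"]
theirs = ["growth_rate = 0.10", "capacity = 500", "years = 50"]
merged, conflicts = three_way_merge(base, ours, theirs)
```

Here the two developers edited different lines, so the merge combines both changes without conflict. If both had changed `growth_rate` to different values, that line's index would appear in `conflicts` and a person would need to decide which edit to keep.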

Table 4. Benefits of version control software (e.g., GitHub) for ecological model audiences.

| Participant | Functionality Facilitated by GitHub |
|---|---|
| Model Developers | Store unpublished projects during development. Use GitHub as a platform with built-in functionality for user comments, forums, and wikis. Work synchronously via the GitHub remote repository. Version control using multiple access methods. Integrated licensing capabilities. |
| Model Users | Easily download all files and compress file(s) into a directory. Download files from a public repository without creating an account. Access to documentation (user guides). |
| Technical Professionals | View detailed updates of the remote repository to easily view changes. Access to documentation (technical documents and testing files). |
| Stakeholders | Publish Wiki updates. View interactions/traffic on public repository. |

Technology Transfer to Other Modeling Participants

As discussed, the availability and accessibility of ecological models are not synonymous. The methods described above can increase model availability, but often developers need to consider a different set of methods to make tools more accessible to diverse audiences. Web applications can facilitate model use without handling code, and technical professionals and model users alike can save time and effort when accessing models through a graphical user interface (GUI). A variety of open source tools can help with GUI development. For example, the Shiny package in R provides a means for developing interactive web apps with little to no HTML experience. In addition to the models themselves, input and output data can be visualized in many ways to help “tell the story” of modeling (McKay 2019). Similarly, data can be processed into an ArcGIS StoryMap to convey key messages to broader audiences. These final technology transfer steps often have some of the largest positive effects on model communication.

Summary

USACE ecological modeling best practices could be improved through expanded use of Open Science. The beneficiaries would include other USACE practitioners, external agencies and partners, and community stakeholders. The diverse array of Open Science tools and methods allows model developers to tailor approaches to their project needs. The trend of virtual work and collaboration has spurred the need for new, better organized methods of idea sharing and virtual project hygiene, making Open Science a topical subject. However, Open Science is not without its challenges, and time and resources may be required to learn new techniques or update to new platforms. Adopting Open Science principles could significantly increase inter- and intra-agency communication, modeling capabilities, and shared expertise. These cross-cutting efforts will ultimately result in a reduction of costs and an improvement in model effectiveness, as developers learn from each other. Open Science therefore provides an effective method to develop tools and platforms for use by USACE users and decision makers.

Acknowledgements

Reviews by Drs. Candice Hall and Todd Steissberg significantly improved a prior draft of these ideas. The use of products does not represent an endorsement of these products by either the authors or any other entity. Please cite this resource as:

Shaw C. and McKay S.K. 2024. A Guide to Applying Open Science Methods in Ecological Modeling. https://skmckay.github.io/OpenScience/.

References Cited

Grant W.E. and Swannack T.M. 2008. Ecological modeling: A common-sense approach to theory and practice. Malden, MA: Blackwell Publishing.

Fischenich, C. 2008. The application of conceptual models to ecosystem restoration. ERDC TN-EBA-TN-08-1. U.S. Army Engineer Research and Development Center, Vicksburg, MS.

Hampton, S.E., Anderson, S.S., Bagby, S.C., Gries, C., Han, X., Hart, E.M., Jones, M.B., Lenhardt, W.C., MacDonald, A., Michener, W.K. and Mudge, J., 2015. The Tao of open science for ecology. Ecosphere, 6(7), pp.1-13.

Herman B.D., McKay S.K., Altman S., Richards N.S., Reif M., Piercy C.D., and Swannack T.M. 2019. Unpacking the black box: Demystifying ecological models through interactive workshops and hands-on learning. Frontiers in Environmental Science, 7:122. doi: 10.3389/fenvs.2019.00122

Langsdale, Stacy, Allyson Beall, Elizabeth Bourget, Erik Hagen, Scott Kudlas, Richard Palmer, Diane Tate, and William Werick, 2013. Collaborative Modeling for Decision Support in Water Resources: Principles and Best Practices. Journal of the American Water Resources Association (JAWRA) 49(3): 629-638. DOI: 10.1111/jawr.12065

McKay S.K. 2019. Visualization as a tool for ecological analysis. Encyclopedia of Ecology, 2nd Ed., Volume 2, pp. 213-220. Eds. S. Jorgensen and B. Fath. doi: 10.1016/B978-0-12-409548-9.10566-4.

McKay S.K., Richards N., and Swannack T. 2019. Aligning ecological model development with restoration project planning. ERDC EMRRP-SR-89. U.S. Army Engineer Research and Development Center, Vicksburg, MS.

McKay S.K., Richards N., and Swannack T.M. 2022. Ecological model development: Evaluation of system quality. ERDC/TN EMRRP-EBA-26. U.S. Army Engineer Research and Development Center, Vicksburg, MS.

Michener, W.K., 2015. Ecological data sharing. Ecological Informatics, 29, pp.33-44.

Powers, S.M. and Hampton, S.E., 2019. Open science, reproducibility, and transparency in ecology. Ecological Applications, 29(1), p.e01822.

Swannack, T. M., J. C. Fischenich, and D. J. Tazik. 2012. Ecological modeling guide for ecosystem restoration and management. ERDC/EL TR-12-18. Vicksburg, MS: US Army Engineer Research and Development Center.

U.S. Army Corps of Engineers (USACE). 2011. Assuring quality of planning models. EC-1105-2-412. Washington, DC.

Van den Belt, M. (2004). Mediated Modeling; A System Dynamics Approach to Environmental Consensus Building. Washington DC: Island Press.

Voinov, A., and Bousquet, F. (2010). Modelling with stakeholders. Environmental Modelling & Software, 25, 1268–1281. doi: 10.1016/j.envsoft.2010.03.007.

Wickham H. and Bryan J. 2019. R packages: Organize, test, document, and share your code. https://r-pkgs.org/.