An Alan Turing Institute & University of Exeter Workshop

Date and time: March 5, 12, 19 & 26, 14:00–16:00 BST.
Location: Online (Zoom) due to the Covid-19 pandemic
Organisers: Hugh Williamson, Sabina Leonelli

Please find the programme and videos/slides of each presentation below! (click on presentation titles)

Funding kindly provided by the Alan Turing Institute (project From Local Fields to Global Indicators) and the University of Exeter. Administrative assistance from Egenis and the Institute for Data Science and AI.

Workshop brief

The development of reliable infrastructures for managing and linking plant data has become critical to international efforts to ensure global food security. Understanding and addressing the complex environmental and socioeconomic challenges of the twenty-first century, including the impact of climate change on agriculture and the persistent structures of poverty identified in the UN’s Sustainable Development Goals, requires integrating data of multiple types and from diverse sources and domains, ranging from basic plant science through crop field trials, socioeconomic studies and climate modelling. Much progress has been made on the development of tools and specifications for data sharing, standardisation and analysis, but the diversity, multiplicity and rapid development of these resources is creating technical challenges in linking them effectively and consistently. Alongside these technical challenges, data linkage poses political and social challenges that need to be addressed in order to ensure that data-centric solutions to food security are equitable, responsible and accountable, qualities that will be essential to their long-term resilience.

This workshop will examine the contemporary contours of such challenges through sustained engagement with current and historical initiatives and discussion of best practices and prospective future directions for ensuring responsible data linkage. To this end, it will bring together representatives of key global initiatives for plant data with scholars in the history, philosophy and social studies of plant and agricultural science, thus combining technical expertise in data governance with an in-depth understanding of local situations of data use as well as their historical, social and scientific contexts and implications. This exchange of perspectives will provide a novel platform for addressing the technological and social implications of data linkage for food security together and in detail.

The workshop will be divided into four sessions, each with a distinct theme, held over the course of four weeks. Following the workshop, papers and commentaries will be developed and assembled into an edited collection in Open Access format, which will be published in summer 2020.

Programme

Introduction & Session 1: Experiences from The Trenches (March 5, 14:00-16:00 BST)

How is data managed in practice? To start the workshop, this session will discuss case studies of plant data use and linkage in the context of particular research projects and breeding programs, drawn from contemporary experience as well as historical research. Consideration of these cases will ground the thematic discussion of the following sessions, and provide an opportunity to reflect on the practical dimensions of the various challenges of data linkage and their solutions. This session will also begin with a general introduction to the online workshop goals and format by the organisers.

14:00

Introduction by organisers

14:10

Between Subsistence and Agronomy: Carl Linnaeus (1707-1778) on Famine Foods
Staffan Müller-Wille (University of Cambridge)

Having witnessed a catastrophic famine in his native province of Småland in 1725, the Swedish naturalist Carl Linnaeus (1707–1778) always had food, and especially famine foods – or what to eat when nothing is left to eat – on his mind. In my contribution, I will focus on information Linnaeus collected during his Laplandic Journey (1732) about food sources used by settler and reindeer-herding communities in the very North of Sweden. Following the trajectories of this information in later works of Linnaeus, I will show how he possessed a keen eye for the way in which sustainable subsistence practices problematize accepted definitions of food and blur dividing lines between the wild and the cultivated, foraging and agriculture, and poverty and wealth. At the same time, Linnaeus propagated the idea of the North as a barren wilderness that needed to be “cultivated,” resulting in the displacement of extant livelihoods by an extractive plantation economy. This tension, I will argue, was intrinsic to Linnaeus’s taxonomic enterprise, which was infused by a logic of re-placement that continues to inform current efforts to attain food security.

14:35

Managing Data in Crop Breeding: A Hundred Year Challenge
Richard Harrison (NIAB) & Mario Caccamo (NIAB)

The rediscovery of Mendelian genetics at the dawn of the 20th century ushered in a revolution in agriculture. For the first time, varieties with known performance characteristics were systematically developed, based upon the principles of heredity and the genetic control of traits. This inevitably led to questions over the uniformity, distinctiveness and stability of distributed genetic material throughout the supply chain and led to the development and implementation of data standards for measurement of key traits of agronomic importance. A prime example of this is the co-development of certification standards by NIAB in partnership with the Plant Breeding Institute (PBI) and the seed industry in the early 20th century. These systems have ultimately led to our modern-day varietal testing and certification systems and meant that records of the output of much of the breeding progress of the past 100 years were kept in a reliable and robust system.

Within many breeding programmes, records stretch back over 100 years, and for many, data from the early years of publicly funded breeding are published in annual reports and are in the public domain. From the 1980s onwards, as public breeding programmes (in many European countries) have moved into the private sector, many proprietary datasets are no longer public, apart from at the point of release, where national and recommended listing systems exist, which serve not only to evaluate the relative performance of varieties but also to provide key trait information. As there is no longer an imperative to release all data into the public domain, in some cases this has led to a relaxation in data capture standards, which, coupled with the proliferation of digital standards over the past thirty years, has led to issues around data longevity. Yet maintenance, curation and linkage of historical data can prove valuable, as we will demonstrate through examples drawn from across NIAB.

With the advent of multi-omic data and the proliferation of data types used within modern breeding programmes, the task of managing and maintaining gold-standard record keeping has never been harder. New skill sets and infrastructure are needed, and these are often siloed within sectors. Moreover, the perception of the febrile nature of cloud-based or open-source platforms has led to inertia over their adoption. The risk of valuable data loss is now greater than ever before, reducing the options for future exploitation.

We speculate as to what approaches may be best to ensure that within breeding programmes, data is captured and archived in such a way that it may have longevity and what role expanded or aligned programmes of data collection at the point of varietal registration and varietal evaluation could have in ensuring the best use of public data to address some of the challenges agriculture faces over the next 100 years, namely the transformation to sustainable, low-emissions and biodiversity-promoting farming, all of which can be addressed in part by breeding and require data-driven solutions.

15:00

Data, Duplication, and the Decentralisation of Crop Collections
Helen Anne Curry (University of Cambridge)

In the 1970s, the number of accessions held in national and international collections of crop germplasm increased steadily. Concerns about rapid 'genetic erosion' arising in the wake of the Green Revolution prompted efforts to forestall such erosion by assembling or augmenting collections of landraces and crop wild relatives. By the 1980s, this growth, initially a source of pride, was increasingly recognized as a liability. Too many accessions lacked the basic information necessary for researchers to make requests of gene bank managers, let alone put samples to work knowledgeably in breeding programmes. Many gene banks came under scrutiny for poor management practices, and several prominent banks found themselves accused of mishandling a 'global patrimony' entrusted to them by the international community. In this paper, I explore a response to these failings, real and perceived, that attracted attention from many in the germplasm conservation community: creating linked, standardised databases of collections. Calls for more thorough and consistent data about accessions often emphasised, and still emphasise today, that these data will make collections easier to navigate and therefore more valued and more used. Here I take a close look at the use of data collation and standardisation as a means of 'rationalising' collections, a motivation that has not been advertised as prominently. For some researchers and collection managers, the identification of duplicates would allow the channelling of limited time and money to only the most unique accessions, even creating the possibility of de-accessioning items known to be held elsewhere. As I show, the vision of achieving efficiencies through close collaboration depended not only on overcoming technical hurdles in data management but also on social and political alliances.
Efforts to identify and weed out duplicates in the interest of stretching gene bank resources appear to have been pursued most vigorously by communities of researchers whose boundaries were delineated by European Union membership and who were already connected by their expertise in particular crop species.

15:25

Data Management in a Multi-Disciplinary African RTB Crop Breeding Program
Afolabi Agbona (IITA), Prasad Peteti (IITA), Elizabeth Parkes (IITA), Ismail Rabbi (IITA), Lukas A. Mueller (Boyce Thompson Institute), Chiedozie Egesi (IITA) & Peter Kulakow (IITA)

High-quality phenotype and genotype data are important for the success of a breeding program. Like most programs, African breeding programs generate large multi-disciplinary phenotypic and genotypic datasets from many locations that must be carefully managed through the use of an appropriate database management system (DBMS) in order to generate reliable and accurate information for decision making. A DBMS is essential for data collection, storage, retrieval, validation, curation and analysis in plant breeding programs to enhance the ultimate goal of increasing genetic gain. The International Institute of Tropical Agriculture (IITA), working on root, tuber and banana (RTB) crops such as cassava (https://cassavabase.org/), yam (https://yambase.org/), banana and plantain (https://musabase.org/), has deployed a FAIR-compliant (Findable, Accessible, Interoperable, Reusable) web-based database, BREEDBASE (https://breedbase.org). The functionalities of these databases in data management and data analysis have been instrumental in achieving breeding goals. Such capabilities include ontology-driven data management (https://www.cropontology.org/), statistical analyses, interfaces with the Breeding API (BrAPI), and barcode-based data collection using the PhenoApps (http://phenoapps.org/). User-friendly PhenoApp examples include Fieldbook for phenotype data collection, Coordinate for genotype tissue sample collection and tracking, and Inventory for weighing samples without the need for data transcription. Standard Operating Procedures (SOPs) for each breeding process have been developed to allow a cognitive walkthrough for the users. This has further helped to increase the usage and enhance the acceptability of the system. The wide acceptance gained among breeders in the global RTB research programs has resulted in improvements in the precision and quality of genotyping and phenotyping data, and in faster progress towards breeding program goals.

15:50

Final discussion and wrap-up

Session 2: Technical Challenges of Data Linkage (March 12, 14:00-16:00 BST)

Making plant data FAIR (Findable, Accessible, Interoperable, Reusable) has been the subject of much effort. Extensive semantic tools are now available, including the multiple, intersecting ontologies that comprise the Planteome project, as are metadata standards such as the Minimum Information About a Plant Phenotyping Experiment (MIAPPE). Such tools nevertheless require collective work to develop and maintain. Beyond ensuring data themselves are FAIR, actively linking and circulating data poses further challenges. These include finding ways to link biologically, experimentally or geographically related yet heterogeneous datasets consistently, and to make data usable in practice to potential users with divergent aims and resources, not only reusable in theory. This session will address the technical challenges of data linkage, including the development of standards and infrastructures; epistemic issues; and the organizational requirements of this work.

14:00

Introduction by organisers

14:05

Linking Legacies: Realising the Potential of Long-Term Agricultural Experiments
Richard Ostler (Rothamsted Research)

Long-term agricultural experiments (LTEs) are vital resources for assessing the sustainability of food production and soil health. For researchers to effectively use a long-term experiment, it is essential to have access to relevant historical data and necessary metadata. In turn, new datasets generated from investigations using an LTE should be resolvable back to the source LTE as part of that experiment’s continuing narrative. Further value from LTEs can be derived if experiments sharing common characteristics, such as cropping system, treatment, management or environment, can be identified and their datasets integrated.

LTEs can generate very diverse data types, from annually collected yield traits and periodic and ad hoc surveys to continuous sensor data. To be usefully findable, interoperable and re-usable, LTE datasets not only need to be described using community-accepted semantic and metadata standards but also require knowledge both of how they relate to each other, in time, space and scale, and of when they do not. Within a single experimental system, an LTE can therefore encapsulate key challenges facing plant data linkage, and these challenges are only amplified when attempting to link data across LTEs.

This presentation reviews the approach being taken at Rothamsted Research to apply FAIR data principles to its long-term datasets, and how Rothamsted is working with the wider agricultural data and long-term experiments communities to address some of the technical and cultural challenges faced.

14:30

Challenges to Data Linkage in Plants: Two Parables from the Pea
Gregory Radick (University of Leeds)

This chapter will draw upon the history of scientific studies of inheritance in Mendel's best-remembered model organism, the garden pea, as a source of two parables – one pessimistic, the other optimistic – on the challenges of data linkage in plants. The moral of the pessimistic parable, from the era of the biometrician-Mendelian controversy, is that the problem of theory-ladenness in data sets can be a major stumbling block to making new uses of old data. The moral of the optimistic parable, from the long-run history of studies at the John Innes Centre of aberrant or "rogue" pea varieties, is that an excellent guarantor of the continued value of old data sets is the preservation of the relevant physical materials – in the first instance, the plant seeds.

14:55

From Farm to FAIR: The Trials of Linking and Sharing Wheat Research Data
Chris Rawlings (Rothamsted Research) & Robert P. Davey (Earlham Institute & the Designing Future Wheat Data Coordination Task Force)

This paper describes progress towards an integrated data framework that supports the sharing of data from the Designing Future Wheat (DFW) strategic research programme funded by the UK BBSRC. DFW is a five-year project (https://designingfuturewheat.org.uk/) that spans eight research institutes and universities, and aims to develop new wheat varieties (germplasm) containing the next generation of key traits. Much of the research in DFW is contributing new wheat germplasm that is assessed in large-scale field trials at partner sites and also by a precompetitive consortium of wheat breeding companies. To complement the field trials, a large number of DFW research studies are collecting additional data that give a more detailed understanding of the trait of interest. Many of these projects make extensive use of genetic and genomic datasets that are being developed in DFW or are available through other national and international collaborations. Novel field-scale image-based phenotyping platforms are also being employed, which present new challenges through the volume of data they generate and the complexity of the analysis methods used to extract phenotypic data.

DFW is committed to making our data open to the wider research community by adopting FAIR data sharing approaches. It is also a good example of a data-intensive strategic research programme which follows a Field-to-Lab-to-Field approach that is representative of much contemporary and multidisciplinary crop science research. However, even with dedicated funding to develop crop data research infrastructures within DFW, we found that many challenges remain, and that pragmatic and flexible approaches are required to enable these infrastructures to interoperate. We present key DFW data resources as a case study to assess progress and discuss these challenges with a view to developing infrastructure that exposes metadata-rich datasets and that meets FAIR principles. We describe our approaches to: (1) reporting internally and to sponsors about research outputs; (2) federating institutional data and information resources; (3) improving and standardising the collected data and metadata for improved FAIRification; and (4) exposing data for reuse, using methods still in development, through the adoption and implementation of community standards across multiple data layers.

15:20

Plant Scientific Data Integration, From Building Community Standards to Defining a Consistent Data Lifecycle
Cyril Pommier (INRAE)

Applying the FAIR principles to plant research data drew partially on their use within other life science domains, especially for genomic data. But plant particularities, especially when dealing with plant–environment interactions such as phenotypes, needed some specialized answers. The plant communities from major global players, such as ELIXIR, EMPHASIS and the CGIAR, have therefore joined forces to build an ecosystem of data standards, with the Minimum Information About a Plant Phenotyping Experiment (MIAPPE) to handle the general data and metadata organization, the Multi-Crop Passport Descriptors (MCPD) for the identification of the plant genetic resources and the Crop Ontology (http://www.cropontology.org/) for the documentation of the measurement methodology. The organization of the researcher communities and the collaborative methodologies made it possible to dramatically improve the usability of MIAPPE and its adoption. From that first success, the ELIXIR Plant Community described a general data lifecycle to identify the gaps and the needed developments. As a consequence, several actions have been identified, in particular providing tools to address the “first mile” of data publishing, i.e. the gathering and documentation of data, including automated metadata capture. The current paper will therefore describe some of the existing tools, as well as their adoption of plant standards. Finally, we will also describe how different standards tend to converge to address common needs.

15:45

Final discussion and wrap-up

Session 3: Governance Challenges of Data Linkage (March 19, 14:00-16:00 BST)

New flows and intersections of big data from -omics research in plant science, including field-based phenomics as well as genomics, to various types of socioeconomic and environmental data, pose distinct challenges for governance. Data access and ownership for the common good and/or scientific advancement remain areas of considerable contestation, especially given the distinctive intellectual property landscape of plant science, which is marked by the predominance of transnational corporations on the one hand and regimes of national sovereignty on the other. Moreover, longstanding challenges of implementing Access and Benefit Sharing (ABS) schemes in regard to biological materials are renewed by the increasing availability of digital data, while the integration of biological with socioeconomic data raises new questions of privacy. This session will address these and other governance issues raised by plant data linkage, from open science policy through legal and political regulation.

14:00

Introduction by organisers

14:05

Spinning the Agricultural Data Web
Medha Devare (IFPRI), Elizabeth Arnaud (Alliance of Bioversity International and CIAT) & Brian King (CGIAR Big Data Platform)

When the COVID-19 crisis subsides, it may well be seen as a triumph of agile, global, collaborative science. The rapid mobilization of governments, non-profits, research organizations and industry to align research and development efforts for therapeutics, vaccines, and diagnostics is unprecedented. This is made possible by data sharing, such as the COVID-19 Open Research Dataset, and well-developed biomedical ontologies. Data standards and data sharing opened the way for applying massive computing power to model and identify over 70 promising compounds for treatment in just under two days—a result that could have taken years in the laboratory. This is a shining example of in silico analyses over vast data pools enhancing the speed and scale of scientific innovation. Enabling similar data-driven, agile responses to agricultural crises and disruptions hinges on the discovery of and access to publications, data, and data products that are interpretable and interoperable – for humans and machines. However, an entrenched research culture, and issues related to the governance, creation and maintenance of the standards required to enable such interoperability and ease of reuse continue to be roadblocks in the agricultural research for development sector. Effective operationalization of the FAIR Data Principles towards Findable, Accessible, Interoperable, and Reusable data requires that agricultural researchers accept that their responsibilities in a digital age do not end with data collection and manuscript publishing. Rather, they extend to the stewardship of data assets to ensure long-term preservation, wide access and reuse. The development and adoption of common standards (including metadata schemas, ontologies and controlled vocabularies) relevant to the agricultural space are key to assuring good stewardship, but these efforts face several challenges.
On the researcher side in particular, they include limited awareness about mining and deriving value from standards-compliant, interoperable data pools; lagging data science capacity for in silico analyses, with a related emphasis on the collection, rather than reuse, of existing data; and limited allocation of funds towards best practices in managing project data in the short term, and maintaining standards in the long term. Other, more community-based hurdles concern the collaborative development and governance of standards, and the building of critical mass in relation to their adoption. Findable, Accessible, Interoperable, and Reusable (FAIR) data assets are key building blocks for an evidence-driven, collaborative approach to enhancing the impact of research and development in the agricultural domain. This paper discusses the challenges and possible solutions to making FAIR agricultural data assets the norm rather than the exception as a means to catalyze a much-needed “translational agriculture” revolution.

14:30

Creating a Digital Marketplace for Agrobiodiversity and Plant Genetic Sequence Data: Legal and Ethical Considerations of an AI and Blockchain Based Solution
Mrinalini Kochupillai (Technical University, Munich)

The leeway to operate with plant genetic resources (PGRs) pre-supposes the existence of optimal demand as well as supply of agrobiodiversity containing these PGRs. Yet, according to estimates, about 75% of crop (on-soil) genetic diversity has been lost with farmers abandoning locally adapted heterogeneous seeds for genetically uniform high-yielding varieties. Associated adoption of chemical-intensive farming also leads to loss of in-soil, beneficial microbial diversity. Together, these have a negative impact on the supply of agrobiodiversity and its beneficial components. At the other end of the spectrum, regulatory requirements under well-intended laws create bureaucratic hurdles that disincentivize use of agrobiodiversity in research and breeding programs, creating a lack of demand for agrobiodiversity. As a combined result, active and robust marketplaces for agrobiodiversity, including for derivatives such as plant genetic sequence data, have failed to evolve.

This paper argues that these trends result from narrow scopes and directions of scientific research on the one hand, and from uni-directional data/knowledge flows, exclusively from the formal sector (academic, industry) to the informal sector (farmers), on the other. The paper further argues that with the rapid evolution of blockchain/DLTs and AI/Machine Learning platforms, the direction of scientific research as well as of data/knowledge flows in the agricultural sector can and should be diversified. However, with the possibility of bi-directional (formal to informal sector as well as informal to formal sector) data/knowledge flows also comes the need for bi-directional value and benefit flows. Here, again, blockchain-based platforms can support: (i) secure and “controllable” data/knowledge sharing by the informal sector; (ii) accrual of fair, inclusive and equitable economic benefits for those sharing data; and (iii) traceability, for ensuring accurate economic benefit sharing on the one hand and determining legal liability on the other, on a case-by-case basis. The article outlines a hypothetical data/knowledge and value flow for a possible AI-based app designed to support the diversification of research goals as well as data/knowledge flows. The article and outline hope to provide food for thought for additional multi-disciplinary and multi-stakeholder research, potentially leading to the development of novel regulatory as well as ethical business models for the creation of robust digital marketplaces for agrobiodiversity for the benefit of farmers, researchers and the environment.

14:55

Digital Sequence Information and Genetic Resources: Global Policy Meets Interoperability
Elizabeth Arnaud (Alliance of Bioversity International and CIAT), Brian King (CGIAR Big Data Platform), Daniele Manzella (UN Food and Agriculture Organisation) & Marco Marsella (UN Food and Agriculture Organisation)

Access and Benefit-Sharing (ABS) is a construct of international agreements on genetic resources. It aims to exploit, through access, the potential of those resources for various public policy objectives, e.g. nature conservation and food security, and to reward, through the fair and equitable sharing of the benefits of utilization, those who have traditionally maintained the diverse genetic base. In various ABS fora, including the International Treaty on Plant Genetic Resources for Food and Agriculture (ITPGRFA) at the Food and Agriculture Organization of the United Nations, discussions are taking place as to whether to regulate genetic sequence data (or Digital Sequence Information, DSI) within the remit of the agreements in order to actualize the provisions of the agreements in the light of scientific and technological advancements. Part of those discussions revolves around the scope of DSI and suitable terminology to reflect such scope in legal agreements.

In the on-going consideration of DSI, multiple data sets are grouped under DSI, e.g. DNA, RNA, non-coding sequence information. These datasets are stored and accessed in databases and, for the macro-level decisions on the scope of DSI to be channeled to actual producers and users of DSI, harmonization with the database architecture is highly desirable. A common understanding of DSI can become conducive to improved data aggregation and interoperability, which are pre-requisites for successful research and innovation based on genetic resources.

This is an area where the ITPGRFA has already deployed practical technological tools to associate information and knowledge to PGRFA samples to facilitate research and breeding. Among the different Permanent Unique Identifier (PUID) technologies that the ITPGRFA considered, Digital Object Identifiers (DOIs) emerged as a very powerful mechanism to establish linkages to all sorts of information. DOIs are a well-established standard originally developed for the publication sector that has recently expanded its reach to many other application fields such as physical objects, software packages, database queries, laboratory instruments and their configurations, works of art and more.

Among their many qualities, DOIs natively support the establishment of relationships to other PUIDs, preferably, but not only, DOIs. Recently, this possibility has become the focus of diverse initiatives: some related to data citation (i.e. the provision of accurate, consistent and standardized referencing for datasets just as bibliographic citations are provided for other published sources like research articles or monographs), others aiming at building an infrastructure for persistent identifiers as a core component of open science (e.g. the FREYA project funded by the European Union).

Building on these experiences, unique identifiers/DOIs could be applied to the more or less wide range of “component parts” that are all suggested as being part of DSI, depending on the definition adopted, and the relationships between those component parts could be described at some useful level using ontologies and reflected in metadata, so as to improve information discovery and insight into the genetic resources themselves.

While incredibly promising, the DOI mechanism suffers from a variety of issues that need to be addressed and resolved to allow it to deliver. First of all, awareness must be raised about this opportunity. Scholars and authors need to be aware that there is a simple yet effective way of increasing the visibility of their work as well as making it more useful. On the other hand, publisher systems and dataset repositories need to be upgraded to properly handle the establishment of these relationships.

Our paper introduces these concepts, explores the potential of DOIs for connecting the policy process on DSI with data interoperability, and presents the issues that are hampering the widespread adoption of PUID relationships with a view to suggesting potential solutions.

15:20

Collaboration in Crop Diversity Management: A Pragmatist Approach to Data Sharing
Selim Louafi (CIRAD), Mathieu Thomas, Frédérique Jankowski, Christian Leclerc, Alexandre Guichardaz, Morgane Leclercq, Servane Baufumé, Vanesse Labeyrie & Adeline Barnaud

While the FAIR principles provide a powerful tool for facilitating data sharing and enhancing data stewardship in plant science, their data-centric approach may not be easily implemented in complex collaborative contexts, in which heterogeneous actors are gathered across disciplines, sectors, locations and scales and where data is only one dimension of the collaboration. Their concrete implementation in such contexts raises important cognitive, social, political and organizational challenges. Building on the experience of a transdisciplinary project on the diversity of crop diversity management systems in West Africa, this contribution addresses these various challenges from a pragmatist perspective that moves away from the question of knowledge and representation of fixed pre-existing objects to the way data are enacted in practice. Such an approach recognizes that the production, appropriation and use of knowledge are fully intertwined in any collaborative context and ultimately impact data design choices, data sharing, and participants’ rights.

15:45

Final discussion and wrap-up

Session 4 & Conclusion: Social challenges of data linkage (March 26, 14:00-16:15 BST)

The social implications of plant and agricultural biotechnologies have been the focus of much debate in recent decades. Data production, sharing and linkage raise new issues concerning the inclusion of diverse stakeholders and ensuring that data works for them, practically and equitably. Building plural knowledges into plant data infrastructures, through the inclusion of practical and traditional knowledge from farmers and breeders, the recognition of diverse (e.g. gendered, but also professional) expertise and the implementation of multilingual systems, will be an important facet in establishing the relevance of those infrastructures to a wide range of stakeholders. Ensuring that global circulations of plant data are fair as well as FAIR, moreover, requires sustained attention to the distribution of scientific and computing resources that facilitate access to and effective use of data resources. Throughout all of this, ensuring that key subjects of food security and end-users of data products, not least farmers and especially smallholder farmers, are given adequate representation and consideration in the development of data infrastructures is a necessity. This session will reflect on a range of these social challenges, ongoing attempts to address them and potential solutions.

14:00

Introduction by organisers

14:05

Baladi Seeds in the oPt: Populations as Objects of Preservation and Units of Analysis
Courtney Fullilove (Wesleyan University) & Abdallah Alimari (National Agricultural Research Centre, Jenin, West Bank, Palestine; ICARDA)

This paper seeks to understand how participatory plant breeding initiatives, heritage narratives, and international agricultural research motivated by climate change each fix the population as a target of research and development. It further considers the implications of such a focus for organizing the preservation and production of seeds at community, regional, national, and international levels. Drawing on quantitative research on farmer participation in informal seed production for wheat in Palestine, a site study of a Palestinian NGO-administered seed bank in Hebron, and oral histories of farmers in the West Bank, this paper analyzes the relation between community seed banks and national/international agricultural research infrastructures of data and collection and considers the extent to which farmer knowledge can be represented in formalized preservation projects. For even as international agricultural researchers have endeavored to include farmer knowledge in data infrastructures and plant breeding projects, agrarian knowledge remains the source and the target of their innovations. Preservation in the West Bank takes shape against the backdrop of Israeli occupation, which hobbles commercial agricultural development and intensifies dependence on Israeli imports of seeds and finished agricultural products. Informal seed production offers a strategy to reduce dependence on Israeli imports, shore up land claims, and resist the archiving of Palestinian flora within an Israeli national project. Projects cross formal and informal domains: CGIAR-funded agricultural research promoted by the Ministry of Agriculture, Palestinian NGO-directed community seed banks supported by international aid, and volunteer-based community organizations oriented toward Palestinian heritage and sovereignty.
In each domain, collectors seek local varieties, drawing on the knowledge of local farmers to identify baladi seeds (literally “my country,” and connoting local and traditional production). In a biological context, “baladi” refers generally to a population comprised of numerous heterogeneous lines with their own individual characteristics: resistance to drought, pests, and rusts, as well as traits related to texture, taste, and yield. Even as it shelters and generalizes enormous diversity, the population remains the object of preservation and the principal unit of analysis. Collectors render baladi populations legible for archiving through morphological analysis, physical multiplication, and multiple documentation processes. These overlapping documentary practices, and their viability as representations of agrarian knowledge, are the subject of this paper.

14:30

The Research Data Alliance Interest Group on Agricultural Data: Supporting a Global Community of Practice
Patricia Rocha Bello Bertin (Embrapa), Cynthia Parr (USDA-ARS) & Debora Drucker (Embrapa)

Efforts to address equity and inclusion in agricultural data infrastructures face numerous challenges. People and networks are widely distributed geographically. This means some solutions to data problems may arise regionally and independently, yet many people are not easily able to engage with their distant colleagues to learn about them or collaborate. In general, funding for such projects is often national rather than international, and travel funding is not equally distributed. Finally, the breadth of activity means interdisciplinary communication is important but difficult and hard to sustain. Addressing these challenges, the Research Data Alliance (RDA) has been a home for the Interest Group on Agricultural Data (IGAD) since 2013. The convening power of RDA provides many advantages, such as the ability to sustain multiple threads of interdisciplinary work, and worldwide networking. IGAD regularly convenes some meetings outside the RDA Plenaries to allow for participation from practitioners with fewer resources. Several important working groups have been supported by IGAD, such as an emerging crop data interoperability working group. FAIR data (Findable, Accessible, Interoperable, and Reusable) has been a frequent topic of discussion. In recent years, virtual sessions have expanded the conversations even more to enable global participation. IGAD will become the first example of a new type of RDA group – a community of practice. A future goal is to use this community of practice to put good regional or national work into practice via inclusive collaborations. For example, in the US several workshops have addressed the need for progress on issues relating to farmer data ownership and privacy; these are informed by work happening in Europe, but ideas will need to be regrounded and adapted to cultural and legal practices elsewhere.
For plant data in particular, ideas about landraces and nomenclature from the Taxonomic Databases Working Group could be combined with the work of the CGIAR institutes to provide more seamless access to indigenous knowledge. In Brazil, several efforts to support data-driven decision-making in the field could serve as models for other IGAD members. For instance, the Brazilian Agricultural Research Corporation (Embrapa) has implemented data services through APIs that provide real-time data on climate, productivity and the most favourable days for planting different crops. Diverse agrifood products traditionally grown by local populations are also getting more emphasis in Brazil, and agrobiodiversity data standards are being improved by collaborative work from several organizations. Collaboration is also the motivation behind the creation of a national GO-FAIR implementation network focused on agriculture in Brazil. All of this work will benefit if the IGAD community of practice can include new voices from the fields.

14:55

Ethical and Legal Considerations in Smart Farming: A Farmer’s Perspective
Foteini Zampati (GODAN; KTBL), Eliane Ubalijoro (GODAN) & Suchith Anand (GODAN)

A lack of transparency around data ownership, control over access to and use of data, data rights, privacy, security, and whether farm data should be considered ‘personal’ or not are some of the data challenges faced by all agricultural stakeholders, particularly farmers. Moreover, data transactions are currently governed by contracts and licensing agreements whose terms are complex. This leaves smallholder farmers with very little negotiating power, and a lack of trust dominates these relationships.

Until now, ethical considerations have often been side-lined because gathering more data was seen as necessary, and concerns about how data might be abused or misused were considered only afterwards. However, with the increase of big data in smart farming, it is more essential than ever to focus on the ethical aspects of data governance (access, control, consent) and practices. This will provide valuable insights into how data is being collected and used, and for what purposes, how to bridge the digital divide, and how to create transparency in order to build trust between stakeholders.

15:20

Responsibility Beyond Ethics and Infrastructures: Conceptual and Normative Considerations for Plant Data Linkage and Agriculture
Hugh Williamson & Sabina Leonelli

As the contributions to this volume demonstrate, there have been significant advances in the development of infrastructures and standards for data linkage in the plant and agricultural sciences. Alongside this, there is increasing recognition of ethical issues in data sharing (including surveillance and privacy, the rights of communities and other stakeholders, benefit sharing mechanisms, and so on) and development of procedures and tools to address these. These efforts are valuable and necessary, yet the scope of responsibility in plant data linkage for food and agricultural purposes is wide, given the highly diverse range of stakeholders who are dependent on flows of plant and agricultural data and the massive scale yet great local variability of global challenges such as food security and climate change. In light of this heterogeneity and the challenges it poses to plant and agricultural science for the common good, we contend that infrastructural and ethical advances in data linkage need to be complemented by attention to the conceptual and normative underpinnings of plant breeding and agriculture that structure—and constrain—the uses of plant data, their paths of travel, and the participants in data collection, circulation and use. We illustrate this in reference to current efforts to accelerate rates of genetic gain in plant breeding and the corresponding reorganisation of international plant breeding networks and seed systems.

15:45

Concluding discussion