GUIDELINE FOR SHARING OPEN DATA SETS IN MINING


Published: 2022-04-21
Working Group: Artificial Intelligence
Status: Current

ABSTRACT

The purpose of this guideline is to provide mining industry stakeholders with best practices for data sharing so that they can benefit from the opportunities that open data can offer. It leverages and references existing work on data sharing and provides additional context for mining settings. This guideline is directed towards readers who intend to share data with others, those involved in the approvals process, and users who want to use open data shared by the mining industry. The guideline covers key management and implementation considerations.

Management Considerations

  • Licenses: A data license is typically put in place before data is shared or published. It outlines the use the data provider intends while protecting the provider, and it gives the data consumer clarity, helping them avoid infringing the owner’s rights. License types can typically be divided into open (without technical or legal restrictions), non-commercial, partially open or restricted usage, and closed. Existing frameworks can be used to cover general requirements.
  • Benefits: Sharing data supports innovation and research and gives the public access to information that can help improve decision-making in operations.
  • Challenges: Collecting, administering, communicating internally about, and maintaining open data raises challenges related to cost, legal issues, storage, privacy, and common language, all of which should be addressed.
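
The license categories listed above could be recorded alongside each data set as machine-readable metadata, so that a data consumer can check the permitted use before working with the data. The following is a minimal sketch in Python, not something defined by this guideline: the names `LicenseCategory`, `DatasetRecord`, and `permits_commercial_use`, the field names, and the example data set are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class LicenseCategory(Enum):
    """The four categories named in the guideline (identifiers are illustrative)."""
    OPEN = "open"
    NON_COMMERCIAL = "non-commercial"
    RESTRICTED = "partially open or restricted usage"
    CLOSED = "closed"


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical metadata record published together with a data set."""
    title: str
    provider: str
    license_category: LicenseCategory


def permits_commercial_use(record: DatasetRecord) -> bool:
    # Assumption: only the fully open category allows commercial reuse
    # without further checks; any other category needs a review of its terms.
    return record.license_category is LicenseCategory.OPEN


example = DatasetRecord(
    title="Haul truck cycle times (illustrative)",
    provider="Example Mine Co.",
    license_category=LicenseCategory.NON_COMMERCIAL,
)
print(permits_commercial_use(example))  # False
```

In practice the category would normally point to the full text of an existing license framework rather than stand in for it.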

Implementation Considerations

  • Sharing: It is critical to identify what data should and should not be shared prior to implementation. The data that is shared should be well-documented, reliable, usable, accurate, relevant, and in an accessible format. Sharing any data that contains sensitive information should be avoided unless the risks can be acceptably mitigated (e.g., through anonymization).
  • Process for making data open: When making a data set open, it should be submitted in a machine-readable format that is open and logical. Where a community consensus exists on the format or formats used for existing data, that consensus should be prioritized. It is also important to identify the appropriate anonymization requirements and techniques.
  • Approval: It is recommended that a formal approval process be adopted when releasing data. The documentation provided for approval to release data typically includes an overview of the original data and its structure, a description of anonymization procedures, an overview of the resulting data, and attestation or “sign-off” from key stakeholders that the data set is acceptable to share.
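
As an illustration of the sharing and process points above, the sketch below removes a direct identifier, replaces an ID column with a salted hash, and writes the result as CSV, which is one open, machine-readable format. It is a minimal example under stated assumptions rather than a recommended procedure: the column names, the sample records, the salt value, and the helper names `pseudonymize`, `anonymize`, and `to_csv` are hypothetical, and salted hashing is pseudonymization only, so the remaining columns would still need to be assessed for re-identification risk.

```python
import csv
import hashlib
import io

# Hypothetical column names; a real data set would define its own.
DIRECT_IDENTIFIERS = {"operator_name"}  # dropped entirely
PSEUDONYMIZED = {"operator_id"}         # replaced with a salted hash


def pseudonymize(value: str, salt: str) -> str:
    """Return a shortened salted SHA-256 digest in place of the raw identifier."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:12]


def anonymize(rows, salt):
    """Drop direct identifiers and pseudonymize the listed ID columns."""
    cleaned = []
    for row in rows:
        out = {}
        for key, value in row.items():
            if key in DIRECT_IDENTIFIERS:
                continue
            out[key] = pseudonymize(value, salt) if key in PSEUDONYMIZED else value
        cleaned.append(out)
    return cleaned


def to_csv(rows) -> str:
    """Serialize the rows to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Invented sample records, for illustration only.
raw = [
    {"operator_id": "E1001", "operator_name": "A. Example", "truck": "T-07", "payload_t": "221.5"},
    {"operator_id": "E1002", "operator_name": "B. Example", "truck": "T-09", "payload_t": "218.0"},
]
released = anonymize(raw, salt="keep-this-secret")
print(to_csv(released))
```

The salt must stay private to the data provider; anyone who knows it and can guess candidate IDs could reverse the mapping. A description of steps like these is the kind of content the anonymization section of an approval document would cover.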

CONTINUE COLLABORATING

Success story? Input on how to improve this guideline? Let us know.

To share your experience using the guideline, please fill out this case study form. 

    SUPPLEMENTARY MATERIALS

    Landscape of Open Data Sets and Platforms

    Examples of organizations with open data platforms and specific data sets that are relevant to mining.

    Executive Summary

    Quick overview of the guideline.

    Parking Lot

    A guideline parking lot is the compilation of topics and improvements that are identified for future versions or future projects throughout the development and review of a guideline.

    Guideline Poster

    Poster summarizing the guideline.

    TABLES

    Table 3. Anonymization Considerations
    Table 4. General Considerations for the Risk Assessment
    Table 5. Example Structure for Approval Document
