The project aims to develop a guideline for the collection, cleaning, labelling, and curation of open data sets to help the industry test and train models for a variety of AI applications.

Open, curated data sets can bring value to the industry. They give developers and researchers access to suitable data for testing and training their models across a variety of applications. They can also be used to benchmark solutions, allowing effective and fair comparison, and they allow research to be repeated and validated.

The purpose of this project is to build open data sets specific to the mining sector for AI research and development. It will create a suitable data set repository for broad industry access and establish a process for gathering data by:

  • Identifying existing data sets already released to the public.
  • Identifying the types of typical mining data that would be good candidates for building publicly available data sets for broader industry use.
  • Developing a set of principles and guidelines to underpin the collection, cleaning, labelling, and curation of appropriate data.
  • Executing several sub-projects to build data sets according to the guidelines and goals developed above.


The project deliverables are:

  • A register of suitable candidate data sets.
  • A set of guidelines for the collection and curation of these data sets.
  • A set of repositories of gathered data.


2021 Mar | Demonstrations

Three demonstrations are currently under way to gather feedback on the efficiency of the guideline. Two are being conducted by members of academia, and one is being conducted within a test mine.

2021 Feb | Project team call

The project team identified a need for demonstrations of the guideline before publication. The guideline has been sent to various academic institutions, which were asked to demonstrate its efficiency by developing open data sets based on the practices it describes.

2020 Jul | Project team call

Project team conducted a final revision of the draft before submitting to the GMG Technical Editor for preliminary editing.

2020 Jul | Project presentation at Artificial Intelligence Working Group virtual meeting

Project leaders presented the latest updates on the project and conducted a mini-workshop during the call to have participants review and comment on the risk assessment. To access the material, please click here.

2020 Jun | Virtual Workshop

The participants worked directly in the document, developing content, expanding on topics, providing feedback, and sharing references.

2020 May | First project team call

Project leaders presented the outcomes from the workshops and shared the project plan with the volunteers. They then divided the guideline sections among the volunteers to begin content development. Click here to check out the outcomes.

2020 May | Virtual Workshop

The second workshop followed the same format as the first workshop to ensure participation across the globe. All input was analyzed by the project leaders, and the table of contents was built.

2020 Apr | Virtual Workshop

Workshop participants defined the key stakeholders to involve in the project, discussed the benefits of the project and potential data sets, and divided the guideline into business and technical sides. To access the outcomes, please click here.

2020 Feb | Project launched at SME Annual Event

The project leader presented on open data and how it will enable innovation in the industry. To access the presentation, please click here.