The project aims to develop a guideline for the collection, cleaning, labelling, and curation of open data sets to help the industry test and train its models for a variety of AI applications.

Open, curated data sets can bring value to the industry. They give developers and researchers access to suitable data for testing and training their models across a variety of applications. They can also be used to benchmark different solutions so that comparisons are effective and fair, and they allow research to be repeated and validated.

The purpose of this project is to build open data sets specific to the mining sector for AI research and development. It will create a data set repository that allows broad industry access and establish a process for gathering the data by:

  • Identifying existing data sets already released to the public.
  • Identifying the types of typical mining data that would be good candidates for building publicly available data sets for broader industry use.
  • Developing a set of principles and guidelines to underpin the collection, cleaning, labelling, and curation of appropriate data.
  • Executing several sub-projects to build data sets according to the guidelines and goals developed above.


The project is expected to produce:

  • A register of suitable candidate data sets.
  • A set of guidelines for the collection and curation of these data sets.
  • A set of repositories of gathered data.


Join virtual workshops on:

  • May 21
  • June 10


The project team is building the project plan and timeline, and needs a minimum of six volunteers to redefine the way data is used in mining. To join the team, please contact any GMG staff member.