Funding: This work was funded by the U.S. National Science Foundation (Grant No. 2126315).
Abstract: Minerals, like many other natural materials of geological origin (i.e., geomaterials), face the challenge of name variations. This in turn hinders data-intensive geoscience research, which often needs to integrate data from multiple sources. Clearly, a mineral name is not an appropriate identifier for connecting records within and among data sources. The Mindat database, one of the largest resources of open data in mineralogy, has received a significant volume of feedback on the heterogeneity of mineral and rock names. To address this issue, we established a persistent identifier service on Mindat that provides persistent and meaningful access to the records of geomaterials (minerals, rocks, and varieties), localities, mineral occurrences, references, photos, and specimens. A key development was the long-form identifier, which embeds contextual information, such as the identifier authority and the data type, into the identifier structure. In addition, a UUID service was built alongside the long-form identifier to further increase interoperability. The identifier service has been implemented successfully and has minted millions of identifiers for different types of data objects on Mindat. Several use-case scenarios were developed to illustrate the utility of the identifiers in the real world. We believe the persistent identifiers will help address the challenges caused by name variations, and we welcome Mindat users to test the identifiers and send us feedback for future extensions.
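To make the idea of a long-form identifier concrete, the following is a minimal sketch, assuming a hypothetical `authority:type-code:record-number` layout. The colon-separated format, the short type codes, the namespace value, and the record number 3337 are all illustrative assumptions and not Mindat's actual identifier scheme. It shows how authority and data type can be carried inside the identifier itself, and how a UUID could be derived deterministically from it using a name-based (version 5) UUID. The abstract does not state how the Mindat UUID service generates its UUIDs.

```python
import uuid
from dataclasses import dataclass

# Hypothetical namespace for deriving UUIDs; not an official Mindat value.
HYPOTHETICAL_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "mindat.org")

# Data types named in the abstract; the short codes are illustrative only.
DATA_TYPE_CODES = {
    "geomaterial": "geo",
    "locality": "loc",
    "occurrence": "occ",
    "reference": "ref",
    "photo": "pho",
    "specimen": "spe",
}


@dataclass(frozen=True)
class LongFormId:
    authority: str  # who issued the identifier, e.g. "mindat"
    data_type: str  # one of the keys in DATA_TYPE_CODES
    local_id: int   # record number within that data type

    def __str__(self) -> str:
        # Embed authority and data type directly in the identifier string.
        return f"{self.authority}:{DATA_TYPE_CODES[self.data_type]}:{self.local_id}"

    @classmethod
    def parse(cls, text: str) -> "LongFormId":
        # Recover the structured parts from the hypothetical string layout.
        authority, code, local = text.split(":")
        reverse = {v: k for k, v in DATA_TYPE_CODES.items()}
        return cls(authority, reverse[code], int(local))

    def to_uuid(self) -> uuid.UUID:
        # Name-based (version 5) UUID: the same long-form ID always maps
        # to the same UUID, so the two forms can be cross-referenced.
        return uuid.uuid5(HYPOTHETICAL_NAMESPACE, str(self))


pid = LongFormId("mindat", "geomaterial", 3337)  # hypothetical record number
print(pid)                                # mindat:geo:3337
print(LongFormId.parse(str(pid)) == pid)  # True
print(pid.to_uuid())
```

A deterministic UUID is used here only so that the sketch needs no lookup table. A production service might instead assign random UUIDs and store the mapping.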
Funding: This work was supported by the U.S. National Science Foundation (Grant No. 2126315).
Abstract: Data exploration, usually the first step in data analysis, is a useful method for tackling the challenges posed by big geoscience data. It performs quick analyses of the data, investigates patterns, and generates or refines research questions to guide advanced statistics and machine learning algorithms. The background of this work is the open mineral data provided by several sources, and the focus is on different types of associations in mineral properties and occurrences. Researchers in mineralogy have been applying various techniques to explore such associations. Although the explored associations can lead to new scientific insights in crystallography, mineralogy, and geochemistry, the exploration process is often daunting because of the wide range and complexity of the factors involved. In this study, our purpose is to implement a visualization tool based on the adjacency matrix for a variety of datasets and to test its utility for the quick exploration of association patterns in mineral data. Algorithms, software packages, and use cases have been developed to process a variety of mineral data. The results demonstrate the efficiency of the adjacency matrix in real-world usage. All work developed in this study is open source and open access.
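As an illustration of the data structure behind such a view, here is a minimal sketch that builds a mineral co-occurrence adjacency matrix from locality records. It assumes that "association" means co-occurrence at the same locality. That is one of several possible association measures and is not necessarily the one used in the study. The sample localities and their mineral lists are hypothetical. The sketch only computes the matrix that a visualization tool would render as a grid of shaded cells.

```python
from itertools import combinations


def cooccurrence_matrix(occurrences):
    """Build a symmetric adjacency matrix of mineral co-occurrence counts.

    occurrences: mapping of locality name -> iterable of mineral names found there.
    Returns (names, matrix), where matrix[i][j] counts the localities at which
    names[i] and names[j] both occur; the diagonal holds each mineral's
    total number of localities.
    """
    names = sorted({m for minerals in occurrences.values() for m in minerals})
    index = {name: i for i, name in enumerate(names)}
    matrix = [[0] * len(names) for _ in names]
    for minerals in occurrences.values():
        present = sorted(set(minerals))  # ignore duplicate listings at one locality
        for m in present:
            matrix[index[m]][index[m]] += 1
        for a, b in combinations(present, 2):
            # Increment both cells so the matrix stays symmetric.
            matrix[index[a]][index[b]] += 1
            matrix[index[b]][index[a]] += 1
    return names, matrix


# Hypothetical sample data, for illustration only.
sample = {
    "Locality A": ["Quartz", "Calcite", "Fluorite"],
    "Locality B": ["Quartz", "Fluorite"],
    "Locality C": ["Calcite"],
}

names, matrix = cooccurrence_matrix(sample)
for name, row in zip(names, matrix):
    # Plain-text rendering; a real tool would map counts to cell colors.
    print(f"{name:>9}", " ".join(f"{v:2d}" for v in row))
```

Sorting the mineral names gives a stable row and column order. A visualization would typically reorder rows and columns (for example, by clustering) so that strongly associated minerals appear as contiguous blocks.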