Machine learning training set for virtual microscopy
Copyright: CGRE
Geological institutes at universities, geological surveys and resource companies sit on a treasure of geoscientific data that remains, to date, largely untapped: rock thin sections. The minerals in these thin sections, which are typically 25 μm thick, are predominantly translucent. However, when they are examined between two crossed and rotating polarizing filters, birefringence produces interference colors and extinction angles that are directly related to the crystal structure and are therefore indicative of the mineral type. In addition, the shape and orientation of crystals and pores provide an important indication of the processes by which the rock formed and deformed. Thin sections have therefore long been an integral part of geoscientific studies, and their analysis is an elementary part of geoscientific curricula around the world.
In the conventional workflow, thin sections are often prepared and interpreted for a specific purpose and subsequently stored. Even though these sections are now routinely scanned and made available in annotated databases, the realization of the full scientific potential of this vast digital data set is still in its infancy.
Driven by the successful application of machine learning to visual image analysis and segmentation, several research groups [e.g. 1,2] and companies (e.g. Zeiss) have started testing different machine learning algorithms to train classifiers for a faster and more consistent analysis of thin sections. However, these approaches are mostly limited to pixel-based identifiers (typically using color, extinction angle, and possibly texture filters) and are often applied only to mineral segmentation in single sections. The first groups have started experimenting with Convolutional Neural Networks (CNNs), but these studies have so far been applied only to small data sets and simple problems [3,4].
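To make the notion of a pixel-based identifier concrete, the following minimal Python sketch derives two per-pixel features from a rotation series under crossed polarizers and assigns a label by nearest reference color. It is an illustration under stated assumptions, not the method of any of the cited groups: the function names, the reference colors in `CENTROIDS` and the sample values are hypothetical, and the "extinction position" is simply the stage angle at which the pixel is darkest (a true extinction angle would additionally require a reference direction such as a cleavage trace).

```python
from typing import Dict, Sequence, Tuple

RGB = Tuple[float, float, float]  # channel values scaled to [0, 1]


def pixel_features(stack: Sequence[RGB], angles_deg: Sequence[float]) -> Tuple[float, RGB]:
    """Return (extinction position in degrees, brightest color) for one pixel.

    `stack` holds the same pixel imaged at each stage rotation in `angles_deg`.
    """
    brightness = [sum(rgb) / 3.0 for rgb in stack]
    darkest = brightness.index(min(brightness))
    brightest = brightness.index(max(brightness))
    # Under crossed polarizers, extinction repeats every 90 degrees.
    return angles_deg[darkest] % 90.0, stack[brightest]


def nearest_label(color: RGB, centroids: Dict[str, RGB]) -> str:
    """Assign the label whose reference color is closest (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda name: sum((a - b) ** 2 for a, b in zip(color, centroids[name])),
    )


# Hypothetical reference colors for illustration only, not measured values.
CENTROIDS = {"mineral_a": (0.8, 0.8, 0.8), "mineral_b": (0.9, 0.3, 0.6)}

# Synthetic rotation series for a single pixel that goes dark at 30 degrees.
angles = [0.0, 15.0, 30.0, 45.0, 60.0, 75.0]
stack = [
    (0.675, 0.225, 0.45),
    (0.225, 0.075, 0.15),
    (0.0, 0.0, 0.0),
    (0.225, 0.075, 0.15),
    (0.675, 0.225, 0.45),
    (0.9, 0.3, 0.6),
]

extinction, color = pixel_features(stack, angles)
label = nearest_label(color, CENTROIDS)
```

Because every pixel is treated in isolation, a classifier of this kind has no access to grain shape, grain boundaries or texture, which is the limitation that motivates object-level approaches such as CNNs.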
We envisage a visual analysis and object identification system, for example based on CNNs, that will in the future be routinely applied to databases of digital thin sections, identify scientifically interesting features and provide new insights, similar to recent successes in machine learning for object recognition and semantic segmentation in visual scenes. This system will enable the systematic and quantitative analysis of thin section samples, providing unprecedented insights into microscopic processes and mineralogical variations from the micrometre to the kilometre scale, based on the combined quantitative analysis of tens to hundreds of thin sections. Such an endeavour is close to impossible by human visual inspection alone, because of the large amount of information contained in even a single thin section. Scientifically interesting examples include strain localization in shear zones, diagenetic overprinting of reservoir rocks, and the systematic analysis of metamorphic zoning, with important implications for society and relevance to CO2 storage, long-term nuclear repositories, and an improved understanding of earthquakes.
The development of an intelligent analysis and identification system that enables these types of investigations is a significant effort and the subject of a larger follow-up proposal (e.g. to the DFG). However, systematic work towards such a system requires closing an important gap in the field of geoscientific data science: the lack of a consistent and sufficiently large training data set for the development and comparison of different machine learning algorithms.