🚀 Challenge Starter Kit.

## 🕵️ Introduction

Facies Identification Challenge: 3D image interpretation by machine learning techniques

The goal of the Seismic Facies Identification challenge is to create a machine-learning algorithm which, working from the raw 3D image, can reproduce an expert pixel-by-pixel facies identification.

What are these 3D images?

These are seismic images of underground structures made during exploration and development of underground reservoirs. We send sound waves into the ground and record echoes returning to the surface. The echoes are then processed into three-dimensional (3D) images highlighting interfaces between rocks with different properties—including different fluid contents in the rocks’ pore spaces.

Seismic images resemble medical ultrasound images of the human body, such as echocardiograms and prenatal ultrasound images, but on a much larger spatial scale. A typical 3D seismic image will cover 25 km × 25 km in horizontal extent and 10 km in depth; each image point (voxel or pixel) represents a region about 25 m × 25 m in horizontal extent and about 10 m in depth. The number of pixels in a typical image is therefore about 1 billion (1000 × 1000 × 1000); many are much larger!
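The voxel count quoted above follows directly from dividing each extent by the voxel size. A minimal sketch of that arithmetic, assuming one 4-byte single-precision value per voxel for the storage estimate (the variable names are illustrative only):

```python
# Voxel-count arithmetic for the "typical" survey described above.
extent_m = (25_000, 25_000, 10_000)   # X, Y, Z extent in metres
voxel_m = (25, 25, 10)                # X, Y, Z voxel size in metres

# Number of samples along each axis.
samples = tuple(e // v for e, v in zip(extent_m, voxel_m))
n_voxels = samples[0] * samples[1] * samples[2]

# Assumption: one 4-byte (single-precision) value per voxel.
size_gb = n_voxels * 4 / 1e9

print(samples)    # (1000, 1000, 1000)
print(n_voxels)   # 1000000000
print(size_gb)    # 4.0
```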

What are these facies?

The interfaces delineated by echoes in seismic images often mark boundaries between regions containing sediments deposited by different types of geological processes. These processes, operating in different geographic settings, create distinct geologic “facies”: a technical term for a region inside the Earth with rocks of similar composition deposited in a common environment. A facies can be characterized by any observable attribute of its rocks, such as their overall appearance, composition, or conditions of formation, and by the changes that may occur in those attributes over a geographic area. In short, a facies is the sum of a rock's characteristics, including its chemical, physical, and biological features, that distinguish it from adjacent rock.

Examples of different sedimentary environments and facies include:

• A mix of sand, silt, and mud deposited in a fan-shaped delta at the mouth of a river (deltaic environment and facies)
• Coarse sandy sediments deposited in a meandering river channel (fluvial environment and facies)
• Extremely fine-grained sediments deposited in a shallow lakebed (lacustrine environment and facies).

For this challenge, human interpreters have identified 6 such facies to label. Your algorithm should label each pixel (voxel) in a 3D seismic image of underground geological structures according to these 6 facies.

A task like this is generally done by a team of geologists working in collaboration with specialists who design the surveys and process the raw data to create the images. Manual interpretation is then done on workstations equipped to rapidly display and highlight different features of the 3D image using standard image filters and displays (see the figure below). Full classification of an image of this size often requires hundreds of work-hours by a team of geologists.

Understand with code! Here is getting started code for you.😄

Wiggle plots of image values (red) as a function of depth (increasing vertically down the page), superimposed on the facies interpretation in a small section of the image. Displays such as this illustrate how geologists look for features in the vertical sequence of pixel values to identify key interfaces between different facies. Notice, for example, that a burst of high echo-amplitudes rapidly alternating between positive and negative values is characteristic of the transition between “blue” and “green” facies across the middle part of the image.

## 💾 Dataset

In the example to be used for the competition, a 3D seismic image from a public-domain seismic survey called “Parihaka,” available from the New Zealand government, has been interpreted by expert geologists, with each pixel in the image classified into one of 6 different facies based on patterns seen in the image.

The figures below (FIG 1.1 & 1.2) show a rendering of two vertical slices and one horizontal slice through the 3D seismic image to be used in this challenge. The image is plotted in a standard Cartesian coordinate system with X and Y measuring horizontal positions near the Earth’s surface and Z measuring depth in the Earth. (At the scale of the image, the curvature of the Earth is not significant; its surface can be taken as flat.) The image is plotted in grayscale, with the intensity at each point representing (roughly) the strength of the sound-wave echo reflected back to the surface from the corresponding point in the Earth. In a seismic survey like the one that produced this image, many thousands of echoes are averaged (processed) to obtain the image value at each point.

FIG 1 | 3D views of XZ, YZ, and XY slices through the training data image (TOP) and corresponding labels (BOTTOM). Image is shown in grayscale with a saturation that highlights interfaces.

The figures below (FIG 2.1 & 2.2) show a close-up view of a vertical slice through the image (in the YZ-plane, according to the coordinate system shown in Figure 1) and the corresponding labels, which have been color-coded.

FIG 2.1 & 2.2 | (TOP) Close-up view of vertical YZ slice through the training dataset (at X index 75) showing the seismic image in grayscale. (BOTTOM) Pixel-by-pixel classification of the image into 6 different facies, made by an expert geologist recognizing patterns in the data.

The data for the challenge is divided into 3 pieces, with each piece (dataset) representing a 3D image of a different region of the subsurface in the area surveyed. Each dataset represents the image as a 3D array (matrix) of single-precision real numbers, e.g., as IMAGE(i,j,k), where index i runs from 1 to NZ (1:NZ), the number of image samples in the depth (Z) direction, while j and k run from 1:NX and 1:NY, the number of samples in the X and Y directions, respectively. The value stored in the array location IMAGE(i,j,k) is a real number (positive or negative), representing the strength of the echo returned to the surface from the spatial location in the Earth represented by the indices (i,j,k).

The training dataset (TRAIN) is a 3D image represented as an array of 1006 × 782 × 590 real numbers, stored in the order (Z,X,Y). Labels corresponding to the training data are similarly arranged in a 1006 × 782 × 590 array of integers with values from 1 to 6: each integer label corresponds to a classification (by an expert geologist) of each image pixel in TRAIN into one of six different facies (see Figures 1 and 2).
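To make the (Z,X,Y) storage order concrete, here is a small sketch that slices a scaled-down stand-in array the same way the figures slice the real image. The array contents and the chosen indices are hypothetical; note that NumPy indices start at 0, whereas the IMAGE(i,j,k) notation above starts at 1.

```python
import numpy as np

# Scaled-down stand-in for TRAIN (the real array is 1006 x 782 x 590).
NZ, NX, NY = 10, 8, 6
image = np.arange(NZ * NX * NY, dtype=np.float32).reshape(NZ, NX, NY)

# Axis order is (Z, X, Y); NumPy indices start at 0, not 1.
yz_slice = image[:, 3, :]   # vertical slice at a fixed X index -> (NZ, NY)
xz_slice = image[:, :, 2]   # vertical slice at a fixed Y index -> (NZ, NX)
xy_slice = image[4, :, :]   # horizontal slice at a fixed depth -> (NX, NY)
trace = image[:, 3, 2]      # one vertical trace (values versus depth)

# Memory needed for the full-size training image in single precision.
full_bytes = 1006 * 782 * 590 * 4

print(yz_slice.shape, xz_slice.shape, xy_slice.shape, trace.shape)
print(round(full_bytes / 1e9, 2), "GB")
```

At roughly 1.86 GB for the image alone, it may be worth extracting slices or patches rather than copying the whole volume repeatedly.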

The geologic descriptions of the labels are as follows:

• 1 : Basement/Other: Basement - Low S/N; Few internal Reflections; May contain volcanics in places
• 2 : Slope Mudstone A: Slope to Basin Floor Mudstones; High Amplitude Upper and Lower Boundaries; Low Amplitude Continuous/Semi-Continuous Internal Reflectors
• 3 : Mass Transport Deposit: Mix of Chaotic Facies and Low Amplitude Parallel Reflections
• 4 : Slope Mudstone B: Slope to Basin Floor Mudstones and Sandstones; High Amplitude Parallel Reflectors; Low Continuity Scour Surfaces
• 5 : Slope Valley: High Amplitude Incised Channels/Valleys; Relatively low relief
• 6 : Submarine Canyon System: Erosional Base is U shaped with high local relief. Internal fill is low amplitude mix of parallel inclined surfaces and chaotic disrupted reflectors. Mostly deformed slope mudstone filled with isolated sinuous sand-filled channels near the basal surface.
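The label list above can be kept as a lookup table, which is handy for checking how many voxels fall into each facies. The sketch below assumes integer labels from 1 to 6 as described; the helper name `facies_fractions` and the tiny demo array are hypothetical and not part of the challenge code.

```python
import numpy as np

# Label id -> short facies name, taken from the list above.
FACIES = {
    1: "Basement/Other",
    2: "Slope Mudstone A",
    3: "Mass Transport Deposit",
    4: "Slope Mudstone B",
    5: "Slope Valley",
    6: "Submarine Canyon System",
}

def facies_fractions(labels):
    """Return {facies name: fraction of voxels} for an integer label array."""
    counts = np.bincount(labels.ravel(), minlength=7)[1:7]  # classes 1..6
    total = counts.sum()
    return {FACIES[k]: counts[k - 1] / total for k in FACIES}

# Tiny synthetic label volume (hypothetical values, not challenge data).
demo = np.array([[1, 1, 2], [3, 6, 6], [6, 6, 5]])
fractions = facies_fractions(demo)
for name, frac in fractions.items():
    print(f"{name}: {frac:.2f}")
```

Running this on the real training labels would show how imbalanced the classes are, which matters when a metric weights some classes more heavily than others.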

The test datasets represent 3D images to be classified by a machine-learning algorithm (after supervised training using the training data set and labels).

The test dataset for Round-1 is 1006 × 782 × 251 in size and borders the training image to the north (FIG 3). The test dataset for Round-2 is 1006 × 334 × 841 in size and borders the training image to the east (FIG 3).
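The stated sizes are mutually consistent with one simple reading of the layout: the Round-1 block extends the training block along Y, and the Round-2 block then sits alongside both along X. The sketch below only checks that shape arithmetic. The mapping of "north" to the Y axis and "east" to the X axis is an assumption, and the helper `join_shape` is hypothetical; the auxiliary data provided with the images remains the authoritative description.

```python
def join_shape(a, b, axis):
    """Shape after placing two (Z, X, Y) blocks side by side along `axis`."""
    for i in range(3):
        if i != axis and a[i] != b[i]:
            raise ValueError(f"axis {i} differs: {a[i]} vs {b[i]}")
    out = list(a)
    out[axis] += b[axis]
    return tuple(out)

TRAIN = (1006, 782, 590)
TEST_1 = (1006, 782, 251)
TEST_2 = (1006, 334, 841)

# Assumption: "north" is along the Y axis and "east" along the X axis.
train_plus_test1 = join_shape(TRAIN, TEST_1, axis=2)
all_three = join_shape(train_plus_test1, TEST_2, axis=1)

print(train_plus_test1)  # (1006, 782, 841)
print(all_three)         # (1006, 1116, 841)
```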

Auxiliary data provided with the images show precisely how the images fit together. Classifications of the test datasets have also been done by an expert and will serve as ground truth for scoring submissions.

FIG 3 | View looking down on the XY-plane showing how the training and test datasets fit together spatially. The three datasets are actually extractions from a much larger 3D seismic image of the survey region, which is in the Parihaka block of the Taranaki Basin off the northwest coast of New Zealand. The absolute (X,Y) indices in this plot come from the indexing used as the local coordinate system of the full seismic image.

## 📁 Files

The following files are available in the Resources section (all files are in binary format):

• data_train.npz - The file containing the 3D image represented as a numpy array of 1006 × 782 × 590 real numbers, stored in the order (Z,X,Y).
• labels_train.npz - The file containing the labels corresponding to the training data, similarly arranged in a 1006 × 782 × 590 numpy array of integers with values from 1 to 6: each integer label corresponds to a classification.
• data_test_1.npz - This file contains the test set for Round-1, which is a 3D image represented as an array 1006 × 782 × 251 in size that borders the training image to the north.
• sample_submission_1.npz - Sample Submission Format for Round-1 (should be of the same shape as data_test_1.npz)
• data_test_2.npz - This file contains the test set for Round-2, which is a 3D image represented as an array 1006 × 334 × 841 in size that borders the training image to the east.
• sample_submission_2.npz - Sample Submission Format for Round-2 (should be of the same shape as data_test_2.npz)

All the files are in npz format. The image files store their array under the key data, and the label file stores its array under the key labels.
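A minimal loading sketch using `numpy.load`, which opens an npz archive and exposes its arrays by key. The helper `load_array` is hypothetical, and the tiny stand-in files are created only so the example runs without the challenge data; with the real files you would pass their paths directly.

```python
import os
import tempfile
import numpy as np

def load_array(path, key):
    """Load one named array from an .npz archive."""
    with np.load(path) as archive:
        if key not in archive.files:
            raise KeyError(f"{key!r} not in {archive.files}")
        return archive[key]

# Stand-in files so the sketch runs without the challenge data.
tmp = tempfile.mkdtemp()
data_path = os.path.join(tmp, "data_train.npz")
labels_path = os.path.join(tmp, "labels_train.npz")
np.savez_compressed(data_path, data=np.zeros((4, 3, 2), dtype=np.float32))
np.savez_compressed(labels_path, labels=np.ones((4, 3, 2), dtype=np.uint8))

image = load_array(data_path, "data")
labels = load_array(labels_path, "labels")

# Image and labels should always share the same (Z, X, Y) shape.
print(image.shape, image.dtype, labels.shape, labels.dtype)
```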

## 🚀 Submission

• Prepare a numpy array containing the labels [1-6] for each pixel of the test dataset (i.e., of size 1006 × 782 × 251 for Round-1 and 1006 × 334 × 841 for Round-2) and save it in compressed npz format, with the key prediction.

• The sample submission format can be accessed from the Resources section.

• The number of submissions allowed per day is 5.
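The submission steps above can be sketched with `numpy.savez_compressed`, which writes a compressed npz archive with arrays stored under the keyword names given. The helper `save_submission`, the output file name, and the cast to `uint8` are assumptions made for illustration; comparing against the sample submission file is the safest way to confirm the expected dtype.

```python
import os
import tempfile
import numpy as np

def save_submission(pred, path, expected_shape):
    """Validate a label volume and write it under the key 'prediction'."""
    pred = np.asarray(pred)
    if pred.shape != tuple(expected_shape):
        raise ValueError(f"shape {pred.shape} != expected {tuple(expected_shape)}")
    if pred.min() < 1 or pred.max() > 6:
        raise ValueError("labels must lie in the range 1..6")
    # Assumption: a compact integer dtype is acceptable to the scorer;
    # compare against the sample submission before relying on this.
    np.savez_compressed(path, prediction=pred.astype(np.uint8))

# Scaled-down demo; the real shapes are (1006, 782, 251) for Round-1
# and (1006, 334, 841) for Round-2.
demo_shape = (4, 3, 2)
demo_pred = np.full(demo_shape, 3)
out_path = os.path.join(tempfile.mkdtemp(), "submission.npz")
save_submission(demo_pred, out_path, demo_shape)

# Read the file back to confirm the key and shape.
with np.load(out_path) as archive:
    saved = archive["prediction"]
print(saved.shape, saved.dtype)
```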

Make your first submission here 🚀 !!

## 🖊 Evaluation Criteria

F1 score and accuracy will be used as the first-order metrics for measuring the correctness of labels.

While a round is in progress, the scores shown on the leaderboard are computed using only 60% of the test data. After the round is over, the leaderboard is updated and your final scores are computed using the whole of the test data.

Please note that in Round 2, your F1 score and accuracy will be computed in a weighted way, where class 5 and class 6 have 20x more weight than the rest of the classes.
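One possible reading of this weighting is sketched below: each pixel is weighted by its true class (20 for classes 5 and 6, 1 otherwise), accuracy is the weighted fraction of correct pixels, and F1 is averaged over the classes that occur. This is an illustrative assumption, not the official scorer; the function names are hypothetical and the averaging scheme in particular may differ from the leaderboard code.

```python
import numpy as np

def class_weights(n_classes=6, heavy=(5, 6), factor=20.0):
    """Per-class weights; index 0 is unused because labels start at 1."""
    w = np.ones(n_classes + 1)
    w[list(heavy)] = factor
    return w

def weighted_scores(y_true, y_pred, weights):
    """Weighted accuracy and class-averaged weighted F1 (assumed scheme)."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    sample_w = weights[y_true]            # weight each pixel by its true class
    accuracy = sample_w[y_true == y_pred].sum() / sample_w.sum()
    f1s = []
    for c in range(1, len(weights)):
        tp = sample_w[(y_true == c) & (y_pred == c)].sum()
        fp = sample_w[(y_true != c) & (y_pred == c)].sum()
        fn = sample_w[(y_true == c) & (y_pred != c)].sum()
        denom = 2 * tp + fp + fn
        if denom > 0:                     # skip classes absent from both arrays
            f1s.append(2 * tp / denom)
    f1 = float(np.mean(f1s)) if f1s else 0.0
    return float(accuracy), f1

# Hypothetical example: one rare-class pixel is misclassified.
truth = [1, 1, 1, 5]
guess = [1, 1, 1, 6]
acc, f1 = weighted_scores(truth, guess, class_weights())
print(acc, f1)
```

In this example the unweighted accuracy would be 0.75, but the single class-5 error carries a weight of 20 and pulls the weighted accuracy down to 3/23, which shows why the rare classes dominate the Round 2 score.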

> Check out compute_score.py for the code that is used to compute the scores on the leaderboard. If you would like to suggest any optimizations in the code or if you want to report any bug, please consider sending across a pull request.

## 📅 Rounds

The competition consists of 4 separate rounds.

• Round-1: September 14th, 2020 - October 27th, 2020
• Round-2: November 18th, 2020 - December 15th, 2020
• Round-3: January 25th, 2021 - March 8th, 2021
• Round-4: September 20th, 2021 - November 20th, 2021

## 🏆 Prizes

Prizes will be awarded as follows:

Round 1

Leaderboard Prizes (based on final score at the end of round 1)

Community Contribution Prizes

Round 2

Leaderboard Prizes (based on final score at the end of Round 2)

• #1 – USD 13,000
• #2 – USD 7,000
• #3 – USD 5,000

Round 3: Community Contribution Round

8 Oculus Quest 2 headsets for the top 8 contributions to the challenge!

Round 4

Note:

• Previous winners who have already received a gadget as the prize for the leaderboard or the community contribution are not eligible for winning another award.
• These leaderboard winners are eligible for the prize if they have a minimum weighted F1 score of 0.857 on the leaderboard.