Estimating total organic carbon of potential source rocks in the Espírito Santo Basin, SE Brazil, using XGBoost
Author: frbione
Bione et al. (2024) scientific paper repository.
Reference: BIONE F.R.A. et al. Estimating total organic carbon of potential source rocks in the Espírito Santo Basin, SE Brazil, using XGBoost. Marine and Petroleum Geology, v. 162, 106765, 2024. https://doi.org/10.1016/j.marpetgeo.2024.106765
In terminal, run:

```
python -m venv venv
cd venv\Scripts
activate.bat
cd ..\..
pip install -r requirements.txt
```
To create the train-test dataframe from the originally compiled full dataframe:
- Open the Data_preparation notebook, and run all four steps to generate the train-test dataframe.
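As a rough illustration of what the notebook produces, a train-test split could be sketched with scikit-learn as below. The column names, split ratio, and random seed here are assumptions for illustration; the actual split logic lives in the Data_preparation notebook.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical well-log features and TOC target; real columns come from
# the originally compiled full dataframe in the repository
df = pd.DataFrame({
    "GR": [80.0, 95.0, 110.0, 60.0, 70.0, 120.0],   # gamma ray (API)
    "RHOB": [2.40, 2.30, 2.20, 2.50, 2.45, 2.10],   # bulk density (g/cm3)
    "TOC": [1.2, 2.0, 3.1, 0.8, 1.0, 3.5],          # target: total organic carbon (wt%)
})

# Hold out a test set; the 80/20 ratio and random_state are assumptions
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)
```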
To reproduce the results using the already tuned model parameters and visualize the results:
- Go to the Reproduce_and_generate_figs notebook, and follow the instructions provided in the notebook.
If you want to run your own models using this approach, bear in mind that you must provide a compatible dataframe. Some code adaptations will therefore likely be required, such as renaming features/targets, adjusting the data imputation parameters, or adding any other feature engineering steps you wish to include.
Optional. Do this in case you want to run parameter tuning for your own models.
- Download JDK from this link, and install it;
- Download Spark from this link, then extract the tar file to a directory (e.g., C:\spark);
- Download Hadoop from this link, then add the winutils file to a directory (e.g., C:\hadoop\bin);
- Configure the Environment Variables by adding the following:
  - JAVA_HOME (e.g., C:\java\jdk)
  - HADOOP_HOME (e.g., C:\hadoop)
  - SPARK_HOME (e.g., C:\spark\spark-3.3.2-bin-hadoop2)
  - PYSPARK_HOME (e.g., ..\venv\lib\site-packages\pyspark)
- Finally, add the following to Path:
  - %JAVA_HOME%\bin
  - %HADOOP_HOME%\bin
  - %SPARK_HOME%\bin
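A quick way to confirm the environment variables from the steps above are visible to Python is a small stdlib-only check (a convenience sketch, not part of the repository):

```python
import os

# The Spark-related variables configured in the steps above
required = ["JAVA_HOME", "HADOOP_HOME", "SPARK_HOME", "PYSPARK_HOME"]
missing = [name for name in required if name not in os.environ]

if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All Spark-related environment variables are set.")
```

Remember to open a new terminal after editing the Environment Variables, or the changes will not be picked up.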
After installing PySpark, you can run the model_run.py script, passing your own dataframe.