This repository is complementary to the Bione et al. (2024) scientific paper published in Marine and Petroleum Geology. It was created for reproducibility, use and modification purposes.

frbione/bioneetal2024

Estimating total organic carbon of potential source rocks in the Espírito Santo Basin, SE Brazil, using XGBoost

Author: frbione

Bione et al. (2024) scientific paper repository.

Reference: BIONE F.R.A. et al. Estimating total organic carbon of potential source rocks in the Espírito Santo Basin, SE Brazil, using XGBoost. Marine and Petroleum Geology, v. 162, 106765, 2024. https://doi.org/10.1016/j.marpetgeo.2024.106765


⚙️ Configuring the environment

Creating the venv and installing the dependencies

In a terminal (Windows Command Prompt), run:

python -m venv venv
cd venv\Scripts
activate.bat
cd ..\..
pip install -r requirements.txt
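
To confirm that the virtual environment is actually active before installing or running anything, a quick check such as the one below can help. This is a minimal sketch and not part of the repository; the function name `in_virtualenv` is hypothetical.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the venv directory, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Virtual environment active:", in_virtualenv())
```

If the second line reports `False`, the activation step was probably skipped or run in a different terminal session.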


📖 Create the train-test dataframe

To create the train-test dataframe from the originally compiled full dataframe:

  • Open the Data_preparation notebook and run all four steps to generate the train-test dataframe

📈 Reproducing and visualizing the results

To reproduce and visualize the results using the already-tuned model parameters:



▶️ Tune your own models

If you want to run your own models using this approach, bear in mind that you must provide a compatible dataframe. Some code adaptations will therefore very likely be required, such as renaming features/targets, adjusting the data imputation parameters, or adding any other feature engineering technique you wish to include.
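
As an illustration of the renaming step, the sketch below maps the column names of a custom dataset onto the names a model expects and fails early if a required column is missing. It is only a sketch under stated assumptions: the helper `adapt_columns` and the column names (`gamma_ray`, `toc_wt`, `GR`, `TOC`) are hypothetical and do not reflect the repository's actual schema.

```python
def adapt_columns(columns, rename_map, required):
    """Rename columns and verify that every required column is present.

    columns    -- column names of your own dataset
    rename_map -- {your_name: expected_name}; hypothetical names only
    required   -- names the model needs after renaming
    """
    # Names missing from rename_map are kept unchanged
    renamed = [rename_map.get(name, name) for name in columns]
    missing = [name for name in required if name not in renamed]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
    return renamed


if __name__ == "__main__":
    new_names = adapt_columns(
        ["gamma_ray", "toc_wt"],                  # hypothetical input columns
        {"gamma_ray": "GR", "toc_wt": "TOC"},     # hypothetical mapping
        ["GR", "TOC"],                            # hypothetical required names
    )
    print(new_names)
```

With a pandas dataframe, the returned list could be assigned back to `df.columns`, or the same mapping could be passed to `df.rename(columns=...)`.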

Installing PySpark (Windows)

Optional. Do this only if you want to run parameter tuning for your own models.

  • Download the JDK from this link and install it;

  • Download Spark from this link, then extract the tar file to a directory (e.g., C:\spark);

  • Download Hadoop from this link, then add the winutils file to a directory (e.g., C:\hadoop\bin);

  • Configure the environment variables by adding the following:

    • JAVA_HOME - (e.g., C:\java\jdk)
    • HADOOP_HOME - (e.g., C:\hadoop)
    • SPARK_HOME - (e.g., C:\spark\spark-3.3.2-bin-hadoop2)
    • PYSPARK_HOME - (e.g., ..\venv\lib\site-packages\pyspark)
  • Finally, add the following to Path:

    • %JAVA_HOME%\bin
    • %HADOOP_HOME%\bin
    • %SPARK_HOME%\bin
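
The variables and Path entries listed above can be sanity-checked from Python before starting a tuning run. The following is a minimal sketch, not part of the repository: the function name `check_spark_env` is hypothetical, and it only inspects the environment (it does not verify that the directories exist or that Spark actually starts).

```python
import os

# Variables listed in the steps above
REQUIRED_VARS = ("JAVA_HOME", "HADOOP_HOME", "SPARK_HOME", "PYSPARK_HOME")
# Variables whose bin directory should also appear on Path
ON_PATH_VARS = ("JAVA_HOME", "HADOOP_HOME", "SPARK_HOME")


def _norm(path):
    """Normalise a path so comparisons ignore case (on Windows) and separators."""
    return os.path.normcase(os.path.normpath(path))


def check_spark_env(env=None):
    """Return a list of problems found; an empty list means nothing was flagged."""
    env = os.environ if env is None else env
    problems = []
    path_entries = {
        _norm(entry) for entry in env.get("PATH", "").split(os.pathsep) if entry
    }
    for name in REQUIRED_VARS:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
            continue
        if name in ON_PATH_VARS and _norm(os.path.join(value, "bin")) not in path_entries:
            problems.append(f"the bin directory of {name} is not on PATH")
    return problems


if __name__ == "__main__":
    issues = check_spark_env()
    print("Environment looks fine" if not issues else "\n".join(issues))
```

Open a new terminal after editing the environment variables; terminals that were already open do not pick up the changes.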

After installing PySpark, you can run the model_run.py script, passing your own dataframe.
