Estimating total organic carbon of potential source rocks in the Espírito Santo Basin, SE Brazil, using XGBoost
Author: frbione
Bione et al. (2024) scientific paper repository.
Reference: BIONE F.R.A. et al. Estimating total organic carbon of potential source rocks in the Espírito Santo Basin, SE Brazil, using XGBoost. Marine and Petroleum Geology, v. 162, 106765, 2024. https://doi.org/10.1016/j.marpetgeo.2024.106765
In terminal, run:

```
python -m venv venv
cd venv\Scripts
activate.bat
cd ..\..
pip install -r requirements.txt
```
To create the train-test dataframe from the originally compiled full dataframe:
- Open the Data_preparation notebook, and run all four steps to generate the train-test dataframe.
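As a rough illustration of what the notebook produces, a train-test split could be sketched with scikit-learn as below. The column names, split ratio, and random seed here are assumptions for illustration; the actual split logic lives in the Data_preparation notebook.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical well-log features and TOC target; real columns come from
# the originally compiled full dataframe in the repository
df = pd.DataFrame({
    "GR": [80.0, 95.0, 110.0, 60.0, 70.0, 120.0],   # gamma ray (API)
    "RHOB": [2.40, 2.30, 2.20, 2.50, 2.45, 2.10],   # bulk density (g/cm3)
    "TOC": [1.2, 2.0, 3.1, 0.8, 1.0, 3.5],          # target: total organic carbon (wt%)
})

# Hold out a test set; the 80/20 ratio and random_state are assumptions
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)
```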
To reproduce the results using the already tuned model parameters and visualize the results:
- Go to the Reproduce_and_generate_figs notebook, and follow the instructions provided in the notebook.
If you want to run your own models using this approach, bear in mind that you must provide a compatible dataframe. Some code adaptations will therefore likely be required, such as renaming features/targets, adjusting the data imputation parameters, or adding any other feature engineering steps you wish to include.
Optional. Do this in case you want to run parameter tuning for your own models.
- Download JDK from this link, and install it;
- Download Spark from this link, then extract the tar file to a directory (e.g., C:\spark);
- Download Hadoop from this link, then add the winutils file to a directory (e.g., C:\hadoop\bin);
- Configure the Environment Variables by adding the following:
  - JAVA_HOME (e.g., C:\java\jdk)
  - HADOOP_HOME (e.g., C:\hadoop)
  - SPARK_HOME (e.g., C:\spark\spark-3.3.2-bin-hadoop2)
  - PYSPARK_HOME (e.g., ..\venv\lib\site-packages\pyspark)
- Finally, add the following to Path:
  - %JAVA_HOME%\bin
  - %HADOOP_HOME%\bin
  - %SPARK_HOME%\bin
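A quick way to confirm the environment variables from the steps above are visible to Python is a small stdlib-only check (a convenience sketch, not part of the repository):

```python
import os

# The Spark-related variables configured in the steps above
required = ["JAVA_HOME", "HADOOP_HOME", "SPARK_HOME", "PYSPARK_HOME"]
missing = [name for name in required if name not in os.environ]

if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All Spark-related environment variables are set.")
```

Remember to open a new terminal after editing the Environment Variables, or the changes will not be picked up.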
After installing PySpark, you can run the model_run.py script, passing your own dataframe.