Simplify the generation and management of logs in your data pipelines.
PipeLogger is a library designed to standardize the creation of logs in data pipelines, providing a consistent format that facilitates problem identification and troubleshooting. With PipeLogger, you can manage detailed and structured logs, enabling more effective tracking of operations and deeper analysis of data ingestion processes.
- Log standardization: PipeLogger creates detailed logs that follow a consistent format, making them easy to read and analyze.
- Integration with Google Cloud Platform (GCP): Designed for pipelines deployed on GCP, supporting Cloud Functions and Cloud Run.
- BigQuery table monitoring: Logs and tracks the size of BigQuery tables over time (a rough sketch of the underlying lookup follows this list).
- Storage in Google Cloud Storage: Automatically stores logs in a GCP bucket for centralized access and management.
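PipeLogger collects table sizes for you, so no extra code is needed. Purely as an illustration of the underlying lookup, the sketch below reads a table's size with the official google-cloud-bigquery client; the table ID is a placeholder, and PipeLogger's own mechanism and reported units may differ (see the Official Documentation):

from google.cloud import bigquery

# Illustration only: reading a BigQuery table's size directly.
# The table ID is a placeholder; PipeLogger performs this kind of
# lookup internally and may report sizes differently.
client = bigquery.Client()
table = client.get_table("project.pipeline-example.table_1")
print(f"{table.full_table_id}: {table.num_bytes} bytes")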
PipeLogger creates logs in a clear and structured JSON format as follows:
{
  "PipelineLogs": {
    "PipelineID": "Pipeline-Example",
    "Timestamp": "MM-DD-YY-THH:MM:SS",
    "Status": "Success",
    "Message": "Data uploaded successfully",
    "ExecutionTime": 20.5075738430023
  },
  "BigQueryLogs": [
    {
      "BigQueryID": "project.pipeline-example.table_1",
      "Size": 1555
    },
    {
      "BigQueryID": "project.pipeline-example.table_2",
      "Size": 3596
    }
  ],
  "Details": [
    {
      "additional_info": [
        "Data downloaded successfully",
        "Data processed successfully",
        "Data uploaded successfully"
      ]
    }
  ]
}
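Because each log is plain JSON stored in your bucket, it can be read back programmatically for monitoring or analysis. A minimal sketch, assuming a log saved under a hypothetical blob path (the bucket name and path below are placeholders, not PipeLogger defaults):

import json
from google.cloud import storage

# Sketch: read a stored log back from GCS and inspect it.
# The bucket name and blob path are placeholders.
client = storage.Client()
blob = client.bucket("your-gcs-bucket").blob("logs_folder/Pipeline-Example.json")
log = json.loads(blob.download_as_text())

print(log["PipelineLogs"]["Status"])
for entry in log["BigQueryLogs"]:
    print(entry["BigQueryID"], entry["Size"])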
Before implementing PipeLogger, make sure you meet the following requirements:
- The pipeline must be deployed on Google Cloud Platform (GCP), using Cloud Functions or Cloud Run.
- The pipeline must interact with BigQuery tables.
- A bucket on Google Cloud Storage is required to store the generated logs (a creation sketch follows this list).
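If the logs bucket does not exist yet, you can create it from the console, with gcloud, or with the official google-cloud-storage client. A minimal sketch with placeholder names:

from google.cloud import storage

# Sketch: create the GCS bucket that will hold the logs.
# The project ID, bucket name, and location are placeholders.
client = storage.Client(project="your-gcp-project-id")
bucket = client.create_bucket("your-gcs-bucket", location="US")
print(f"Created bucket {bucket.name}")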
Follow the steps detailed in our Official Documentation to integrate PipeLogger into your pipeline projects. A typical integration looks like this:
from pipelogger import logsformatter
import time

# Initialize the log formatter with the pipeline, table, and storage settings
logger = logsformatter(
    pipeline_id="Pipeline-Example",
    table_ids=["project.pipeline-example.table_1", "project.pipeline-example.table_2"],
    project_id="your-gcp-project-id",
    bucket_name="your-gcs-bucket",
    folder_bucket="logs_folder"
)

# Start the execution timer
start_time = time.time()

# Simulation of pipeline operations....
time.sleep(2)  # stand-in for the pipeline's real work

# Generate the logs and upload them to the GCS bucket
logger.generate_the_logs(
    execution_status="Success",
    msg="Data uploaded successfully",
    start_timer=start_time,
    logs_details=["Process completed without errors."]
)
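The same call can record failed runs. The sketch below wraps the pipeline body in a try/except and reuses the logger from the example above; run_pipeline is a placeholder for your own entry point, and the "Failed" status string is an assumption, so check the Official Documentation for the exact status values PipeLogger accepts:

import time

start_time = time.time()
try:
    run_pipeline()  # placeholder for your pipeline's entry point
    logger.generate_the_logs(
        execution_status="Success",
        msg="Data uploaded successfully",
        start_timer=start_time,
        logs_details=["Process completed without errors."]
    )
except Exception as exc:
    # "Failed" is an assumed status value, not confirmed API.
    logger.generate_the_logs(
        execution_status="Failed",
        msg=str(exc),
        start_timer=start_time,
        logs_details=["Pipeline raised an exception."]
    )
    raise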
You can easily install PipeLogger from PyPI using pip:
pip install pipelogger
For complete details on implementation, advanced configuration, and further usage examples, visit the Official Documentation.
Contributions are welcome! If you have ideas, improvements, or a bug to report, please open an issue or submit a pull request in our GitHub repository.
This project is licensed under the terms of the MIT License.
If you have any questions, feel free to contact us through our GitHub page or send us an email.