Add an option to only process log entries that haven't been processed before #232
base: 3.x-dev
Works fine here.
This is definitely a useful enhancement. Would certainly love to see that in the Matomo Log Importer 👍
I resolved merge conflicts with 4.x-dev (commit 6f66f96) here: https://github.com/strager/matomo-log-analytics/tree/timestamp
Ping. I've been using my version of this patch for a while and I've been happy with it.
Matomo does not automatically import Apache logs. Importing needs to be done manually. Write a systemd service which runs the log importer (using our fork [1] to incrementally import logs [2]), and a systemd timer to run the service daily. [1] https://github.com/strager/matomo-log-analytics/tree/timestamp [2] matomo-org/matomo-log-analytics#232
Help wanted!
I want to use Matomo with log analytics only. My Nginx logs are rotated every week, but I want my reports to be updated much more often, e.g. every hour. If I just feed the same log file, with its already reported visits, to the importer again, I get duplicate entries. So I need to either rotate logs every hour (very inconvenient) or somehow prevent entries from being imported twice. As far as I could find, there is currently no easy way to do this.
This pull request solves this by tracking the latest visit timestamp found in an imported log file and saving it to the file specified by a new `--timestamp-file` option. On the next run, this timestamp is loaded at startup and all visits at or before it are ignored (like `--exclude-older-than`, but inclusive, since the entry with an equal timestamp was already parsed). This partially solves #144.
I've put `initial_timestamp` (loaded from the file at the beginning) in the config and `latest_timestamp` (updated after every log record) in the stats. These can be moved elsewhere if that's not the best place. I've also added some lines to the summary to print the status of the timestamp-based filtering, and included the older/newer-than filtering too, since it's related.
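The summary section for date filtering could be rendered along the lines of the following sketch. The function name `date_filter_summary`, its parameters, and the exact wording of each line are hypothetical, invented for illustration rather than copied from the PR's output.

```python
import datetime


def date_filter_summary(initial_timestamp=None, latest_timestamp=None,
                        exclude_older_than=None, exclude_newer_than=None):
    """Render a hypothetical 'date filtering' block for the import summary."""
    lines = ["Date filtering:"]
    if exclude_older_than is not None:
        lines.append(f"    excluding entries older than {exclude_older_than.isoformat()}")
    if exclude_newer_than is not None:
        lines.append(f"    excluding entries newer than {exclude_newer_than.isoformat()}")
    if initial_timestamp is not None:
        lines.append("    timestamp file: skipping entries at or before "
                     f"{initial_timestamp.isoformat()}")
    if latest_timestamp is not None:
        lines.append(f"    timestamp file: latest entry seen {latest_timestamp.isoformat()}")
    if len(lines) == 1:
        # No filter of any kind was configured or observed.
        lines.append("    no date filtering applied")
    return "\n".join(lines)


if __name__ == "__main__":
    utc = datetime.timezone.utc
    print(date_filter_summary(
        initial_timestamp=datetime.datetime(2024, 1, 1, 12, tzinfo=utc),
        latest_timestamp=datetime.datetime(2024, 1, 1, 13, tzinfo=utc),
    ))
```

Keeping the older/newer-than filters and the timestamp-file status in one block means a reader can see every reason an entry might have been skipped in a single place.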
I also tweaked the printing there to remove extra empty lines (runs of more than 2 newlines are compacted into 2). This was already a problem before: the gap between the 2nd and 3rd sections was bigger than between the 1st and 2nd or the 3rd and 4th because of `%(sites_ignored)s`, but it became more visible once the date-filtering section was added.
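The newline compaction can be sketched as a single regular-expression substitution. The template below is a hypothetical stand-in for the importer's summary, assuming an optional placeholder such as `%(sites_ignored)s` that may expand to an empty string and leave an extra blank line behind; the section titles are placeholders.

```python
import re

# Hypothetical summary template: the placeholder line may render as empty.
SUMMARY_TEMPLATE = """Section 1
---------

%(sites_ignored)s

Section 2
---------
"""


def compact_blank_lines(text):
    """Collapse every run of 3+ newlines into exactly 2, i.e. one blank line."""
    return re.sub(r"\n{3,}", "\n\n", text)


if __name__ == "__main__":
    rendered = SUMMARY_TEMPLATE % {"sites_ignored": ""}
    # Without compaction there would be two blank lines between the sections.
    print(compact_blank_lines(rendered))
```

Note that this only matches literal newline runs; a line containing just spaces would break the run and survive the substitution.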