if hostname has no http in it importer together with docker-matomo fails to file ip- adresses clients correctly #354

Open
hanscees opened this issue May 28, 2023 · 8 comments

Comments

@hanscees

Hi,
I am trying to import log files, but in the dashboard every visitor appears with IP address 0.0.0.0. I assume that is why the world map does not show visitors per country or city.

I am reading in loglines like these:

154.12.103.62 www.bomengids.nl - - [22/May/2023:00:00:18 +0200] "GET /knop.html HTTP/1.1" 200 40139 "-" "newspaper/0.2.8"
51.159.154.15 www.bomengids.nl - - [22/May/2023:00:00:18 +0200] "GET /winter/Hollandse_iep__Ulmus_hollandica__Dutch_Elm@1@img_9198knop_th.jpg HTTP/1.1" 200 9067 "https://www.bomengids.nl/knop.html" "newspaper/0.2.8"
104.227.93.210 www.bomengids.nl - - [22/May/2023:00:00:20 +0200] "GET /winter/Hollandse_iep__Ulmus_hollandica__Dutch_Elm@1@img_9198knop_th.jpg HTTP/1.1" 200 9067 "-" "python-requests/2.28.2"

I am using this regex to read them in:

--log-format-regex='((?P<ip>\S+) (?P<host>\S+) \S+ \S+ \[(?P<date>.*?) (?P<timezone>.*?)\] "GET (?P<path>.*?) HTTP/\S+" (?P<status>\S+) (?P<length>\S+) "(?P<referrer>.*?)" "(?P<user_agent>.*?)").*'

python3 ../import_logs1.py --url http://192.168.0.61:8080 --login [email protected] --password seclet --idsite=3 --enable-static --log-format-regex='((?P<ip>\S+) (?P<host>\S+) \S+ \S+ \[(?P<date>.*?) (?P<timezone>.*?)\] "GET (?P<path>.*?) HTTP/\S+" (?P<status>\S+) (?P<length>\S+) "(?P<referrer>.*?)" "(?P<user_agent>.*?)").*' testlog
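
One way to check the regex independently of Matomo is to run it against one of the sample lines with Python's `re` module. This is only a sketch: the names `LOG_FORMAT_REGEX` and `sample` are made up for illustration, and the importer may compile or apply the pattern differently.

```python
import re

# The --log-format-regex value from above, split over several lines for readability.
LOG_FORMAT_REGEX = (
    r'((?P<ip>\S+) (?P<host>\S+) \S+ \S+ \[(?P<date>.*?) (?P<timezone>.*?)\] '
    r'"GET (?P<path>.*?) HTTP/\S+" (?P<status>\S+) (?P<length>\S+) '
    r'"(?P<referrer>.*?)" "(?P<user_agent>.*?)").*'
)

# First sample log line from this report.
sample = (
    '154.12.103.62 www.bomengids.nl - - [22/May/2023:00:00:18 +0200] '
    '"GET /knop.html HTTP/1.1" 200 40139 "-" "newspaper/0.2.8"'
)

match = re.match(LOG_FORMAT_REGEX, sample)
fields = match.groupdict() if match else {}

# Print every named group so the ip and host captures can be inspected.
for name, value in fields.items():
    print(f"{name}: {value}")
```

If `ip` and `host` come out as expected here, the pattern itself is probably not what produces 0.0.0.0. Note that the pattern hard-codes `GET`, so lines with other request methods will not match.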

This seems to work as a tcpdump shows this:

`

98713}, {"rec": "1", "apiv": "1", "url": "https://www.bomengids.nl/zomer2004/pics/Trompetboom__Catalpa_bignonioides__Southern_catalpaimg_4697blad.jpg", "urlref": "https://arnfoto.ru/", "cip": "89.113.127.50", "cdt": "2023-05-15 21:58:33", "idsite": "3", "queuedtracking": "0", "dp": "1", "ua": "Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Mobile Safari/537.36", "cvar": "{"1": ["HTTP-code", "200"]}", "download": "https://www.bomengids.nl/zomer2004/pics/Trompetboom__Catalpa_bignonioides__Southern_catalpaimg_4697blad.jpg", "bw_bytes": 98713}]}
`

So the script seems to tell the Matomo server that the client IP is, for instance, 89.113.127.50.

Is my assumption correct that the regex is correct, given the tcpdump data?

If so, why do visitor logs show every visit to be from 0.0.0.0?

image

@hanscees
Author

hanscees commented May 28, 2023

I have more proof that the script is flawed, or at least that it fails to store client IPs in the database as one would expect.

A direct query on the database in Docker shows every client IP as 0.0.0.0:

docker exec -it matmoto-db-1 sh

mysql -u matomo -pseclet  

show databases 
use matomo
#https://matomo.org/faq/how-to/faq_158/
#In the database, the IP addresses and Visitor IDs are stored in Binary form for storage efficiency. To display these values correctly you can use the following SQL query:
SELECT INET6_NTOA(`location_ip`) as ip, conv(hex(idvisitor), 16, 16) as visitorId FROM matomo_log_visit;

shows 
| 0.0.0.0    | 84E04E745AB1F733 |
| 0.0.0.0    | 896B3ECA45DA2A3B |
| 0.0.0.0    | 6C6E9086BA0877EF |
| 0.0.0.0    | 64006E878F1C46CC |
| 0.0.0.0    | 1BE07D16EC7E4BE6 |
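
To make the binary storage concrete, here is a small Python sketch of the same decoding that `INET6_NTOA` performs. It assumes the column holds the raw 4 bytes (IPv4) or 16 bytes (IPv6) of the address; the helper name `decode_location_ip` is hypothetical.

```python
import ipaddress

def decode_location_ip(raw):
    """Turn a packed 4-byte (IPv4) or 16-byte (IPv6) value into a readable address."""
    return str(ipaddress.ip_address(raw))

# Four zero bytes are what the rows above contain.
print(decode_location_ip(bytes([0, 0, 0, 0])))

# What a row would hold if the cip from the tcpdump had been stored as-is.
print(decode_location_ip(bytes([89, 113, 127, 50])))
```

So the rows above really do contain all-zero addresses; it is not a display problem in the query.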

@sgiehl
Member

sgiehl commented May 31, 2023

@hanscees What are your anonymization settings in Matomo? Did you maybe configure it to fully discard the IP address?
Also, tracking the IP address might require a token_auth to be sent with the request. I can't see one in the tcpdump, but maybe it's only hidden...
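
A quick way to check a captured request body for a token is sketched below. It assumes the body is a JSON object with a `requests` list and an optional top-level `token_auth` key, which is what the tcpdump fragment above suggests; the function name `summarize_bulk_payload` and the example body are made up for illustration.

```python
import json

def summarize_bulk_payload(raw_body):
    """Report whether a captured bulk-tracking body carries a token and which cip values it sends."""
    payload = json.loads(raw_body)
    requests = payload.get("requests", [])
    return {
        "has_token_auth": bool(payload.get("token_auth")),
        "client_ips": [r.get("cip") for r in requests if isinstance(r, dict)],
    }

# Hypothetical body, reduced to the fields that matter here.
example_body = json.dumps({
    "requests": [{"rec": "1", "idsite": "3", "cip": "89.113.127.50"}],
    # No "token_auth" key on purpose, to show the negative case.
})

summary = summarize_bulk_payload(example_body)
print(summary)
```

The captured body has to be complete for this to work; the fragment above is cut off at the start, so a token could sit in the part that is missing.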

@hanscees
Author

hanscees commented Jun 2, 2023

Hi,
I have disabled all the anonymisation settings I could find, then destroyed the site, re-created it and uploaded the data again.
All IPs are still 0.0.0.0.

I do not send a token_auth.

Here's my system diagnostic attached

matomo_system_check.txt

@hanscees
Author

hanscees commented Jun 2, 2023

Looks like the logimporter somehow fetches a token on its own?

image

@hanscees
Author

hanscees commented Jun 9, 2023

I have found the error. Not in the script of course...

In the website url I had

https://www.bomengids.nl
http://bomengids.nl

but I needed to add

bomengids.nl

However, with this setting it read in a few lines successfully and showed them on the world map, and then the website automagically changed the URL to the one with http:// added.
After that it fails again.

@hanscees
Author

hanscees commented Jun 9, 2023

So the bug seems to be that the Matomo web application cannot store a hostname without http(s):// in front of it.

I can of course change the script to add http:// to the hostname.
Let's try that.

hanscees changed the title from "importer together with docker-matomo fails to file ip- adresses clients correctly" to "if hostname has no http in it importer together with docker-matomo fails to file ip- adresses clients correctly" on Jun 9, 2023
@hanscees
Author

hanscees commented Jun 9, 2023

I changed this bit of code in the importer script and now it handles the log lines correctly, or rather, it prepends a scheme (https://) to the host name.
And the world map works now!!

Note that it works with this regex (the host group must be defined):

/usr/bin/python3 /var/lib/docker/volumes/matmoto_matomo/_data/misc/log-analytics/import_logs.py --url=http://192.168.0.61:8080 --debug  --login [email protected] --password "secletvelly2" --idsite=2 --recorders=4 --enable-static --enable-bots --log-format-regex='((?P<ip>\S+) (?P<host>\S+) \S+ \S+ \[(?P<date>.*?) (?P<timezone>.*?)\] "GET (?P<path>.*?) HTTP/\S+" (?P<status>\S+) (?P<length>\S+) "(?P<referrer>.*?)" "(?P<user_agent>.*?)").*'  $i

What a bitch of a bug

            if config.options.log_hostname:
                hit.host = config.options.log_hostname
            else:
                try:
                    hit.host = format.get('host').lower().strip('.')

                    if hit.host.startswith('"'):
                        hit.host = hit.host[1:-1]
                except BaseFormatException:
                    # Some formats have no host.
                    pass
            # Proof of concept: make sure the host carries a scheme before it is sent to Matomo.
            # getattr guards against formats without a host, where hit.host may never be set.
            host = getattr(hit, 'host', None)
            print("Host according to the script:", host)
            if host and not host.startswith(('http://', 'https://')):
                print("Host has no scheme, adding one, otherwise Matomo fails")
                hit.host = "https://" + host
                print("Changed host to:", hit.host)
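
The same idea can be tried outside the importer as a small standalone helper. This is a sketch, not the importer's own code: the function name `ensure_scheme` and the `default_scheme` parameter are hypothetical, and whether https:// or http:// is the right default depends on the site.

```python
def ensure_scheme(host, default_scheme="https"):
    """Return host with a scheme prepended unless it already starts with http:// or https://."""
    if not host:
        # Nothing to normalize, e.g. the log format has no host field.
        return host
    if host.startswith(("http://", "https://")):
        return host
    return f"{default_scheme}://{host}"

for candidate in ("www.bomengids.nl", "http://bomengids.nl", "https://www.bomengids.nl"):
    print(candidate, "->", ensure_scheme(candidate))
```

Checking for a scheme prefix rather than searching for "http" anywhere in the string avoids skipping host names that merely contain those letters.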

@hanscees
Author

hanscees commented Jun 9, 2023

Now the question is: will Matomo fix it? My code is just a proof of concept, typos included.
