Fix scanning of files compressed directly via xz (as opposed to tar -J) #650

Merged (5 commits) on Nov 19, 2024

egibs (Member) commented Nov 19, 2024

I've been wondering why scanning .xz files does not work. It turns out that these files need to be handled differently from .tar.xz files.

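A file compressed directly with xz decompresses to the original file rather than to a tar archive, so passing the decompressed stream to the tar.Reader will not work; the stream can instead be copied straight into the destination file via io.Copy. .tar.xz files (created via tar -J) still need to be passed to the tar.Reader after the xz stream is created.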

Example (with this change):

$ go run cmd/mal/mal.go analyze ./out/chainguard-dev/malcontent-samples/linux/clean/buildah.xz
🔎 Scanning "./out/chainguard-dev/malcontent-samples/linux/clean/buildah.xz"
├─ 🟡 out/chainguard-dev/malcontent-samples/linux/clean/buildah.xz ∴ /buildah [MEDIUM]
│     ≡ collection [MEDIUM]
│       🟡 archives/zip — Works with zip files: archive/zip
│       🔵 databases/sql — accesses SQL databases
│       🟡 databases/sqlite — accesses SQLite databases: sqlite3
│     ≡ command & control [MEDIUM]
│       🟡 addr/http_dynamic — URL that is dynamically generated: https://%s/archive/master.tar.gzunsupported
│       🟡 addr/ip — mentions an IP and port:
│             bind_port, client_ip, enable_port, end_ip, guest_port, hasPort, host_ip, host_port, internal_ip, ipPort…
│       🟡 discovery/ip_dns_resolver — contains Google Public DNS resolver IP: 8.8.4.4, 8.8.8.8
│     ≡ credential [MEDIUM]
│       🟡 keychain — accesses a keychain
│       🔵 password — references a 'password':
│             --disabled-password --group, EnterPasswordMode, ErrPasswordTooLong, ExitPasswordMode, GenPasswordConfig…
│       🟡 sniffer/bpf — BPF (Berkeley Packet Filter): bpf
│       🔵 ssl/private_key — References private keys: privateKey, private_key, privatekey
│     ≡ cryptography [MEDIUM]
│       🔵 aes — Supports AES (Advanced Encryption Standard): crypto/aes
│       🟡 cipher — mentions 'ciphertext'
│       🔵 ecdsa — Uses the Go crypto/ecdsa library
│       🔵 ed25519 — Elliptic curve algorithm used by TLS and SSH: ed25519
│       🔵 tls — tls: TLS13, TLSVersion, crypto/tls
│     ≡ data [MEDIUM]
│       🔵 compression/bzip2 — Works with bzip2 files
│       🔵 compression/gzip — works with gzip files
│       🔵 compression/lzma — works with lzma files
│       🟡 compression/xz — uses xz library: ulikunitz/xz
│       🔵 compression/zstd — Zstandard: fast real-time compression algorithm: (�/�, zstd
│       🟡 embedded/html — Contains HTML content: <html>
│       🟡 embedded/zstd — Contains compressed content in ZStandard format: (�/�
│       🔵 encoding/base64 — Supports base64 encoded strings
│       🔵 encoding/json — Supports JSON encoded objects: encoding/json
│       🔵 encoding/json_decode — Decodes JSON messages: json.Unmarshal
│       🔵 hash/blake2b — Uses blake2b encryption algorithm
│       🔵 hash/md5 — Uses the MD5 signature format: md5:
│     ≡ discovery [MEDIUM]
│       🟡 network/mac_address — Retrieves network MAC address: macAddress
│       🟡 process/name — get the current process name: processName
│       🔵 system/cpu — gets number of processors: nproc
│       🔵 system/hostname — get computer host name: /proc/sys/kernel/hostname
│       🔵 system/platform — system identification: syscall.Uname
│       🟡 system/sysinfo — get system information (load, swap): sysinfo
│       🔵 user/HOME — Looks up the HOME directory for the current user: getenv
│       🔵 user/USER — Looks up the USER name of the current user: getenv
│     ≡ evasion [MEDIUM]
│       🟡 bypass_security/linux/iptables — interacts with the iptables firewall
│       🟡 file/location/dev_mqueue — path reference within /dev/mqueue (world writeable):
│             /dev/mqueuescan%d.jsonsbom-resultpurl-resultlayer-labelcpuset-cpuscpuset-me
│       🟡 file/location/dev_shm — references path within /dev/shm (world writeable):
│             /dev/shm/.rootfs, /dev/shm/aufs.xinofailed, /dev/shm/aufs.xinokernel
│       🟡 file/location/var_run — references subfolder within /var/run: /var/run/dbus/
│       🟡 file/prefix — hidden path in a system directory:
│             /dev/ptsrelatime/dev/shm/.rootfs, /run/.containerenvskip-, /task/.dockercfgReadBigI
│       🔵 file/prefix/dev — hidden path reference within /dev/shm (world writeable): /dev/shm/.rootfs
│     ≡ execution [MEDIUM]
│       🟡 cmd — executes a command: execCommand, pruneCmd, runCmd
│       🟡 dylib/symbol_address — get the address of a symbol: dlsym
│       🔵 plugin — references a 'plugin':
│             CNIPluginPath, DefaultCNIPluginDirs, DefaultNetavarkPluginDirs, ErrIntOverflowPlugin, ErrInvalidLengthP…
│       🟡 program — executes external programs: fexecve
│       🔵 reconfigure/hostname_set — sethostname
│       🔵 shell/SHELL — path to active shell: SHELL
│       🔵 shell/TERM — Look up or override terminal settings: TERM
│       🟡 shell/background_sleep — calls sleep and runs shell code in the background: 2>&1 &, _sleep
│       🟡 shell/exec — executes shell:
│             /bin/bash, /bin/sh -c MarshalJSONMarshalTextCAP_SETFCAPCAP_SETPCAPgithub, /bin/sh -c context, /bin/sh -…
│       🟡 system_controls/apparmor — Mentions 'apparmor'
│     ≡ filesystem [MEDIUM]
│       🔵 directory/create — creates directories: mkdir
│       🔵 directory/list — Uses Go functions to list a directory: .ReadDir
│       🔵 directory/remove — Uses libc functions to remove directories: Rmdir, rmdir
│       🔵 event_monitoring — filesystem event monitoring: fanotify_init
│       🔵 fifo_create — make a FIFO special file (a named pipe): mkfifo
│       🟡 file/create — create a new file: CreateFilesystem
│       🔵 file/delete — deletes files: unlinkat
│       🔵 file/delete_forcibly — Forcibly deletes files: rm --alltls
│       🔵 file/read — reads files: ReadFile, os.(*File).Read
│       🔵 file/rename — renames files: os.rename
│       🟡 file/times_set — change file last access and modification times: utimes
│       🔵 file/truncate — truncate a file to a specified length: ftruncate64
│       🔵 file/write — writes to file: WriteFile, writeCacheFileToWriter, writeHostFile, writeRawFile
│       🔵 link_create — May create hard file links: linkat
│       🔵 link_read — read value of a symbolic link: readlinkat
│       🔵 lock_update — apply or remove an advisory lock on a file: flock
│       🟡 loopback — access virtual block devices (loopback): /dev/loop%dcompressionlog
│       🔵 mount — mounts file systems: remount
│       🔵 node_create — create device files: mknod
│       🔵 path/etc — path reference within /etc:
│             /etc/apache/mime.typesfile, /etc/bash, /etc/binfmt.d/run/binfmt.d, /etc/cdicgroupfsconmonrsk, /etc/cont…
│       🟡 path/etc_hosts — references /etc/hosts
│       🔵 path/etc_resolv.conf — accesses DNS resolver configuration: /etc/resolv.conf
│       🔵 path/home_config — path reference within ~/.config:
│             .config/containers/policy, class.config/containers/registries, conf.config/containers/registries, found…
│       🟡 path/lib_dynamic — References a library file that can be generated dynamically: /lib/%s
│       🟡 path/relative — references and possibly executes relative path: ./pipe, ./somefile
│       🟡 path/tmp — path reference within /tmp: /tmp/containers.X, /tmp/fedora, /tmp/sourceimage
│       🟡 path/users — references path within /Users: /Users/martin
│       🔵 path/usr_bin — path reference within /usr/bin:
│             /usr/bin/conmonCONTAINERS_CONF, /usr/bin/conmonrscopying, /usr/bin/crun-vm/usr/bin/kata-fc/usr/sbin/con…
│       🟡 path/usr_local — path reference within /usr/local/bin:
│             /usr/local/bin/conmonrsMerged, /usr/local/bin/conmonset, /usr/local/bin/crun-vm/usr/sbin/kata-runtime/u…
│       🔵 path/usr_sbin — path reference within /usr/sbin:
│             /usr/sbin/conmonrscopying, /usr/sbin/conmonwriting, /usr/sbin/crun-wasm/usr/local/bin/runc/usr/local/bi…
│       🔵 path/var — path reference within /var:
│             /var/cacheSecureJoinfscontext, /var/lib/cni/usr/bin/rpmIN_MOVE_SELFprefix, /var/lib/containers/cacheblo…
│       🟡 permission/chown — Changes file ownership: Chown
│       🟡 permission/modify — modifies file permissions: Chmod, chmod
│       🟡 proc/arbitrary_pid — access /proc for arbitrary pids:
│             /proc/%d/fd/, /proc/%d/ns/netkern, /proc/%d/ns/userinv, /proc/%d/statusdac_, /proc/%d/task/, /proc/%s/g…
│       🟡 proc/self_cgroup — accesses /proc files within own cgroup: /proc/self/cgroupcpuacct
│       🟡 proc/self_cmdline — gets process command-line: /proc/self/cmdline
│       🟡 proc/self_exe — gets executable associated to this process: /proc/self/exe
│       🟡 proc/self_mountinfo — gets mount info associated to this process: /proc/self/mountinfo
│       🔵 tempdir — looks up location of temp directory: TMPDIR
│       🔵 tempdir/TEMP — temp: TEMP, getenv
│       🔵 tempdir/TMPDIR — TMPDIR: getenv
│       🔵 tempdir/create — creates temporary directory: temp dir
│       🔵 unmount — unmount file system: umount
│       🔵 watch — monitors filesystem events: inotify
│     ≡ hardware [MEDIUM]
│       🟡 dev/block_ice — works with block devices: /dev/block/%d, /sys/dev/blockdocker
│     ≡ impact [MEDIUM]
│       🟡 degrade/linux_paths — accesses multiple critical Linux paths:
│             /boot/home/root/sbin.jsontrivyustatala, /dev/shm, /etc/selinux/configlsetxattr, /etc/selinux/not, /lib6…
│       🟡 remote_access/iptables — uploads, uses iptables and HTTP:
│             uploadBlob, uploadData, uploadManifest, uploadedAlgorithm, uploadedAnnotations, uploadedCompressorBase,…
│     ≡ mem [MEDIUM]
│       🟡 anonymous_file — create an anonymous file: memfd_create
│     ≡ networking [MEDIUM]
│       🔵 dns — Uses DNS (Domain Name Service): CNAMEResource, SetEDNS0, dnsmessage
│       🟡 dns/reverse — looks up the reverse hostname for an IP: .in-addr.arpa, ip6.arpa
│       🔵 dns/servers — Examines local DNS servers: CNAMEResource, resolv.conf
│       🔵 dns/txt — Uses DNS TXT (text) records: dns
│       🟡 download — download files:
│             DownloadForeignLayers, DownloadedCache, MaxParallelDownloads, downloadToDirectory, downloadlayer, maxPa…
│       🔵 http/2 — Uses the HTTP/2 protocol
│       🔵 http/accept_encoding — set HTTP response encoding format (example: gzip): Accept-Encoding
│       🔵 http/auth — makes HTTP requests with basic authentication: WWW-Authenticate, Www-Authenticate, www-authenticate
│       🟡 http/content_length — Sets HTTP content length to zero: Content-Length: 0
│       🟡 http/cookies — access HTTP resources using cookies: Cookie
│       🟡 http/form_upload — upload content via HTTP form: POST, application/json, application/x-www-form-urlencoded, post
│       🔵 http/oauth2 — supports OAuth2: oauth2
│       🟡 http/post — submits content to websites: Content-Type, HTTP, POST, http
│       🔵 http/proxy — use HTTP proxy that requires authentication: Proxy-Authorization
│       🔵 http/request — makes HTTP requests: HTTP/1., Referer, User-Agent
│       🟡 ip/host_port — connects to an arbitrary hostname:port:
│             host and port, host to transport, host, port, host.port, host/list transport, host:port, host[:port, ho…
│       🟡 ip/icmp — Uses the ping tool to generate ICMP packets: ping not acked within timeout
│       🟡 ip/parse — parses IP address (IPv4 or IPv6): IsLinkLocalUnicast, IsSingleIP
│       🔵 resolve/hostname — resolve network host name to IP address: cannot resolve
│       🔵 resolve/hostport_parse — Network address and service translation: freeaddrinfo, getaddrinfo
│       🟡 socket/listen — listen on a socket: accept
│       🔵 socket/local_addr — get local address of connected socket: getsockname
│       🔵 socket/peer_address — get peer address of connected socket: getpeername
│       🔵 socket/receive — receive a message from a socket: recvfrom, recvmsg
│       🔵 socket/send — send a message to a socket: sendmsg, sendto
│       🟡 tcp/connect — connects to a TCP port: dialTCP
│       🔵 tcp/grpc — Uses the gRPC Remote Procedure Call framework
│       🟡 tcp/ssh — Uses crypto/ssh to connect to the SSH (secure shell) service
│       🔵 udp/receive — Listens for UDP responses: ReadFromUDP, listenUDP
│       🔵 udp/send — Sends UDP packets: DialUDP, WriteMsgUDP
│       🔵 url/embedded — contains embedded HTTPS URLs:
│             https://auth.docker.com/, https://downloadlayer, https://github.com/golang/protobuf/issues/1609, https:…
│       🟡 url/encode — encodes URL, likely to pass GET variables: urlencode
│       🔵 url/parse — Handles URL strings: RequestURI
│       🟡 url/request — requests resources via URL: http.request, net/url
│     ≡ operating-system [LOW]
│       🔵 fd/sendfile — transfer data between file descriptors: sendfile, syscall.Sendfile
│       🔵 kernel/kcore — access physical memory of the system in core file format: /proc/kcore
│       🔵 kernel/key_management — kernel key management facility: keyctl
│       🔵 kernel/netlink — communicate with kernel services: netlink
│       🔵 kernel/seccomp — operate on Secure Computing state of the process: seccomp
│     ≡ persistence [MEDIUM]
│       🟡 pid_file — pid file, likely DIY daemon: readPidFile
│     ≡ process [LOW]
│       🔵 chroot — change the location of root for the process: chroot
│       🔵 groupid_set — set real, effective, and saved group ID of process: setgid
│       🔵 groups_set — set group access list: setgroups
│       🔵 multithreaded — creates pthreads: pthread_create
│       🔵 unshare — disassociate parts of the process execution context: unshare
│       🔵 userid_set — set real and effective user ID of current process: setuid
│     ≡ suspicious text [MEDIUM]
│       🟡 exclamation — gets very excited: does not work!!!, ontain alphanumerical characters onlyexplicitly tagged !!
│       🟡 intercept — References interception: interceptIO
│

Original behavior:

$ go run cmd/mal/mal.go analyze ./out/chainguard-dev/malcontent-samples/linux/clean/buildah.xz
🔎 Scanning "./out/chainguard-dev/malcontent-samples/linux/clean/buildah.xz"
time=2024-11-19T14:50:54.882-06:00 level=ERROR source=.../repos/chainguard-dev/malcontent/pkg/action/scan.go:325 msg="unable to process out/chainguard-dev/malcontent-samples/linux/clean/buildah.xz: extract to temp: failed to extract out/chainguard-dev/malcontent-samples/linux/clean/buildah.xz: failed to read tar header: archive/tar: invalid tar header"

@egibs egibs enabled auto-merge (squash) November 19, 2024 21:13
@egibs egibs merged commit b4ea545 into chainguard-dev:main Nov 19, 2024
8 checks passed
@egibs egibs deleted the fix-xz-scans branch November 20, 2024 17:42