
Use manual remove for PCI device instead of Bbswitch? #74

Open
bomiyr opened this issue Nov 20, 2021 · 8 comments


@bomiyr

bomiyr commented Nov 20, 2021

Hi, It's me again 😄
In this issue we found out that bbswitch does not work on my laptop. A TL;DR of my investigation is in this comment.

Finally I was able to disable the discrete GPU with just `echo 1 | sudo tee /sys/bus/pci/devices/0000\:01\:00.0/remove`.
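The address `0000:01:00.0` is specific to this laptop. Below is a minimal sketch for finding the NVIDIA GPU's address on another machine by reading sysfs. It assumes NVIDIA's PCI vendor ID is `0x10de` and that display controllers have a PCI class starting with `0x03`, which also skips the card's HDMI audio function (class `0x0403`). `find_nvidia_gpus` and the `SYSFS_ROOT` override are hypothetical names. The override exists only so the function can be tried against a fake tree:

```shell
#!/bin/sh
# Print the PCI address of every NVIDIA display controller found in sysfs.
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.
find_nvidia_gpus() {
    root="${SYSFS_ROOT:-/sys}"
    for dev in "$root"/bus/pci/devices/*; do
        # Skip entries without readable vendor/class attributes
        # (this also covers the unexpanded glob when the directory is empty).
        [ -r "$dev/vendor" ] && [ -r "$dev/class" ] || continue
        vendor=$(cat "$dev/vendor")
        class=$(cat "$dev/class")
        # 0x10de = NVIDIA; class 0x03xxxx = display controller (VGA, 3D, ...).
        case "$vendor:$class" in
            0x10de:0x03*) echo "${dev##*/}" ;;
        esac
    done
}

find_nvidia_gpus
```

Run it before the `remove` step and substitute the printed address into the sysfs path. On machines with several NVIDIA display devices, it prints one address per line.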

And I started wondering whether this might be the right way to disable the device. That way we wouldn't need a dependency on bbswitch (which seems almost dead, judging by its repo). But I don't have enough expertise on the topic to see the whole picture.

So what do you think: is it possible to integrate such a solution into suse-prime itself, or are there hidden caveats in this approach?

@sndirsch
Collaborator

sndirsch commented Nov 21, 2021

Thanks for the report. Indeed I could disable the NVIDIA GPU that way. :-)

linux:/home/tux # modprobe nvidia
modprobe: ERROR: could not insert 'nvidia': No such device

linux:/home/tux # dmesg
[ 5172.008516] nvidia-nvlink: Nvlink Core is being initialized, major device number 236
[ 5172.008832] NVRM: No NVIDIA GPU found.
[ 5172.009523] nvidia-nvlink: Unregistered the Nvlink Core, major device number 236

Unfortunately I was not able to re-enable it. :-( I've tried:

linux:/home/tux # echo 1 | tee /sys/bus/pci/devices/0000:01:00.0/rescan
tee: '/sys/bus/pci/devices/0000:01:00.0/rescan': No such file or directory
1

linux:/home/tux # echo 1 /sys/bus/pci/devices/0000:01:00.0/rescan
1 /sys/bus/pci/devices/0000:01:00.0/rescan

linux:/home/tux # modprobe nvidia
modprobe: ERROR: could not insert 'nvidia': No such device
linux:/home/tux # dmesg
[ 5424.197577] nvidia-nvlink: Nvlink Core is being initialized, major device number 236
[ 5424.197879] NVRM: No NVIDIA GPU found.
[ 5424.198315] nvidia-nvlink: Unregistered the Nvlink Core, major device number 236

EDIT:
I'm using an Intel/NVIDIA combo. The NVIDIA GPU is pre-Turing.

@bomiyr
Author

bomiyr commented Nov 21, 2021

I was able to re-enable the card with
`echo 1 | sudo tee /sys/bus/pci/rescan`

@sndirsch
Collaborator

sndirsch commented Nov 21, 2021

OMG. ;-) Indeed now it works for me. :-) Even easier.

Disable NVIDIA GPU:
modprobe -r nvidia_drm nvidia_modeset nvidia_uvm nvidia
echo 1 > /sys/bus/pci/devices/0000\:00\:01/remove

Re-enable NVIDIA GPU:
echo 1 > /sys/bus/pci/rescan
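The two steps above can be wrapped in a small script. This is only a sketch and not something suse-prime ships. The script layout, the `GPU_ADDR` default (the address used at the start of this thread), and the `SYSFS_ROOT` test override are all assumptions. Set `GPU_ADDR` to your own GPU and run the script as root:

```shell
#!/bin/sh
# Sketch of the disable/re-enable sequence discussed in this thread.
# GPU_ADDR is an assumption: set it to the address of *your* NVIDIA GPU.
# SYSFS_ROOT is a hypothetical override for testing (defaults to /sys).
GPU_ADDR="${GPU_ADDR:-0000:01:00.0}"
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

gpu_off() {
    # Unload the NVIDIA driver stack first; do not remove the device if that fails.
    modprobe -r nvidia_drm nvidia_modeset nvidia_uvm nvidia || return 1
    echo 1 > "$SYSFS_ROOT/bus/pci/devices/$GPU_ADDR/remove"
}

gpu_on() {
    # Note: this rescans *all* PCI buses, not just the GPU's.
    echo 1 > "$SYSFS_ROOT/bus/pci/rescan"
}

gpu_present() {
    # True while the device directory exists in sysfs.
    [ -e "$SYSFS_ROOT/bus/pci/devices/$GPU_ADDR" ]
}

case "$1" in
    off) gpu_off ;;
    on) gpu_on ;;
    status) if gpu_present; then echo present; else echo removed; fi ;;
    *) echo "usage: $0 off|on|status" >&2 ;;
esac
```

`status` only reports whether the device is still visible in sysfs. It says nothing about whether the card is actually powered down.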

@sndirsch
Collaborator

Adding our bbswitch expert, @simopil. What do you think? Should we try to get rid of bbswitch? It looks like that would be possible.

@bomiyr
Author

bomiyr commented Nov 21, 2021

The only problem I see is that the rescan command is not specific to the GPU: it will re-add any previously removed PCI device. Conversely, a rescan triggered by the user or by some hardware-info app will silently re-enable the GPU...
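One way to narrow the first concern is to rescan only the bus the GPU sits on instead of the whole PCI hierarchy. The sketch below assumes your kernel exposes the per-bus `rescan` attribute under `/sys/class/pci_bus/<domain:bus>/`, as described in the kernel's sysfs PCI ABI documentation. The removed device's own directory disappears on removal, which is also why writing to `rescan` under it failed earlier in this thread. The second concern remains: a global `/sys/bus/pci/rescan`, issued by anyone, will still bring the GPU back. `gpu_bus`, `rescan_gpu_bus`, `GPU_ADDR`, and `SYSFS_ROOT` are hypothetical names:

```shell
#!/bin/sh
# GPU_ADDR is an assumption: your GPU's address (domain:bus:device.function).
# SYSFS_ROOT is a hypothetical override for testing (defaults to /sys).
GPU_ADDR="${GPU_ADDR:-0000:01:00.0}"
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# "0000:01:00.0" -> "0000:01": strip the ":device.function" suffix.
gpu_bus() {
    echo "${GPU_ADDR%:*}"
}

# Rescan only the GPU's parent bus, leaving other removed devices alone.
rescan_gpu_bus() {
    bus_attr="$SYSFS_ROOT/class/pci_bus/$(gpu_bus)/rescan"
    if [ ! -e "$bus_attr" ]; then
        echo "no per-bus rescan attribute at $bus_attr" >&2
        return 1
    fi
    echo 1 > "$bus_attr"
}
```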

@sndirsch
Collaborator

Yeah, valid points. And there may be more reasons why the bbswitch kernel module was written in the first place...

@simopil
Contributor

simopil commented Nov 30, 2021

I tried removing the device. It disappeared from the system, but the LEDs on my laptop show that the NVIDIA card is still powered on.
This is because powering off the GPU is done via ACPI calls, not by unbinding the device. You can power it off via acpi_call with the correct call for your platform.
bbswitch can find a suitable call for your system automatically and perform it without the acpi_call module.
BTW, the bbswitch module is maintained even though the main repo is very old. A year ago it broke due to kernel library changes, and it was fixed in openSUSE.
Edit: I read your #73 and I think you have trouble with ACPI call handling. I've found this, which seems to be exactly your issue; it was resolved with the acpi-handle-hack module (you have to edit the code as in the comment with your settings and compile it yourself). The module can be found in the bbswitch repo.
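For comparison, bbswitch exposes its state through `/proc/acpi/bbswitch`. According to the bbswitch README, reading this file returns a line such as `0000:01:00.0 ON`, and writing `OFF` or `ON` switches the card's power via ACPI. bbswitch refuses to switch the card off while a driver is bound to it. Below is a minimal sketch of helpers around that interface. The function names and the `BBSWITCH_PROC` test override are hypothetical:

```shell
#!/bin/sh
# BBSWITCH_PROC is a hypothetical override for testing.
BBSWITCH_PROC="${BBSWITCH_PROC:-/proc/acpi/bbswitch}"

# Print the power state (ON or OFF) from a line like "0000:01:00.0 ON".
bbswitch_state() {
    [ -r "$BBSWITCH_PROC" ] || return 1
    read -r _addr state < "$BBSWITCH_PROC"
    echo "$state"
}

# Request ON or OFF; bbswitch refuses OFF while a driver is bound to the GPU.
bbswitch_set() {
    case "$1" in
        ON|OFF) echo "$1" > "$BBSWITCH_PROC" ;;
        *) echo "usage: bbswitch_set ON|OFF" >&2; return 1 ;;
    esac
}
```

Unlike removing the device from the PCI tree, this path actually cuts power, which matches the LED behaviour described above.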

@sndirsch
Collaborator

sndirsch commented Dec 1, 2021

Thanks for your input @simopil !
