Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SoundWire: fixes before changes to bind/unbind #3645

Conversation

plbossart
Copy link
Member

The patches in this PR correct obvious issues that appeared while introducing a locking mechanism and bind/unbind cycles. The fixes are rather straightforward and can be reviewed/merged before the validation of PR #3642 is complete.

bardliao
bardliao previously approved these changes May 12, 2022
@plbossart
Copy link
Member Author

@bardliao @shumingfan can you please take a look at the latest commit for rt711*.c? I don't get what the component remove was trying to do with the regcache status.

@plbossart plbossart requested a review from bardliao May 12, 2022 22:49
@shumingfan
Copy link

@bardliao @shumingfan can you please take a look at the latest commit for rt711*.c? I don't get what the component remove was trying to do with the regcache status.

@plbossart Please check PR #2640. That's why to add the .remove callback function.

@plbossart
Copy link
Member Author

@bardliao @shumingfan can you please take a look at the latest commit for rt711*.c? I don't get what the component remove was trying to do with the regcache status.

@plbossart Please check PR #2640. That's why to add the .remove callback function.

ok, thanks for the pointer. I think the 'right' solution is rather to use pm_runtime_resume_and_get(), that way there's no risk of initiating transactions on a suspend bus.

@plbossart plbossart force-pushed the sdw/fixes-before-bind-unbind-lock branch from 8b527e9 to 5608c41 Compare May 17, 2022 21:42
@plbossart
Copy link
Member Author

@shumingfan @bardliao please check the update and specifically the last commit, I squashed the two commits related to .set_jack_detect() and added references to previous solutions in the commit message

dev_dbg(&rt711->slave->dev,
"%s hw_init not ready yet\n", __func__);
return 0;
ret = pm_runtime_resume_and_get(component->dev);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking maybe we only need to get when hs_jack != NULL. The reason that we need to access the register is we want to get the jack's status. But when hs_jack == NULL, we will disable the jack detection feature and we can write the register to cache. Then codec driver will sync it in the device resume callback.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not have multiple places where the state of the cache is modified, this should only happen on suspend and resume.

We can disable jack detection when the card is removed through, that's different to changing the cache settings.

For tests, it's rather common to disable the HDaudio links and codecs
in the build. Since we already get a codec_mask parameter indicating
that there are no codecs detected, it's straightforward to skip the
HDMI dailink creation and create a card.

Note that when disabling HDMI, a modified topology without HDMI
pipelines needs to be provided as well.

Signed-off-by: Pierre-Louis Bossart <[email protected]>
The bus sdw_drv_remove() and sdw_drv_shutdown() helpers are used
conditionally, if the driver provides these routines.

These helpers already test if the driver provides a .remove or
.shutdown callback, so there's no harm in invoking the
sdw_drv_remove() and sdw_drv_shutdown() unconditionally.

In addition, the current code is imbalanced with
dev_pm_domain_attach() called from sdw_drv_probe(), but
dev_pm_domain_detach() called from sdw_drv_remove() only if the driver
provides a .remove callback.

Fixes: 9251345 ("soundwire: Add SoundWire bus type")
Signed-off-by: Pierre-Louis Bossart <[email protected]>
@plbossart plbossart force-pushed the sdw/fixes-before-bind-unbind-lock branch from f4e8191 to 917a1d8 Compare May 23, 2022 18:14
@plbossart
Copy link
Member Author

@fredoh9 @greg-intel @marc-hb

Still the same problem with USB Audio

Test https://sof-ci.01.org/linuxpr/PR3645/build155/devicetest failed on jf-cml-rvp-sdw-3

Test https://sof-ci.01.org/linuxpr/PR3645/build158/devicetest/ worked fine on jf-cml-rvp-sdw-3

Test https://sof-ci.01.org/linuxpr/PR3645/build166/devicetest/ failed on jf-cml-rvp-sdw-1

I cannot explain such differences by software changes due to this PR

Three whole days lost on this. Gah.

@marc-hb
Copy link
Collaborator

marc-hb commented May 25, 2022

Still the same problem with USB Audio

USB audio problem is not the problem in https://sof-ci.01.org/linuxpr/PR3645/build155/devicetest/?model=CML_RVP_SDW_ZEPHYR&testcase=check-capture-10sec. It's only distracting your attention from the real problem:

2022-05-24 17:41:08 UTC [REMOTE_INFO] No SOF sound card found, exporting pipeline parameters from proc file system

@plbossart
Copy link
Member Author

Still the same problem with USB Audio

USB audio problem is not the problem in https://sof-ci.01.org/linuxpr/PR3645/build155/devicetest/?model=CML_RVP_SDW_ZEPHYR&testcase=check-capture-10sec. It's only distracting your attention from the real problem:

2022-05-24 17:41:08 UTC [REMOTE_INFO] No SOF sound card found, exporting pipeline parameters from proc file system

not quite @marc-hb, why on earth would sof-test attempt to record from a non-sof card? Even if the SOF card disappeared, something is very very misleading in the script actions and reports.

@plbossart
Copy link
Member Author

@marc-hb take for example https://sof-ci.01.org/linuxpr/PR3645/build166/devicetest/?model=CML_RVP_SDW_ZEPHYR&testcase=check-playback-10sec. This is reported as PASS but plays on USB Audio. Does this seem right to you?

@plbossart
Copy link
Member Author

plbossart commented May 25, 2022

I think the problem is in an earlier test that's shown as PASS:
https://sof-ci.01.org/linuxpr/PR3645/build166/devicetest/?model=CML_RVP_SDW_ZEPHYR&testcase=check-playback-10sec

If you look at the logs there's a number of stack traces and I suspect the kmod load/unload causes all kinds of issues here.

@marc-hb EDIT: I think you meant https://sof-ci.01.org/linuxpr/PR3645/build166/devicetest/?model=CML_RVP_SDW_ZEPHYR&testcase=check-sof-logger

@marc-hb
Copy link
Collaborator

marc-hb commented May 25, 2022

not quite @marc-hb, why on earth would sof-test attempt to record from a non-sof card?

Because of an sof-test bug that I started describing thesofproject/sof-test#471 (comment) and that should probably be filed independently.

Still, the lack of any SOF audio device is not an sof-test problem and IMHO the more pressing problem. sof-test is just being a very bad messenger here (what's new?)

@marc-hb
Copy link
Collaborator

marc-hb commented May 26, 2022

I think the problem is in an earlier test that's shown as PASS: https://sof-ci.01.org/linuxpr/PR3645/build166/devicetest/?model=CML_RVP_SDW_ZEPHYR&testcase=check-sof-logger

If you look at the logs there's a number of stack traces and I suspect the kmod load/unload causes all kinds of issues here.

Green failure tentatively fixed in one-line sof-test PR

RanderWang
RanderWang previously approved these changes May 26, 2022
@plbossart
Copy link
Member Author

SOFCI TEST

@plbossart
Copy link
Member Author

https://sof-ci.01.org/linuxpr/PR3645/build168/devicetest/ worked fine on jf-cml-rvp-sdw-3

We must have a random bug somewhere...

@plbossart
Copy link
Member Author

SOFCI TEST

@plbossart
Copy link
Member Author

https://sof-ci.01.org/linuxpr/PR3645/build169/devicetest worked fine on jf-cml-rvp-sdw-1

Difficult to figure out what is going on, I am not comfortable with 2/5 failures.

@plbossart
Copy link
Member Author

Internal tests show this warning

[ 6829.633278] kernel: rt711 sdw:0:025d:0711:00: in rt711_jack_init enable
[ 6829.633281] kernel: ------------[ cut here ]------------
[ 6829.633282] kernel: WARNING: CPU: 7 PID: 97778 at kernel/workqueue.c:1654 __queue_delayed_work+0x70/0x90
[ 6829.633288] kernel: Modules linked in: snd_sof_probes snd_soc_sof_sdw(+) snd_soc_intel_hda_dsp_common snd_sof_ipc_msg_injector snd_soc_intel_sof_maxim_common snd_sof_ipc_flood_test snd_usb_audio snd_sof_pci_intel_tgl snd_hda_codec_hdmi snd_sof_pci_intel_icl snd_soc_dmic snd_sof_pci_intel_cnl snd_sof_pci_intel_apl snd_sof_pci_intel_skl snd_sof_intel_hda_common soundwire_intel soundwire_generic_allocation soundwire_cadence snd_sof_intel_hda snd_soc_hdac_hda snd_hda_ext_core snd_hda_codec snd_hwdep snd_hda_core snd_sof_pci_intel_tng snd_sof_pci snd_sof_acpi_intel_bdw snd_sof_acpi_intel_byt snd_sof_intel_atom snd_sof_xtensa_dsp snd_soc_acpi_intel_match snd_soc_acpi snd_sof_acpi snd_sof snd_sof_utils snd_intel_dspcfg snd_intel_sdw_acpi snd_soc_es8316 snd_soc_max98390 snd_soc_max98373_i2c snd_soc_max98373_sdw snd_soc_max98373 snd_soc_max98357a snd_soc_ts3a227e snd_soc_max98090 snd_soc_rt5682_sdw snd_soc_rt5682_i2c snd_soc_rt5682 snd_soc_rt5677 snd_soc_rt5677_spi snd_soc_rt5670 snd_soc_rt5660
[ 6829.633329] kernel:  snd_soc_rt5651 snd_soc_rt5645 snd_soc_rt5640 snd_soc_rt1011 snd_soc_sdw_mockup snd_soc_rt715_sdca snd_soc_rt1316_sdw snd_soc_rt711_sdca regmap_sdw_mbq snd_soc_rt715 snd_soc_rt1308_sdw snd_soc_rt1308 snd_soc_rl6231 snd_soc_rt711 snd_soc_rt700 regmap_sdw soundwire_bus snd_soc_rt298 snd_soc_rt286 snd_soc_rl6347a snd_soc_wm8804_i2c snd_soc_wm8804 snd_soc_pcm512x_i2c snd_soc_pcm512x snd_soc_da7219 snd_soc_da7213 snd_soc_core snd_compress snd_pcm regmap_i2c fuse snd_ctl_led ledtrig_audio ax88179_178a usbnet wmi_bmof snd_usbmidi_lib x86_pkg_temp_thermal intel_powerclamp snd_seq_midi snd_seq_midi_event snd_rawmidi hid_multitouch snd_seq snd_seq_device snd_timer i915 snd i2c_algo_bit drm_buddy soundcore mei_me drm_dp_helper mei drm_kms_helper cfbfillrect syscopyarea cfbimgblt sysfillrect processor_thermal_device_pci_legacy sysimgblt fb_sys_fops processor_thermal_device processor_thermal_rfim
[ 6829.633376] kernel: rt711 sdw:0:025d:0711:00: [rt711_sdw_write] 4d13 <= 0011
[ 6829.633377] kernel:  cfbcopyarea processor_thermal_mbox intel_pch_thermal ttm intel_soc_dts_iosf wmi int3403_thermal int340x_thermal_zone intel_pmc_core int3400_thermal acpi_thermal_rel squashfs drm drm_panel_orientation_quirks efivarfs intel_ishtp_hid psmouse intel_lpss_pci xhci_pci intel_ish_ipc intel_lpss i2c_hid_acpi intel_ishtp idma64 xhci_hcd mfd_core i2c_hid [last unloaded: snd_pcm]
[ 6829.633400] kernel: CPU: 7 PID: 97778 Comm: systemd-udevd Not tainted 5.18.0-rc2-pr3645-169-default-77336-g3785b7d121c4 #4
[ 6829.633403] kernel: Hardware name: Dell Inc. Latitude 9510/, BIOS 0.1.3 10/01/2019
[ 6829.633404] kernel: RIP: 0010:__queue_delayed_work+0x70/0x90
[ 6829.633407] kernel: Code: 00 48 01 c1 48 89 4a 60 83 ff 40 75 2c 4c 89 cf e9 15 54 07 00 e9 e0 f9 ff ff 0f 0b eb ca 0f 0b 48 81 7a 68 00 2a 89 bb 74 a8 <0f> 0b 48 8b 42 58 48 85 c0 74 a6 0f 0b eb a2 89 fe 4c 89 cf e9 e7
[ 6829.633409] kernel: RSP: 0018:ffff99f6423eb908 EFLAGS: 00010007
[ 6829.633411] kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000000000fa
[ 6829.633413] kernel: RDX: ffff976bc0fc5c80 RSI: ffff976b80058e00 RDI: 0000000000000040
[ 6829.633414] kernel: RBP: 0000000000000040 R08: 0000000000000000 R09: ffff976bc0fc5cd0
[ 6829.633415] kernel: R10: 0000000000000001 R11: 0000000001c02380 R12: ffff976bc0fc5c80
[ 6829.633416] kernel: R13: ffff976b80058e00 R14: 00000000000000fa R15: ffffffffc0ebd0e0
[ 6829.633418] kernel: FS:  00007f9aa83e1880(0000) GS:ffff976d24400000(0000) knlGS:0000000000000000
[ 6829.633419] kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6829.633421] kernel: CR2: 000055e7ffef9000 CR3: 0000000148782005 CR4: 00000000003706e0
[ 6829.633422] kernel: Call Trace:
[ 6829.633424] kernel:  
[ 6829.633425] kernel:  mod_delayed_work_on+0x57/0x90
[ 6829.633432] kernel:  rt711_jack_init+0x11b/0x2c0 [snd_soc_rt711]
[ 6829.633437] kernel:  rt711_set_jack_detect+0x3b/0x90 [snd_soc_rt711]
[ 6829.633441] kernel:  snd_soc_component_set_jack+0x24/0x50 [snd_soc_core]
[ 6829.633458] kernel:  rt711_rtd_init+0x153/0x170 [snd_soc_sof_sdw]
[ 6829.633463] kernel:  ? soc_probe_component+0x2f/0x300 [snd_soc_core]
[ 6829.633478] kernel: rt711 sdw:0:025d:0711:00: [rt711_sdw_write] 4c13 <= 0021
[ 6829.633475] kernel:  snd_soc_link_init+0x1e/0x40 [snd_soc_core]
[ 6829.633488] kernel:  snd_soc_bind_card+0x631/0xce0 [snd_soc_core]
[ 6829.633497] kernel:  ? is_module_address+0x43/0x70
[ 6829.633506] kernel:  devm_snd_soc_register_card+0x43/0x80 [snd_soc_core]
[ 6829.633519] kernel:  mc_probe+0x8dd/0xdc0 [snd_soc_sof_sdw]
static void __queue_delayed_work(int cpu, struct workqueue_struct *wq,
				struct delayed_work *dwork, unsigned long delay)
{
	struct timer_list *timer = &dwork->timer;
	struct work_struct *work = &dwork->work;

	WARN_ON_ONCE(!wq);
	WARN_ON_FUNCTION_MISMATCH(timer->function, delayed_work_timer_fn); <<<< HERE???
	WARN_ON_ONCE(timer_pending(timer));
	WARN_ON_ONCE(!list_empty(&work->entry));

That hints at a bad initialization of the workqueue. Gah.

When binding/unbinding codec drivers, the following warnings are
thrown:

[ 107.266879] rt715-sdca sdw:3:025d:0714:01: Unbalanced pm_runtime_enable!
[  306.879700] rt711-sdca sdw:0:025d:0711:01: Unbalanced pm_runtime_enable!

Add a remove callback for all Realtek/Maxim SoundWire codecs and remove this
warning.

Signed-off-by: Pierre-Louis Bossart <[email protected]>
In codec driver bind/unbind test, the following warning is thrown:

DEBUG_LOCKS_WARN_ON(lock->magic != lock)
...
[  699.182495]  rt711_sdca_jack_init+0x1b/0x1d0 [snd_soc_rt711_sdca]
[  699.182498]  rt711_sdca_set_jack_detect+0x3b/0x90 [snd_soc_rt711_sdca]
[  699.182500]  snd_soc_component_set_jack+0x24/0x50 [snd_soc_core]

A quick check in the code shows that the 'calibrate_mutex' used by
this driver are not initialized at probe time. Moving the
initialization to the probe removes the issue.

BugLink: thesofproject#3644
Signed-off-by: Pierre-Louis Bossart <[email protected]>
If the card registration fails, typically because of deferred probes,
the device properties added for headset codecs are not removed, which
leads to kernel oopses in driver bind/unbind tests.

We already clean-up the device properties when the card is removed,
this code can be moved as a helper and called upon card registration
errors.

Signed-off-by: Pierre-Louis Bossart <[email protected]>
Follow the same flow as rt711-sdca and initialize all mutexes at probe
time.

Signed-off-by: Pierre-Louis Bossart <[email protected]>
Realtek headset codec drivers typically check if the card is
instantiated before proceeding with the jack detection.

The rt700, rt711 and rt711-sdca are however missing a check on the
card pointer, which can lead to NULL dereferences encountered in
driver bind/unbind tests.

Signed-off-by: Pierre-Louis Bossart <[email protected]>
The workqueues are initialized in the io_init functions, which isn't
quite right. In some tests, this leads to warnings throw from
__queue_delayed_work()

WARN_ON_FUNCTION_MISMATCH(timer->function, delayed_work_timer_fn);

Move all the initializations to the probe functions.

Signed-off-by: Pierre-Louis Bossart <[email protected]>
…etect

The .set_jack_detect() codec component callback is invoked during card
registration, which happens when the machine driver is probed.

The issue is that this callback can race with the bus suspend/resume,
and IO timeouts can happen. This can be reproduced very easily if the
machine driver is 'blacklisted' and manually probed after the bus
suspends. The bus and codec need to be re-initialized using pm_runtime
helpers.

Previous contributions tried to make sure accesses to the bus during
the .set_jack_detect() component callback only happen when the bus is
active. This was done by changing the regcache status on a component
remove. This is however a layering violation, the regcache status
should only be modified on device probe, suspend and resume. The
component probe/remove should not modify how the device regcache is
handled. This solution also didn't handle all the possible race
conditions, and the RT700 headset codec was not handled.

This patch tries to resume the codec device before handling the jack
initializations. In case the codec has not yet been initialized,
pm_runtime may not be enabled yet, so we don't squelch the -EACCES
error code and only stop the jack information. When the codec reports
as attached, the jack initialization will proceed as usual.

BugLink: thesofproject#3643
Fixes: 7ad4d23 ('ASoC: rt711-sdca: Add RT711 SDCA vendor-specific driver')
Fixes: 899b125 ('ASoC: rt711: add snd_soc_component remove callback')
Signed-off-by: Pierre-Louis Bossart <[email protected]>
@plbossart plbossart dismissed stale reviews from RanderWang and bardliao via 2c68ce3 May 26, 2022 20:16
@plbossart plbossart force-pushed the sdw/fixes-before-bind-unbind-lock branch from 917a1d8 to 2c68ce3 Compare May 26, 2022 20:16
@plbossart
Copy link
Member Author

Running test 12997 to make sure there's no surprises.

@plbossart plbossart requested a review from bardliao June 1, 2022 18:05
@plbossart
Copy link
Member Author

Running test 12997 to make sure there's no surprises.

nothing bad detected in this test. @bardliao can you please re-check the changes?

@plbossart plbossart merged commit 7d6cc34 into thesofproject:topic/sof-dev Jun 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants