Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG][MTL] HD-Audio playback IO error on Speakers #7990

Closed
syedk008 opened this issue Jul 29, 2023 · 31 comments
Closed

[BUG][MTL] HD-Audio playback IO error on Speakers #7990

syedk008 opened this issue Jul 29, 2023 · 31 comments
Assignees
Labels
bug Something isn't working as expected clock Issues related to clock settings Fix Submitted Fix has been submitted, to be fully verified HDA Applies to HD-Audio bus for codec connection IPC timeout IPC timeout observed MTL Applies to Meteor Lake platform mtl-005 P1 Blocker bugs or important features
Milestone

Comments

@syedk008
Copy link

syedk008 commented Jul 29, 2023

Describe the bug
Playback a wave to speaker with aplay couple of time, see io error in terminal immediately, IPC timed out in dmesg.

intel@ubuntu:~$ sudo aplay -Dhw:0,0 -f dat -c 2 -i -vvv /dev/zero
Playing raw data '/dev/zero' : Signed 16 bit Little Endian, Rate 48000 Hz, Stereo
Hardware PCM card 0 'sof-hda-dsp' device 0 subdevice 0
Its setup is:
stream : PLAYBACK
access : RW_INTERLEAVED
format : S16_LE
subformat : STD
channels : 2
rate : 48000
exact rate : 48000 (48000/1)
msbits : 16
buffer_size : 16384
period_size : 4096
period_time : 85333
tstamp_mode : NONE
tstamp_type : MONOTONIC
period_step : 1
avail_min : 4096
period_event : 0
start_threshold : 16384
stop_threshold : 16384
silence_threshold: 0
silence_size : 0
boundary : 4611686018427387904
appl_ptr : 0
hw_ptr : 0
Max peak (8192 samples): 0x00000000 # 0%
Max peak (8192 samples): 0x00000000 # 0%
Max peak (8192 samples): 0x00000000 # 0%
Max peak (8192 samples): 0x00000000 # 0%
Max peak (480 samples): 0x00000000 # 0%
Max peak (1152 samples): 0x00000000 # 0%
aplay: pcm_write:2127: write error: Input/output error

To Reproduce
simple do aplay to speaker pipeline:
aplay -Dhw:0,0 -f dat -c 2 -i -vvv /dev/zero

Reproduction Rate
100%

Expected behavior
Playback without errors.

Impact
showstopper Linux BKC

Environment
SoC stepping: MTL-P C0
Audio Configuration: HDA
Audio card: Gen4.1 AIOC
codecs: ALC256
SoF-FW: production signed SOF FW ( MTL_PR4A_C0NN-XODCA_CPSF_SEP0_0175049A_2023WW26.2.1.bin)
Topology: From MTL-005.0.3 2023WW26 Release sof-hda-generic-4ch.tplg
IFWI : orange Release IFWI (production signed) WW29.1 BKC - ( MTL_PR4A_C0NN-XODCA_CPSF_SEP0_0175049A_2023WW26.2.1.bin)

Kernel: https://github.com/thesofproject/linux/commits/topic/sof-dev commit:f77b5931c6223d395bcc666a2cec1846c0a0ec25
Prebuilt deb: https://ubit-artifactory-ba.intel.com/artifactory/ccg_linuxpc-ba-local/CI/custom-builds/mtl-h-USER_BUILDS-377/os/kernel-debs/

Bios settings:
Go to Intel Advanced Menu->PCH-IO configuration -> HD Audio configuration
• Enable HDA link
• Enable SDI0
• Disable SDI1
• Enable DMIC0 with ClkA (BIOS default)
• Enable DMIC1 with ClkA (BIOS default)
• Disable SNDW#0, SNDW#1, SNDW#2, SNDW#3 (make sure all other SNDW link are disabled)
• Disable SSP#0, SSP#1, SSP#2, SSP#3
– go to HD Audio DSP feature Configuration settings and make sure

  • DMIC Stereo option is disabled.
  • DMIC Quad option is enabled.
    • Intel Advnaced Menu - PCH IO configuration - HD Audio configuration - HD Audio DSP feature
    configuration - Clock source - 38.4 MHz

Screenshots or console output
[ 127.161590] sof-audio-pci-intel-mtl 0000:00:1f.3: ipc timed out for 0x13000003|0x0
[ 127.169145] sof-audio-pci-intel-mtl 0000:00:1f.3: Attempting to prevent DSP from entering D3 state to preserve context
[ 127.169150] sof-audio-pci-intel-mtl 0000:00:1f.3: ------------[ IPC dump start ]------------
[ 127.177691] sof-audio-pci-intel-mtl 0000:00:1f.3: Host IPC initiator: 0x93000003|0x0|0x0, target: 0x33000000|0x0|0x0, ctl: 0x3
[ 127.189025] sof-audio-pci-intel-mtl 0000:00:1f.3: ------------[ IPC dump end ]------------
[ 127.197241] sof-audio-pci-intel-mtl 0000:00:1f.3: ------------[ DSP dump start ]------------
[ 127.205620] sof-audio-pci-intel-mtl 0000:00:1f.3: IPC timeout
[ 127.211338] sof-audio-pci-intel-mtl 0000:00:1f.3: fw_state: SOF_FW_BOOT_COMPLETE (7)
[ 127.219038] sof-audio-pci-intel-mtl 0000:00:1f.3: ROM status: 0x0, ROM error: 0x0
[ 127.226475] sof-audio-pci-intel-mtl 0000:00:1f.3: ROM debug status: 0x50000005, ROM debug error: 0x0
[ 127.235550] sof-audio-pci-intel-mtl 0000:00:1f.3: ROM feature bit enabled
[ 127.242292] sof-audio-pci-intel-mtl 0000:00:1f.3: ------------[ DSP dump end ]------------
[ 127.250508] sof-audio-pci-intel-mtl 0000:00:1f.3: failed to pause all pipelines
[ 127.257773] sof-audio-pci-intel-mtl 0000:00:1f.3: ASoC: error at soc_component_trigger on 0000:00:1f.3: -110
[ 127.267534] HDA Analog: ASoC: trigger FE cmd: 0 failed: -110
[ 127.273296] snd_sof:sof_pcm_hw_free: sof-audio-pci-intel-mtl 0000:00:1f.3: pcm: free stream 0 dir 0
[ 127.273312] snd_sof_intel_hda_common:hda_dsp_stream_trigger: sof-audio-pci-intel-mtl 0000:00:1f.3: FW Poll Status: reg[0x1c0]=0x140000 successful
[ 127.273320] snd_sof:sof_ipc4_trigger_pipelines: sof-audio-pci-intel-mtl 0000:00:1f.3: trigger cmd: 0 state: 2
[ 127.273337] snd_sof:sof_ipc4_route_free: sof-audio-pci-intel-mtl 0000:00:1f.3: unbind modules copier.host.1.1:0 -> gain.1.1:0
[ 127.273341] snd_sof:sof_widget_free_unlocked: sof-audio-pci-intel-mtl 0000:00:1f.3: widget copier.host.1.1 freed
[ 127.273344] snd_sof:sof_ipc4_route_free: sof-audio-pci-intel-mtl 0000:00:1f.3: unbind modules gain.1.1:0 -> mixin.1.1:0
[ 127.273348] snd_sof:sof_widget_free_unlocked: sof-audio-pci-intel-mtl 0000:00:1f.3: widget gain.1.1 freed
[ 127.273351] snd_sof:sof_ipc4_route_free: sof-audio-pci-intel-mtl 0000:00:1f.3: unbind modules mixin.1.1:0 -> mixout.2.1:0
[ 127.273354] snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-mtl 0000:00:1f.3: ipc tx : 0x46000002|0x3: MOD_UNBIND
[ 127.777591] sof-audio-pci-intel-mtl 0000:00:1f.3: ipc timed out for 0x46000002|0x3
[ 127.785159] sof-audio-pci-intel-mtl 0000:00:1f.3: IPC timeout
[ 127.790936] sof-audio-pci-intel-mtl 0000:00:1f.3: failed to unbind modules mixin.1.1:0 -> mixout.2.1:0
[ 127.800197] snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-mtl 0000:00:1f.3: ipc tx : 0x12000000|0x0: GLB_DELETE_PIPELINE
[ 128.305387] sof-audio-pci-intel-mtl 0000:00:1f.3: ipc timed out for 0x12000000|0x0
[ 128.312946] sof-audio-pci-intel-mtl 0000:00:1f.3: IPC timeout
[ 128.318695] sof-audio-pci-intel-mtl 0000:00:1f.3: failed to free pipeline widget pipeline.1
[ 128.327001] snd_sof:sof_ipc4_route_free: sof-audio-pci-intel-mtl 0000:00:1f.3: unbind modules mixout.2.1:0 -> gain.2.1:0
[ 128.327006] snd_sof:sof_widget_free_unlocked: sof-audio-pci-intel-mtl 0000:00:1f.3: widget mixout.2.1 freed
[ 128.327009] snd_sof:sof_ipc4_route_free: sof-audio-pci-intel-mtl 0000:00:1f.3: unbind modules gain.2.1:0 -> copier.HDA.2.1:0
[ 128.327012] snd_sof:sof_widget_free_unlocked: sof-audio-pci-intel-mtl 0000:00:1f.3: widget gain.2.1 freed
[ 128.327017] snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-mtl 0000:00:1f.3: ipc tx : 0x12010000|0x0: GLB_DELETE_PIPELINE
[ 128.833606] sof-audio-pci-intel-mtl 0000:00:1f.3: ipc timed out for 0x12010000|0x0
[ 128.841155] sof-audio-pci-intel-mtl 0000:00:1f.3: IPC timeout
[ 128.846891] sof-audio-pci-intel-mtl 0000:00:1f.3: failed to free pipeline widget pipeline.2
[ 128.855196] sof-audio-pci-intel-mtl 0000:00:1f.3: Failed to free connected widgets
[ 128.862726] sof-audio-pci-intel-mtl 0000:00:1f.3: ASoC: error at snd_soc_pcm_component_hw_free on 0000:00:1f.3: -110
[ 128.873319] snd_sof:sof_pcm_close: sof-audio-pci-intel-mtl 0000:00:1f.3: pcm: close stream 0 dir 0

@syedk008 syedk008 added the bug Something isn't working as expected label Jul 29, 2023
@mengdonglin mengdonglin added MTL Applies to Meteor Lake platform HDA Applies to HD-Audio bus for codec connection IPC timeout IPC timeout observed P1 Blocker bugs or important features mtl-005 labels Jul 29, 2023
@keqiaozhang
Copy link
Collaborator

keqiaozhang commented Jul 31, 2023

I tried the attached kernel package Prebuilt deb: https://ubit-artifactory-ba.intel.com/artifactory/ccg_linuxpc-ba-local/CI/custom-builds/mtl-h-USER_BUILDS-377/os/kernel-debs/, but failed to reproduce the issue on my MTL-DDR5 RVP.
@syedk008

@plbossart
Copy link
Member

@syedk008 you have cut the dmesg log, we need to see the sequence before the IPC timeout. Also please attach files instead of pasting in GitHub text windows, it's problematic to scroll down.

@syedk008
Copy link
Author

syedk008 commented Aug 1, 2023

dmesg_ipc_error.txt
Here is the complete log..

@mengdonglin
Copy link
Collaborator

@syedk008 @plbossart We cannot reproduce this issue on a MTL-P LP5 RVP locally. This issue seems very device-specific. Please find more info in mail.

Since 6.5-rc2 upstream kernel (fdf0eaf11452) can always reproduce this issue but 6.4-1 upstream kernel (627d858674f5) cannot on that specific LP5 RVP, I think a kernel-bisect on it can help us to narrow down the issue.

@plbossart
Copy link
Member

@syedk008 yes please bisect and get back to us with the commit that broke your device.

@lgirdwood lgirdwood added this to the v2.7 milestone Aug 2, 2023
@syedk008
Copy link
Author

syedk008 commented Aug 3, 2023

@plbossart @mengdonglin
Reverting the below patch seems working.
d8a2c9879349 ASoC: SOF: ipc4-loader/topology: Query the CPC value from manifest.

@ujfalusi
Copy link
Contributor

ujfalusi commented Aug 3, 2023

Can you share the firmware log as well?
That patch is only taking the CPC values form the firmware manifest if present and sends it to firmware.
So the only thing it does is instead of cpc=0, it will send a value which was provided in the firmware file.

@aiChaoSONG
Copy link
Collaborator

aiChaoSONG commented Aug 3, 2023

When parsing CPC value from extended manifest, force to assign 0 to basecfg->cpc, issue disappeared. https://github.com/thesofproject/linux/blob/9455b74d027fb66c90a44c5b217ea5f01df5328d/sound/soc/sof/ipc4-loader.c#L466

@ujfalusi You are right, there is a bug in CPC handling in firmware.

@ujfalusi
Copy link
Contributor

ujfalusi commented Aug 3, 2023

Draft PR to ignore the CPC value on the firmware side: #8007 for testing

Chao: Confirmed that with this PR and mtl-005-drop-stable branch, issue could not be reporduced.

@ujfalusi
Copy link
Contributor

ujfalusi commented Aug 3, 2023

@aiChaoSONG, thank you form the confirmation!

@mengdonglin mengdonglin added the clock Issues related to clock settings label Aug 3, 2023
@syedk008
Copy link
Author

syedk008 commented Aug 3, 2023

@ujfalusi Curious to know, why this bug is affecting one specific device?

@mengdonglin
Copy link
Collaborator

@syedk008 has confirmed this issue cannot be reproduced with test PR #8007

@mengdonglin
Copy link
Collaborator

mengdonglin commented Aug 4, 2023

@ujfalusi Curious to know, why this bug is affecting one specific device?

@syedk008 Hard to say. It seems different device/silicon may have slightly timing difference that can expose some issue. Let's wait for the final fix that may give us more clue.

We previously also met a valid clock bug only observed on 1 device. We're lucky it's you not customers observe this issue at first :)

@aiChaoSONG
Copy link
Collaborator

Do aplay to HDA Analog twice, IO error happened at second playback with 100% rate.
mtrace on SH RVP, issue not reproduced:
hda_mtrace_no_reproduce.txt
mtrace on IN RVP, issue reproduced:
hda_mtrace_reproduce.txt

@abonislawski
Copy link
Member

Reverting the below patch seems working.
d8a2c9879349 ASoC: SOF: ipc4-loader/topology: Query the CPC value from manifest.

There is driver fix for another CPC issue and I can see wrong CPC values in log files above (hda_mtrace_no_reproduce, hda_mtrace_reproduce).
Probably it will NOT help in this case but anyway its worth to test correct configuration with:
thesofproject/linux#4508

@syedk008
Copy link
Author

I'm able to reproduce the issue in different usecase even after reverting this patch. arecord on one terminal and aplay on another, playback and record stops with io error.

@ujfalusi
Copy link
Contributor

ujfalusi commented Aug 11, 2023

@syedk008, you are saying that you can reproduce the issue even if you revert the kernel patch to send non 0 CPC value?
Can you share the mtrace log?

@abonislawski
Copy link
Member

This would align with my test results where I can pass easily several iterations before first context restore (with CPC patches).
It fails only when we wait a moment to trigger D3 flow, so with CONFIG_ADSP_IMR_CONTEXT_SAVE=n it will work on any clock with CPC.

@aiChaoSONG
Copy link
Collaborator

With #8019 merged to main, issue cannot be reproduced with sof-dev kernel + firmware (main + #8019)

@mengdonglin
Copy link
Collaborator

With #8019 merged to main, issue cannot be reproduced with sof-dev kernel + firmware (main + #8019)

@abonislawski @syedk008 FYI.

Hi @abonislawski , could you help double check if you can have the same test result as @aiChaoSONG? If the test results is yes, the fix may already be in somewhere of SOF or Zephyr main branch.

@mengdonglin
Copy link
Collaborator

This issue seems quite similar to #8028, maybe a duplicate.

@abonislawski
Copy link
Member

abonislawski commented Aug 17, 2023

Hi @abonislawski , could you help double check if you can have the same test result as @aiChaoSONG? If the test results is yes, the fix may already be in somewhere of SOF or Zephyr main branch.

Unfortunately it works only because of missing rimage manifest data on main (so it stays on highest clock).
With a proper manifest it will switch to freq idx 1 and fail

@ujfalusi
Copy link
Contributor

Hi @abonislawski , could you help double check if you can have the same test result as @aiChaoSONG? If the test results is yes, the fix may already be in somewhere of SOF or Zephyr main branch.

Unfortunately it works only because of missing rimage manifest data on main (so it stays on highest clock). With a proper manifest it will switch to freq idx 1 and fail

Good catch :(

@mengdonglin
Copy link
Collaborator

@syedk008 @abonislawski @aiChaoSONG FYI. To provide a workaround before this issue is fixed, kernel driver introduce a module parameter to ignore the CPC value:
thesofproject/linux#4531

@mengdonglin
Copy link
Collaborator

mengdonglin commented Aug 23, 2023

@mengdonglin mengdonglin modified the milestones: v2.7, v2.8 Aug 30, 2023
@mengdonglin
Copy link
Collaborator

@alex-cri @kv2019i I push it out to v2.8 as the CPC-based clock selection is out of scope of v2.7

@wszypelt
Copy link

@syedk008 Does the error still reproduce after implementing workaround?

@abonislawski
Copy link
Member

@syedk008 can you test this with MTL v6.2 release?

@keqiaozhang
Copy link
Collaborator

@syedk008 , please help to check this issue with MTL V6.2 release.

@lgirdwood
Copy link
Member

@syedk008 @keqiaozhang good to close ?

@keqiaozhang
Copy link
Collaborator

@syedk008 let's close this bug. Please re-open it if it's still a problem to you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working as expected clock Issues related to clock settings Fix Submitted Fix has been submitted, to be fully verified HDA Applies to HD-Audio bus for codec connection IPC timeout IPC timeout observed MTL Applies to Meteor Lake platform mtl-005 P1 Blocker bugs or important features
Projects
None yet
Development

No branches or pull requests

9 participants