Mark framebuffer as non-cacheable #74
Ahhh nice, I didn't know that
:D
O.o
XD that's a lot XD Nice ^^
Won't help. That's only in hardware
For the L1, we really need it to be cacheable, else it would be very, very slow. The main issue is that the LiteX video DMA is connected directly to the LiteDRAM controller instead of going through the L2 cache for snoops. So there are 2 solutions (both would need development):
I would say the second option is the one that would work for sure, and it would probably not be too hard to implement.
I've actually tried several and all of them played relatively fine. But I have no sound device, so I'm not sure if it will be fast enough to play videos with sound. Also, it plays at full speed only at the native video size of 320x240; upscaling it to screen size kills the performance. I'm attaching a small cut from the Blender open movie Sintel as an example. sintel_240p_short.mp4
So there is no capability to flush L2 now? If such a capability is introduced, it will be quite easy to patch the fb driver to flush changed cache lines periodically or on each fb write. Moving the fb DMA from a separate LiteDRAM port to the system bus is not that great, I think. Apart from the need to bypass caching for DMA, it will also put quite a large load (115 MB/s for 800x600) on the system bus and L2 without real benefit. Also, I would like to note a couple more things:
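The 115 MB/s figure quoted for 800x600 can be reproduced with a quick calculation. This is only a sketch: it assumes a 32-bit (4 bytes per pixel) framebuffer scanned out at 60 Hz, neither of which is stated in the thread, and the function name is made up for illustration.

```python
def scanout_bandwidth_bytes_per_s(width, height, bytes_per_pixel=4, refresh_hz=60):
    """Bytes per second the video DMA must read to refresh the whole screen.

    bytes_per_pixel=4 and refresh_hz=60 are assumptions, not values
    confirmed anywhere in this thread.
    """
    return width * height * bytes_per_pixel * refresh_hz


if __name__ == "__main__":
    bw = scanout_bandwidth_bytes_per_s(800, 600)
    print(f"800x600 scan-out: {bw / 1e6:.1f} MB/s")
```

Under those assumptions the result is 115.2 MB/s, which matches the number above and gives a feel for the constant load that would move onto the system bus and L2 if the DMA were rerouted.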
BTW, 4 cores also work without issues, but such a config rarely meets timing on my FPGA with a 175 MHz system clock. With 4 cores @ 175 MHz and 2G RAM, even Firefox becomes usable enough for me to write this comment directly from it, although it's definitely not a very pleasant user experience :). I've used 7-zip and Linpack (from the hpcc Debian package) to benchmark multicore integer and float performance. With 620 7z decompression MIPS and 130 Linpack Mflops, this system (LiteX with 4 NaxRiscv cores @ 175 MHz) is roughly on par with the single-core A9 @ 800 MHz used in Zynq 7000S chips. Great result!
Right.
I was thinking more about a fully hardware solution, where the cache periodically scrubs itself of dirty cache lines.
I really have no idea, I never tested more than a 32-bit physical address space. Likely something will break somewhere XD
Ahh, I didn't know that. But how does OpenSBI know where the device tree is? Is that something that happens while Linux boots, via the software SBI interface, or during OpenSBI boot?
Nice :D
It seems that the 1 MB of L2 cache really helps; when I tested it, it was painful (128 KB L2, dual core, 100 MHz).
Is there a reason to do it besides the framebuffer? For the fb to look nice you'll need to write back the cache every 20 ms or so, which will add some memory load, especially with a larger L2. On the other hand, if you implement some way to trigger a forced L2 cache line writeback from software (the cbo.clean instruction from Zicbom, for example), it won't hurt anything and will be more generally useful, I think. Is it hard to do?
I'll try it and report back if I have any success :).
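To make the software-triggered writeback idea more concrete, here is a rough Python model of the bookkeeping an fb driver would need: given an updated rectangle, list the cache-line-aligned addresses that would each have to be cleaned (for example with one cbo.clean per line). It only computes addresses and issues no instruction. The 64-byte line size, 4 bytes per pixel, the 800-pixel stride and the function name are all assumptions for illustration; only the framebuffer base 0x40c00000 comes from this issue.

```python
def dirty_cache_lines(fb_base, stride, x, y, w, h, bytes_per_pixel=4, line_size=64):
    """Return sorted cache-line-aligned addresses overlapping a rectangle.

    fb_base: physical base address of the framebuffer
    stride:  bytes per framebuffer row
    line_size must be a power of two (64 is an assumed value).
    """
    lines = set()
    for row in range(y, y + h):
        start = fb_base + row * stride + x * bytes_per_pixel
        end = start + w * bytes_per_pixel      # exclusive end of this row's span
        first = start & ~(line_size - 1)       # round down to a line boundary
        lines.update(range(first, end, line_size))
    return sorted(lines)


if __name__ == "__main__":
    FB_BASE = 0x40C00000   # framebuffer@40c00000 from the dts mentioned in this issue
    STRIDE = 800 * 4       # assumed: 800 pixels per row, 4 bytes per pixel
    # A hypothetical 16x16 mouse-pointer update at pixel (100, 50)
    lines = dirty_cache_lines(FB_BASE, STRIDE, x=100, y=50, w=16, h=16)
    print(f"{len(lines)} cache lines to clean, first at {lines[0]:#x}")
```

Cleaning only the lines touched by an update keeps the extra memory traffic proportional to what actually changed on screen, in contrast to a periodic scrub of the whole L2.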
Probably from here?
Not only the 1MB L2, but also the higher frequency and 2 more cores, as Firefox seems to be parallel enough to fully utilize all of them. And don't get me wrong, it's still very slow :). It takes a couple of minutes to start, ~15-20 seconds to render a typical static Wikipedia page and ~40 seconds to render this one (and scrolling is still not smooth). But it works, and I was not expecting a modern full-featured browser to be even remotely usable on a 175 MHz CPU without DRI.
First of all, thanks for the great work with NaxRiscv! With your howto I was able to launch Debian on the latest LiteX with a 175 MHz dual-core NaxRiscv and 1MB L2 cache on an Alinx AXKU040 board (Kintex Ultrascale). I can confirm that everything including X is stable and works just fine: Xfce with mouse and keyboard over usbip (my board has no USB) is entirely usable, OpenTTD is playable at 800x600 and mplayer can play 240p H264 videos without slowdown. This is quite an achievement for a soft CPU!
I observe only one small problem. On small framebuffer updates, glitches appear around the updated shapes for several seconds, as the shapes are not fully redrawn. This is most noticeable with single-character output on the console framebuffer or with mouse pointer moves in X. I think this problem comes from framebuffer updates staying in the NaxRiscv L2 cache, as the glitches become much more prominent with larger caches (I've increased L2 to 1MB) and while the system is idle. If the CPU is working hard, the mouse pointer moves without glitches, probably because L2 is getting rewritten fast, but when the CPU is idle, glitches can stay for up to ~10 seconds with larger caches.
So my question is: is there a way to mark framebuffer memory as non-cacheable? DMA buffers, including the framebuffer, should either bypass L2 or be flushed from it after each write. I've tried adding the "no-map" attribute to framebuffer@40c00000 in the dts (found it in some Xilinx doc), but it seems to change nothing. Maybe there is some dts or driver patch that could fix this issue?
This is the command I've used to build the bitstream:
python3 -m litex_boards.targets.alinx_axku040 --cpu-type=naxriscv --bus-standard axi-lite --with-video-framebuffer --with-coherent-dma --with-sdcard --with-ethernet --l2-bytes 1048576 --xlen=64 --scala-args='rvc=true,rvf=true,rvd=true,alu-count=2,decode-count=2' --with-jtag-tap --sys-clk-freq 175e6 --cpu-count 2 --l2-size 0 --build
alinx_axku040.py is written by me, but it's fully based on xilinx_kcu105.py, with the addition of a framebuffer as in digilent_nexys4.py, for example.