Internationalizing Copy and Paste #1159
Replies: 3 comments 12 replies
-
It works under Debian, too: debian-paste-de-charset.webm
-
This looks interesting, thanks. One suggestion I would have is to compress the layout information by specifying only the keys that differ from the en-us layout.
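A minimal sketch of that idea, assuming a hypothetical layout shape that maps each character to its scancode sequence (this is not the format used in keyboard_tables.js, and the function names are made up for illustration):

```javascript
// Hypothetical layout shape: { [character]: number[] } (scancode sequence).
// Only entries that differ from the base layout are stored.
function compressLayout(base, layout) {
  const diff = {};
  for (const [ch, codes] of Object.entries(layout)) {
    if (JSON.stringify(base[ch]) !== JSON.stringify(codes)) {
      diff[ch] = codes;
    }
  }
  // Characters the base layout can type but this layout cannot.
  const removed = Object.keys(base).filter((ch) => !(ch in layout));
  return { diff, removed };
}

// Rebuild the full layout from the base layout and the stored difference.
function expandLayout(base, compressed) {
  const full = { ...base, ...compressed.diff };
  for (const ch of compressed.removed) {
    delete full[ch];
  }
  return full;
}
```

Whether this saves much depends on how far a layout deviates from en-us; layouts for other scripts would presumably differ on almost every key.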
-
I have made some progress and have a few preliminary results. To make this a bit more presentable I moved all code into a new repository, see https://github.com/chschnell/v86-i18n. The interactive keyboard demo is available there as a GitHub page, see https://chschnell.github.io/v86-i18n/keyboard-tables/. I've also added example code to the repository. The code has matured, and I don't think I can make the representation of codepages and keyboard layouts any more compact than it is now, see codepage_tables.js and keyboard_tables.js. To summarize, I think this is what we learned so far:
So unless the size limit of V86's ringbuffer gets lifted we will always run into an upper bound; when it is hit just depends on the amount of pasted text. I've also noticed that scancode data gets lost when I paste while the Firefox debugger is open (which slows V86 down considerably). @SuperMaxusa noticed that some DOS/BIOS combinations may be even more constrained than, for example, the FreeDOS/SeaBIOS combination that I use for tests (could you add which ones you tested?).
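One possible mitigation, sketched here only as an assumption and not something the v86-i18n code is known to do, is to feed the scancodes in small chunks with a pause in between. The chunk size, the delay and the sendCode callback are all hypothetical, and a fixed delay cannot guarantee that a slow guest has consumed the previous chunk:

```javascript
// Split a scancode sequence into chunks of at most chunkSize entries.
function chunkScancodes(codes, chunkSize) {
  const chunks = [];
  for (let i = 0; i < codes.length; i += chunkSize) {
    chunks.push(codes.slice(i, i + chunkSize));
  }
  return chunks;
}

// Send chunks one after another, pausing between them.
// sendCode is a hypothetical callback that forwards one scancode to the emulator.
async function sendThrottled(codes, sendCode, chunkSize = 16, delayMs = 10) {
  for (const chunk of chunkScancodes(codes, chunkSize)) {
    for (const code of chunk) {
      sendCode(code);
    }
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
}
```

The order of scancodes is preserved, so a key press and its release may land in different chunks without changing the result.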
-
While working on graphical text mode I stumbled upon Copy and Paste, which are both still tied to the en-US locale. It took a bit, but I think here's how to internationalize these two functions.
Internationalizing Copy is relatively simple, but somewhere in the V86 settings the user will have to manually configure the 8-bit codepage that is in use by the guest OS. I have published codepage tables here before.
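As a rough sketch of what Copy with a configured codepage could look like: the table below is only a small excerpt of CP437, and the function name and table shape are assumptions for illustration, not the format of codepage_tables.js.

```javascript
// Illustrative excerpt of CP437 (upper half only); a real table needs all 128 entries.
const CP437_EXCERPT = {
  0x81: "ü", 0x84: "ä", 0x8e: "Ä", 0x94: "ö", 0x99: "Ö", 0x9a: "Ü", 0xe1: "ß",
};

// Convert 8-bit text-mode bytes to a Unicode string using the selected codepage.
// Bytes below 0x80 are treated as ASCII; unknown high bytes become U+FFFD.
function decodeScreenBytes(bytes, highTable) {
  let out = "";
  for (const b of bytes) {
    if (b < 0x80) {
      out += String.fromCharCode(b);
    } else {
      out += highTable[b] ?? "\uFFFD";
    }
  }
  return out;
}
```

This ignores that text mode also displays glyphs for bytes below 0x20, which a complete implementation would have to map as well.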
Internationalizing Paste is a different beast though, and it is what this post is about.
I think the only proper way to implement a Paste function is by emulating key strokes on the keyboard, as is already the case in KeyboardAdapter.simulate_char() but without support for non-US keyboards. What I believe is missing is a "keyboard charset", that is the set of Unicode codepoints that can be generated by a specific keyboard (without resorting to Alt+3*Digit) and the mapping of scancodes (keystrokes) that produce them.
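To make the notion of a "keyboard charset" concrete, here is a small sketch. The entries are an excerpt of the German layout using PC scancode set 1; the object shape and the helper name are hypothetical and do not mirror keyboard_tables.js.

```javascript
// Hypothetical excerpt of a German layout: character -> key and modifiers.
const DE_EXCERPT = new Map([
  ["z", { scancode: 0x15, shift: false, altgr: false }],
  ["Z", { scancode: 0x15, shift: true, altgr: false }],
  ["y", { scancode: 0x2c, shift: false, altgr: false }],
  ["ß", { scancode: 0x0c, shift: false, altgr: false }],
  ["@", { scancode: 0x10, shift: false, altgr: true }], // AltGr+Q
]);

// Return the distinct characters of text that the layout cannot produce.
function untypeable(text, layout) {
  return [...new Set(text)].filter((ch) => !layout.has(ch));
}
```

The keys of the map are the keyboard charset; anything reported by untypeable() cannot be pasted as keystrokes on that layout.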
The website https://kbdlayout.info/ documents 217 different keyboard layouts in various formats, among them KLC text files. A KLC text file is the output of Microsoft's Keyboard Layout Creator (MSKLC) encoded in Unicode, for example see the KLC code of the US keyboard. Unfortunately, the KLC file format is not documented, but it is practically self-explanatory. It's possible to reliably generate the "keyboard charset" of any keyboard given its KLC file.
So I wrote a Python script import_kbd.py that downloads a given set of KLC files, parses them and extracts the keyboard charset, and then transforms that into a JavaScript representation, see keyboard_tables.js for the resulting output (contains mappings for 16 of the 217 known keyboards).
As that turned out to be slightly dense, I wrote a little HTML page that visualizes these mappings for debugging. On the top you can select one out of 16 keyboards (could be extended to all 217 keyboards or any subset of it except for CJK), on the left you see the repertoire of all typeable Unicode symbols and their scancode-mappings, and on the right you can enter some Unicode text into the top textarea that is transformed into the full scancode-sequence (including key-press and -release scancodes as well as handling dead keys) shown in the textarea below.
See method Keyboard.text_to_scancodes(text) in keyboard_lib.js for the transformation from a Unicode text to its PC scancode sequence including modifier keys (Shift, Ctrl, Alt, AltGr), it's straightforward.
Integrating this into V86 simply means pumping the produced scancode sequence into the V86 bus using keyboard-code events.
I included this in my V86 demo, just select Window -> Paste from the menu. Here's a screenshot of the 149 (minus a few) German keyboard codepoints that I pasted into EDIT.EXE under guest OS FreeDOS:
I've pasted whole source code files into FreeDOS and it's stable, which I believe already says a lot, yet I still have to do a lot of testing under other OSes.
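The following is a simplified sketch of such a transformation, not the actual code in keyboard_lib.js. It assumes scancode set 1 (release code = press code with bit 7 set), handles only Shift and AltGr, skips dead keys, and uses a hypothetical layout shape. The bus parameter is assumed to be any object with a send(name, value) method.

```javascript
const LSHIFT = 0x2a; // left Shift; AltGr (right Alt) is sent as 0xE0 0x38

// Hypothetical excerpt of a German layout: character -> key and modifiers.
const LAYOUT = new Map([
  ["z", { scancode: 0x15, shift: false, altgr: false }],
  ["Z", { scancode: 0x15, shift: true, altgr: false }],
  ["@", { scancode: 0x10, shift: false, altgr: true }],
]);

// Translate text into press/release scancodes, wrapping each key in its modifiers.
function textToScancodes(text, layout) {
  const out = [];
  for (const ch of text) {
    const key = layout.get(ch);
    if (!key) continue; // character is not in the keyboard charset
    if (key.shift) out.push(LSHIFT);
    if (key.altgr) out.push(0xe0, 0x38);
    out.push(key.scancode, key.scancode | 0x80);
    if (key.altgr) out.push(0xe0, 0x38 | 0x80);
    if (key.shift) out.push(LSHIFT | 0x80);
  }
  return out;
}

// Pump the sequence into the bus, one keyboard-code event per scancode.
function paste(bus, text, layout) {
  for (const code of textToScancodes(text, layout)) {
    bus.send("keyboard-code", code);
  }
}
```

Pressing and releasing the modifier around every single character is wasteful but keeps the sketch simple; consecutive characters with the same modifier state could share one press/release pair.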
Like manually configuring the active codepage for Copy, a user would have to manually configure the active keyboard for Paste.
It's noteworthy that emulating keystrokes in this way does not require any knowledge of the codepage used by the guest OS, it's a straight Unicode-to-Scancode mapping.
As keyboard layouts are defined by international standards, all OSes should treat them alike (so it shouldn't matter that I source my information from Windows). Yet nothing is fixed: OSes typically support user-defined keyboard layouts, so all that can be supported here is limited to what the international standards define.
EDIT: Updated links.
EDIT 2: Updated links again.