Character Set News
//////////////////////////////////////////////////////////////////////////////
The culture of computer technology started out primarily speaking the English
language, but, as computers become more a part of everyday life, computer
technology has been forced to pay attention to the diverse languages that
humans around the globe speak and write.
Most of these languages need an alphabet bigger than what ASCII provides.
Therefore, programmers have learned* the painstaking techniques of supporting
diverse character sets in their software. Thus we have Unicode, ISO-8859-1,
UTF-8, etc.
Other variations in computer character sets occur because vendors have wished
to support the drawing of lines and simple graphic images using character-cell
glyphs. With GUI environments now deployed everywhere, such uses of different
character sets are less common than they once were, but the issue still
sometimes arises.
This file contains a varying collection of notes and folklore on how to
set up and use different character sets.
..............................................................................
* Although some programmers, such as the inventors of PHP, ignored
the issue as long as possible.
//////////////////////////////////////////////////////////////////////////////
A useful book for such matters is
"Creating WorldWide Software: Solaris International Developer's Guide"
(2nd. ed.) by Bill Tuthill and David Smallberg
Sun Microsystems Press, 1997, ISBN 0-13-494493-3, 382 pages, $65 US.
http://vig.prenhall.com/catalog/academic/product?ISBN=0134944933
The book treats Unix, CDE, Motif, X11, but these apply to Linux also.
Topics include:
* establishing locale environments
* encoding character sets
* displaying localized text
* messaging for program translation
* handling language input
* localizing software after internationalization
* gettext() vs. catgets()
* discussion of EUC, Unicode, ISO-8859-X
* diagrams of hand gestures that could be obscene in some cultures
* how the mysterious abbreviations "I18N" and "L10N" were formed :-)
Sun Microsystems Press/Prentice-Hall Professional, Technical, and Reference
1 Lake Street
Upper Saddle River, NJ 07458 USA
Web: http://www.prenhall.com/
US Orders: 1 800/282-0693
Fax: +1 201/236-7141
US Bulk: 1 800/382-3419
UK Fax: +44 1279 414130
AU Fax: +61 02 9453 0117
SG Fax: +65 378 0370
//////////////////////////////////////////////////////////////////////////////
Newsgroups: comp.dcom.telecom
Path: cs.utk.edu!emory!europa.eng.gtefsd.com!howland.reston.ans.net
!spool.mu.edu!telecom-request
Date: 29 Oct 1993 12:20 -0600
Message-ID: <[email protected]>
X-Telecom-Digest: Volume 13, Issue 725, Message 1 of 8
From: Rob Slade <[email protected]>
Subject: Book Review: "The Unicode Standard"
BKUNICOD.RVW 980921
Addison-Wesley Publishing Co.
P.O. Box 520
26 Prince Andrew Place
Don Mills, Ontario M3C 2T8
416-447-5101 fax: 416-443-0948
or
1 Jacob Way
Reading, MA 01867-9984
800-527-5210 617-944-3700
or
5851 Guion Road
Indianapolis, IN 46254
800-447-2226
or
Unicode, Inc.
1965 Charleston Road
Mountain View, CA 94043
(415) 961-4189 Fax: (415) 966-1637
"The Unicode Standard", U$32.95/C$42.95
In the dim and distant past, the late (and generally unlamented) SUZY
Information System was born in Vancouver. Rather an oddball as far as
online services went, one "feature" was that the programmer had tried
to allow for the use of all of the IBM graphics characters. This led
to an entirely new field of "smiley" or "emoticon" (emotional icon)
endeavours. Instead of the usual sideways happy face of the colon,
hyphen and right parenthesis; ":-)"; we were able to use the "Ctrl-A"
alternative of the IBM PC character set. Having a decimal value of
one, this character is an upright happy face. This allowed other
expansions, such as Ctrl-A and the right square bracket, which looks
like a face and a telephone handset, and was used (usually in the
"chat" modes) for "I am on the phone."
"How nice," I hear you mutter between clenched teeth. "Can we now get
on with the review?" Patience, stout nerds. This *is* the review.
As SUZY users, particularly those who had been introduced to computer
communications on the system, moved on to other services or local
bulletin boards, they were usually quite shocked to find that their
favourite symbols no longer worked. The little heart (Ctrl-C) would
kill a message on a VAX. Fidonet users might find that the cute
tagline they had formed from graphics characters completely
disappeared when they sent the message through an Internet gateway.
ASCII (the American Standard Code for Information Interchange) is
widely, and mistakenly, believed to define two hundred and fifty-six
characters.
It doesn't.
Furthermore, of the hundred and twenty-eight characters it does
define, many are "control" rather than printable characters. (The
"card suit" symbols on the IBM PC graphics set are defined as "end of
text", "end of transmission", "enquiry" and "acknowledgement" under
the real ASCII standard.) In addition, many believe ASCII to be a
universal standard; also not true. An octet with the decimal value
thirty-five, for example, is the number sign (sometimes called an
"octothorpe") in the United States, but a pound sign (the British
currency) in Britain. As with most fields of computer endeavor, the
nice thing about standards is that there are so many to choose from.
Many vary only slightly--but they vary.
The point is that there are a number of symbols which we commonly
know, but which cannot be consistently displayed on terminals or
printers. Certain terminals will have certain "international"
character sets, but not all are identical. Accents and other phonetic
modifiers may be difficult to handle: entire character sets are given
over strictly to accented characters. (In Canada we are acutely aware
of the problems, with "French" keyboards used at many sites. On one,
I was having difficulty finding some necessary punctuation marks for
network addressing, and asked a Francophone programmer for help. "Who
knows," he growled, "I never use the ____ things!")
Unicode seeks to address this problem. Including not only the
variations on the Latin alphabet, Unicode incorporates Greek,
Cyrillic, Hebrew and other alphabets. It also includes punctuation,
diacriticals, mathematical and scientific symbols and miscellaneous
graphics. Asian ideographs are also assigned codes. This is no
longer suitable, of course, for a seven-bit code, and Unicode is based
on a sixteen-bit address space.
The book gives some background and plans (chapter one), general
principles and rules for conformance (chapter two). To comment on
these in any meaningful way would be to rewrite these chapters. This
is technical material, though not the same technology that computer
types are used to. Some background study in linguistics would be a
good idea, although it is not strictly necessary to understand and use
the Unicode standard. There is, however, a wealth of symbols,
punctuation marks and typesetting codes which Unicode gives
standardized access to. On the other hand, any application which used
the standard in a significant way would likely require a linguistics
background in any case.
The bulk of the books (two volumes) is, of course, taken up with the
actual code charts. (Volume two, in fact, is almost completely
concerned with Han ideographs. In spite of the recent widespread use
of the English alphabet, this is still the standard written language
of Chinese, Japanese and Korean: CJK in Unicode terminology.) The
charts are augmented with verbal definitions of the symbols, and with
cross references to similar forms.
The Unicode standard is recent. In comparative terms its current [1993]
usage is negligible. However, it is the de facto standard for broadly
based international character sets. With the recent rejection of the
proposed ISO thirty-two bit standard, and the recasting of that
standard to follow Unicode's lead, Unicode is a significant factor in
the development of any international applications.
copyright Robert M. Slade, 1993 BKUNICOD.RVW 980921
(Postscriptum - Unicode Inc. maintains an FTP site at unicode.org
(192.195.185.2). Some of the mapping tables, and the Han cross
reference lists are available. Some tables are also available on IBM
PC or Mac compatible floppy disks.)
http://www.unicode.org/
Permission granted to distribute only with unedited copies of TELECOM
Digest and associated newsgroups/mailing lists.
DECUS Canada Communications, Desktop, Education and Security group newsletters
Editor and/or reviewer [email protected], [email protected], Rob Slade at 1:153/733
DECUS Symposium '94, Vancouver, BC, Mar 1-3, 1994, contact: [email protected]
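[ARCHIVER'S NOTE: the review's point that ASCII is a seven-bit code
with 128 values, many of them control codes, can be checked with a
few lines of Python. This is only a minimal sketch: the names
ASCII_SIZE, CONTROL_NAMES and is_ascii are invented for illustration,
and the table of control names is hand-copied for the four codes the
review says the IBM PC draws as card suits.]

```python
# ASCII is a 7-bit code, so it defines 2**7 = 128 values, not 256.
ASCII_SIZE = 2 ** 7

# Printable ASCII runs from space (0x20) through tilde (0x7E);
# everything else in the 7-bit range is a control code.
printable = [chr(c) for c in range(ASCII_SIZE) if chr(c).isprintable()]
control_count = ASCII_SIZE - len(printable)

# Codes 3..6 under real ASCII (hand-copied names, an assumption of this
# sketch). The IBM PC character generator draws card suits for them.
CONTROL_NAMES = {
    3: "ETX (end of text)",
    4: "EOT (end of transmission)",
    5: "ENQ (enquiry)",
    6: "ACK (acknowledge)",
}


def is_ascii(data: bytes) -> bool:
    """Return True if every octet fits in seven bits."""
    return all(b < ASCII_SIZE for b in data)


print(len(printable), "printable,", control_count, "control")
for code, name in sorted(CONTROL_NAMES.items()):
    print(code, name)
print(is_ascii(b"plain text"), is_ascii(bytes([0x9C])))
```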
..............................................................................
..............................................................................
An older introductory book on this subject is
"Coded Character Sets: History and Development" by C. E. MacKenzie.
Reading: Addison-Wesley, 1980.
//////////////////////////////////////////////////////////////////////////////
Newsgroups: comp.terminals
References: <[email protected]>
Message-ID: <[email protected]>
Organization: Columbia University
Date: 15 Mar 2000 19:33:51 GMT
From: Jeffrey Altman <[email protected]>
Subject: Re: change from ASCII to ANSI character set in DOS window
In article <[email protected]>, <[email protected]> wrote:
: Hi,
:
: I am running NT 4.0 SP3. My DOS window currently is displaying
: the ASCII character set. However I want it to display the ANSI
: character set. How do I do this?
The Console window is Unicode based. The font that is displayed
is Unicode if you are using a TrueType font such as LucidaConsole or
Code Page based (CP437, CP850, ...) if you are using raster fonts.
The console application has a choice of writing to the screen
using the active Code Page or Unicode. NT provides the proper
translations.
CP1252 is the Windows variation of ISO-Latin1 that you refer to as ANSI.
To use this code page in your application, use SetConsoleCP() and
SetConsoleOutputCP().
--
Jeffrey Altman * Sr.Software Designer * Kermit-95 for Win32 and OS/2
The Kermit Project * Columbia University
612 West 115th St #716 * New York, NY * 10025
http://www.kermit-project.org/k95.html * <[email protected]>
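[ARCHIVER'S NOTE: a rough sketch of the calls named above, driven from
Python through ctypes rather than from C. SetConsoleCP() and
SetConsoleOutputCP() are the Win32 functions mentioned in the post;
the wrapper name set_console_code_pages is hypothetical, and the
wrapper simply reports False on systems without a Win32 console. The
loop at the end shows why the choice matters: the same letter is a
different byte in CP437/CP850 than in CP1252.]

```python
import ctypes
import sys


def set_console_code_pages(code_page: int) -> bool:
    """Select one code page for console input and output (Windows only)."""
    if sys.platform != "win32":
        return False  # no Win32 console here, so there is nothing to set
    kernel32 = ctypes.windll.kernel32
    ok_in = kernel32.SetConsoleCP(code_page)         # input code page
    ok_out = kernel32.SetConsoleOutputCP(code_page)  # output code page
    return bool(ok_in and ok_out)


# The same character, e with acute accent, in three code pages.
E_ACUTE = "\u00e9"
for name in ("cp437", "cp850", "cp1252"):
    print(name, E_ACUTE.encode(name).hex())
```

[A caller wanting the "ANSI" set would try set_console_code_pages(1252)
and check the return value before writing any 8-bit text.]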
..............................................................................
Message-ID: <[email protected]>
References: <[email protected]>
<Pine.OSF.4.21.0010032041570.169204-100000@goedel2.math.washington.edu>
<Pine.PCP.3.91.1001004233418.1498B-100000@[204.111.21.127]>
<Pine.PCP.3.91.1001005115632.1498A-100000@[204.111.54.33]>
NNTP-Posting-Host: mail.pharmapartners.nl
Newsgroups: comp.mail.pine
Date: 6 Oct 2000 07:57:19 GMT
From: Villy Kruse <[email protected]>
Subject: Re: Pine and French characters
On Thu, 5 Oct 2000 13:29:02 -0400, Gopi Sundaram <[email protected]> wrote:
>On Thu, 5 Oct 2000, Samuel W. Heywood wrote:
>
>> If the character set used in Windows is not backward-compatible
>> with DOS, then Windows does not adhere to the standard.
>
>I don't know what standards you are talking about, but I'm glad that
>Windows finally used the ISO standard, whereas DOS didn't.
Well, actually Windows tries to "improve" on iso-8859-1 and calls that
windows-1252. The difference is that some values in the range 0x80
to 0x9f have been assigned to characters that are missing from iso-8859-1.
For example the euro sign is 0x80 in win1252, but doesn't exist in
iso-8859-1. However it will be 0xA4 in iso-8859-15 aka latin-9.
Check the alphabet soup at
http://www.czyborra.com/
and see how standard the various standards really are.
Villy
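[ARCHIVER'S NOTE: the euro example above can be reproduced with the
codec tables that ship with Python. A minimal sketch; the helper name
try_encode is made up for illustration, and Python's codec names
(cp1252, iso8859-1, iso8859-15) stand in for the names used in the
post.]

```python
EURO = "\u20ac"  # EURO SIGN


def try_encode(text: str, codec: str):
    """Return the encoded bytes, or None if the codec lacks a character."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None


print("windows-1252:", try_encode(EURO, "cp1252"))      # one byte, 0x80
print("iso-8859-15: ", try_encode(EURO, "iso8859-15"))  # one byte, 0xA4
print("iso-8859-1:  ", try_encode(EURO, "iso8859-1"))   # not available

# In iso-8859-1 the value 0xA4 is still the generic currency sign.
print(hex(ord(b"\xa4".decode("iso8859-1"))))
```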
..............................................................................
Newsgroups: comp.mail.pine
Message-ID: <[email protected]>
References:
<Pine.OSF.4.21.0010032041570.169204-100000@goedel2.math.washington.edu>
<Pine.PCP.3.91.1001004233418.1498B-100000@[204.111.21.127]>
<Pine.PCP.3.91.1001005115632.1498A-100000@[204.111.54.33]>
<Pine.PCP.3.91.1001005170824.1498D-100000@[204.111.24.209]>
<Pine.PCP.3.91.1001007204737.1498A-100000@[204.111.54.179]>
Date: Mon, 9 Oct 2000 12:54:19 +0200
Organization: Knights of the Round Tuit
From: "Alan J. Flavell" <[email protected]>
Subject: Re: Pine and French characters
On Sat, 7 Oct 2000, Samuel W. Heywood wrote:
> Thanks a lot for the URL. Now that I've read about QUOTED PRINTABLE I
> understand that it probably would be best to load a code page for
> handling this ISO-8859-1 character set.
Excuse me but you're not quite with us yet.
You would certainly be advised to load a code page that covers the
Latin-1 repertoire; but the recommendation would be to load the cp850
code page, which covers this repertoire but it's _not_ the iso-8859-1
character coding itself. PINE knows how to mediate between the two,
as we've already covered in this thread.
There is a relatively obscure code page, cp819, which represents the
iso-8859-1 character coding. However, if you load it, you are going
to find quite a number of conventional DOS applications displaying
bizarre characters in their menus etc, instead of the DOS "box
drawing" characters which they expected.
You'll find some brief (and old) notes of mine here
http://ppewww.ph.gla.ac.uk/~flavell/iso8859/iso8859-pointers.html#cp819
but I don't recommend that. Unless you have some special requirement
that we haven't discussed here, I recommend that you use cp850.
cheers
Alan
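[ARCHIVER'S NOTE: the distinction drawn above between a repertoire and
a coding can be illustrated with Python's codec tables. This is a
sketch under assumptions: "the Latin-1 repertoire" is taken to mean
the 96 characters at 0xA0 to 0xFF of iso-8859-1, the helper name
covers is invented, and Python's cp819 alias is used for the code page
of that number.]

```python
import unicodedata

# The upper half of iso-8859-1: 96 characters from NBSP to y-diaeresis.
LATIN1_REPERTOIRE = [chr(c) for c in range(0xA0, 0x100)]


def covers(codec: str, chars) -> bool:
    """Return True if every character in chars can be encoded in codec."""
    for ch in chars:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            return False
    return True


print("cp850 covers Latin-1:", covers("cp850", LATIN1_REPERTOIRE))
print("cp437 covers Latin-1:", covers("cp437", LATIN1_REPERTOIRE))

# Same repertoire, different coding: a byte that DOS programs send as a
# box-drawing line turns into a letter under cp819 (iso-8859-1).
for codec in ("cp850", "cp819"):
    print(codec, "0xC4 =", unicodedata.name(b"\xc4".decode(codec)))
```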
//////////////////////////////////////////////////////////////////////////////
Newsgroups: comp.unix.solaris
NNTP-Posting-Host: polaris.nada.kth.se
NNTP-Posting-Date: Tue, 22 Mar 2005 21:00:45 +0000 (UTC)
References: <[email protected]>
Message-ID: <[email protected]>
Organization: Dept of Numerical Analysis and Computer Science, KTH
Date: Tue, 22 Mar 2005 22:00:35 +0100
From: Mårten Svantesson <[email protected]>
Subject: Re: UNIX - Locale - conversion:char to UNICODE -
glitch:code page is different !
[email protected] (Neel) writes:
>
> Hi,
>
> As subject says, I want to convert string from char to unicode but
> with different code page !
>
> I wanted to know as we have mbstowcs in windows, do we have any call
> which will convert char to UNICODE in UNIX/Solaris/Linux ?
>
> Additionally does this take care of different code pages too ? For
> instance if the string which needs to be converted is from different
> code page than current code page !
No problem. Though in Solaris I've never seen the term
"code page". I would think that you mean "code set".
Anyway, the functions you are looking for are
iconv_open
iconv
and iconv_close
The man pages iconv(3C) (contains a decent example) and
iconv_unicode(5) should get you started. (That is, you
execute "man -s 3c iconv" and "man -s 5 iconv_unicode".)
The functions are standardised in UNIX98 and should work in other
unices as well.
--
- Mårten
mail: [email protected] *** ICQ: 4356928 *** mobile: +46 (0)707390385
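[ARCHIVER'S NOTE: the iconv_open/iconv/iconv_close sequence is a C
interface, and the man pages cited above are the place to look for
it. As a rough illustration of the same idea (name the source code set
explicitly instead of relying on the current one), here is a sketch
that uses Python's built-in codecs in place of iconv. The function
name convert and its argument names are invented for this example.]

```python
def convert(data: bytes, from_code: str, to_code: str) -> bytes:
    """Re-encode data from one code set to another by way of Unicode."""
    text = data.decode(from_code)  # source code set -> Unicode string
    return text.encode(to_code)    # Unicode string -> target code set


# One octet, two meanings: 0xF5 is o-tilde in iso-8859-1 but
# o-double-acute in iso-8859-2, so the declared source matters.
raw = b"\xf5"
print(convert(raw, "iso8859-1", "utf-8").hex())
print(convert(raw, "iso8859-2", "utf-8").hex())
```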
//////////////////////////////////////////////////////////////////////////////
Newsgroups: comp.unix.solaris
Message-ID: <[email protected]>
References: <[email protected]>
Date: Sat, 29 Mar 2003 11:42:04 -0800
From: Yongtao You <[email protected]>
Subject: Re: iconv from Extended ASCII to UTF-8
"Erik Max Francis" <[email protected]> wrote in message
news:[email protected]...
> Yongtao wrote:
>
> > I have a program that calls iconv() to do conversion from 8859-1 to
> > UTF-8. Everything works fine when the inputs are standard ASCII chars
> > (0-127). However, it failed with an errno of 88 (EILSEQ) when there
> > are Extended ASCII chars (>128) present. What's interesting is, if I
> > use the /usr/bin/iconv program to do the exact same conversion (with
> > exactly the same Extended ASCII chars), it works.
>
>
> You are converting from Latin-1 to UTF-8, but you say the data is
> actually "extended ASCII." The problem is that "extended ASCII" just
> means "ASCII with unspecified 8-bit characters included," i.e., it means
> "ASCII and some other unknown stuff." That's not Latin-1, so a Latin-1
> conversion utility is almost certain to have problems with arbitrary
> "extended ASCII" data.
>
> The key to doing the proper conversion is to find out precisely what the
> data is. It's not Latin-1, and it's not ASCII, so what is it? Once you
> find out what it really is, you'll be able to convert it properly.
> Conversion algorithms can only do the right thing when they're given
> valid data; you're not giving it valid data.
>
> > The question is, how can I make the iconv() call do the same thing the
> > /usr/bin/iconv program does? Before this, I thought they were doing
> > exactly the same thing.
>
> Presumably the standalone program is running in a more permissive mode,
> where invalid conversions are suppressed rather than reported. Look for
> a part of the low-level API that allows you to do this.
>
> --
> Erik Max Francis / [email protected] / http://www.alcyone.com/max/
> __ San Jose, CA, USA / 37 20 N 121 53 W / &tSftDotIotE
> / \ Sit loosely in the saddle of life.
> \__/ Robert Louis Stevenson
> Discord / http://www.alcyone.com/pyos/discord/
> Convert dates from Gregorian to Discordian.
Erik,
Thanks for your reply.
The two "Extended ASCII chars" I was talking about are 0xA7 and 0xDA.
According to this page:
http://www.utoronto.ca/webdocs/HTMLdocs/NewHTML/iso_table.html
both are listed as valid ISO8859-1 chars. Should I expect them to be
converted correctly?
BTW, I am using Solaris 8.
Thanks.
Yongtao
..............................................................................
Newsgroups: comp.unix.solaris
Date: 29 Mar 2003 21:38:28 -0800
Organization: Twin Sun Inc, El Segundo, CA, USA
Message-ID: <[email protected]>
References: <[email protected]>
From: Paul Eggert <[email protected]>
Subject: Re: iconv from Extended ASCII to UTF-8
"Yongtao You" <[email protected]> writes:
> The two "Extended ASCII chars" I was talking about are 0xA7 and 0xDA.
> According to this page:
>
> http://www.utoronto.ca/webdocs/HTMLdocs/NewHTML/iso_table.html
>
> both are listed as valid ISO8859-1 chars. Should I expect them to be
> converted correctly?
Sure, if you specify 8859-1 rather than ASCII.
> BTW, I am using Solaris 8.
Then I suggest that you install Sun patch 113261, if you're messing
with this stuff. It's freely available from
http://sunsolve.sun.com/
The current patch revision is 113261-02.
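[ARCHIVER'S NOTE: the behaviour discussed in this thread, where a
conversion fails if the source is declared as ASCII and succeeds if it
is declared as 8859-1, can be imitated with Python's codecs. This is
an analogy and not the Solaris iconv() code path: UnicodeDecodeError
plays the role of EILSEQ here, and the errors= argument plays the role
of the "more permissive mode". The helper name to_utf8 is
hypothetical.]

```python
SAMPLE = b"\xa7\xda"  # the two octets named in the thread


def to_utf8(data: bytes, source: str, errors: str = "strict") -> bytes:
    """Decode data as `source`, then encode the result as UTF-8."""
    return data.decode(source, errors).encode("utf-8")


# Declared as iso-8859-1, both octets are valid and convert cleanly.
print(to_utf8(SAMPLE, "iso8859-1").hex())

# Declared as ASCII, strict conversion stops at the first octet >= 0x80.
try:
    to_utf8(SAMPLE, "ascii")
except UnicodeDecodeError as exc:
    print("strict conversion failed:", exc.reason)

# Permissive modes hide the error instead of reporting it.
print(to_utf8(SAMPLE, "ascii", errors="ignore"))
print(to_utf8(SAMPLE, "ascii", errors="replace").hex())
```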
//////////////////////////////////////////////////////////////////////////////
Newsgroups: comp.protocols.tcp-ip
Organization: Sun Microsystems Inc. - BDC
Message-ID: <[email protected]>
References: <[email protected]> <[email protected]>
<uWmla.404767$L1.116540@sccrnsc02>
Date: 11 Apr 2003 08:24:37 -0400
From: James Carlson <[email protected]>
Subject: right to left [was Re: Mystic questions about TCP/IP]
"Glen Herrmannsfeldt" <[email protected]> writes:
>
> Are Hebrew numbers written MSB to the left or right?
When I was at Data General, one of the projects I worked on was a
semitic-mode (Arabic and Hebrew) terminal. Right-to-left is really
quite special. Numbers and foreign (i.e., English) text are written
left-to-right, but native text is written right-to-left.
This means that when you're typing on such a terminal, the cursor
starts at the right and moves left as you type. When you start typing
a number, though, the cursor stops moving and the text shifts off to
the left. When you stop typing the number or foreign text, the cursor
jumps left over the text you've typed to start going right-to-left
again.
In addition to that, there's usually a big "mode switch" that allows
the terminal to be used in right-to-left or left-to-right modes and
can be switched on the fly.
And in addition to that, Arabic (at least) has left- and right-
connected forms for each of the 40 basic characters, and the
connectedness of each one depends on what character is to the left
and right of that one. (I.e., most characters have four forms: not
connected, connected left, connected right, and connected both.)
To say that it's hard to implement correctly (imagine what 'insert
character' and 'delete character' do) is putting it mildly.
(This is all from 12+ year old memory now ... so some of it might be
slightly off. Corrections welcome, of course.)
--
James Carlson, Solaris Networking <[email protected]>
Sun Microsystems / 1 Network Drive 71.234W Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757 42.497N Fax +1 781 442 1677
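[ARCHIVER'S NOTE: the two complications described above, direction
that depends on the kind of character and letter shapes that depend on
neighbours, are both visible in the Unicode character database. The
sketch below uses Python's unicodedata module. It shows how Unicode
records these properties, not how the Data General terminal was
built; the helper name bidi_class is made up for illustration.]

```python
import unicodedata


def bidi_class(ch: str) -> str:
    """Return the Unicode bidirectional class of one character."""
    return unicodedata.bidirectional(ch)


# R and AL run right-to-left; L and EN (European digits) do not.
samples = {
    "HEBREW ALEF": "\u05d0",
    "ARABIC BEH": "\u0628",
    "DIGIT ONE": "1",
    "LATIN SMALL A": "a",
}
for label, ch in samples.items():
    print(label, bidi_class(ch))

# The four contextual shapes of Arabic BEH (U+0628), as listed in the
# Arabic Presentation Forms-B block.
for cp in range(0xFE8F, 0xFE93):
    print(hex(cp), unicodedata.decomposition(chr(cp)))
```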
//////////////////////////////////////////////////////////////////////////////
Newsgroups: comp.terminals
References: <[email protected]>
Message-ID: <[email protected]>
Date: Tue, 19 Apr 2005 16:26:25 -0000
From: Thomas Dickey <[email protected]>
Subject: Re: Putty input characters
Bjoern Wolfgardt <[email protected]> wrote:
> Hi,
> I have a problem with Putty. I have a test tool on our host that
> displays special characters (umlaute, 'ä' ae, 'ü' ue...).
> They are displayed correctly. But if I press the 'ä' key, the character
> is not displayed. The host uses my input as a control key or something
> else.
> So my question is:
> How do I get input and output to work with german keyboard (and
> umlaute)?
See PuTTY's configuration (window/translations). Your session is
probably assuming that input is UTF-8 rather than ISO-8859-1.
(this should be in PuTTY's faq).
--
Thomas E. Dickey
http://invisible-island.net/
ftp://invisible-island.net/
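[ARCHIVER'S NOTE: why a UTF-8 versus ISO-8859-1 mismatch breaks the
umlaut key can be seen from the bytes involved. A minimal sketch; the
helper name decodes_as is invented here, and the example says nothing
about PuTTY's internals, only about the two encodings.]

```python
A_UMLAUT = "\u00e4"  # a with diaeresis

latin1_bytes = A_UMLAUT.encode("iso8859-1")  # one octet
utf8_bytes = A_UMLAUT.encode("utf-8")        # two octets


def decodes_as(data: bytes, codec: str):
    """Return the decoded text, or None if data is invalid in codec."""
    try:
        return data.decode(codec)
    except UnicodeDecodeError:
        return None


print(latin1_bytes.hex(), utf8_bytes.hex())

# A lone ISO-8859-1 octet is not valid UTF-8, so a receiver expecting
# UTF-8 cannot show it as a letter.
print(decodes_as(latin1_bytes, "utf-8"))

# The reverse mismatch gives two characters instead of one.
print(len(decodes_as(utf8_bytes, "iso8859-1")))
```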
..............................................................................
Newsgroups: comp.terminals
NNTP-Posting-Host: hb-server-02.buhlmann.de [217.7.105.122]
NNTP-Posting-Date: Wed, 20 Apr 2005 07:23:28 +0000 (UTC)
References: <[email protected]>
Message-ID: <[email protected]>
Date: 20 Apr 2005 00:23:27 -0700
From: Bjoern Wolfgardt <[email protected]>
Subject: Re: Putty input characters
Thomas Dickey <[email protected]> wrote in message
news:<[email protected]>...
>
> see PuTTY's configuration (window/translations). Your session is
> probably assuming that input is UTF-8 rather than ISO-8859-1.
>
> (this should be in PuTTY's faq).
Thank you,
It is not in the FAQ (or I didn't find it). So it is not in Putty?
It is a host configuration?
cu
Bjoern
..............................................................................
Newsgroups: comp.terminals
NNTP-Posting-Host: rapun.sel.cam.ac.uk
References: <[email protected]>
Message-ID: <[email protected]>
Organization: University of Cambridge, England
Date: 20 Apr 2005 11:27:27 +0100
From: Owen Dunn <[email protected]>
Subject: Re: Putty input characters
Thomas Dickey <[email protected]> writes:
>
> see PuTTY's configuration (window/translations). Your session is
> probably assuming that input is UTF-8 rather than ISO-8859-1.
>
> (this should be in PuTTY's faq).
Shockingly, we reserve our FAQ for questions which really are
frequently asked :-).
(S)
//////////////////////////////////////////////////////////////////////////////
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
Newsgroups: comp.std.internat,comp.protocols.tcp-ip
Path: utkcs2!emory!samsung!cs.utexas.edu!sun-barr!decwrl!mcnc!uvaarpa!murdoch
Message-ID: <[email protected]>
References: <[email protected]> <[email protected]>
Sender: [email protected]
Organization: University of Virginia
Lines: 60
Date: 10 Apr 1991 17:27:56 GMT
From: [email protected] (Randall Atkinson)
Subject: Re: universality of Latin-1
John Gilmore originally wrote:
%
% And my windows all use ISO Latin 1. If Torbj|rn would send the
% umlauted letter in that standardized character set, it would look right
% in both the States and in Sweden.
In article <[email protected]>,
Erik M. van der Poel <[email protected]> responded:
>
> Have you ever tried to send yourself a message in Latin-1? Did it
> work? And even if *you* have a reasonable version of sendmail (one
> that doesn't strip the 8th bit), what makes you so certain that
> Torbj|rn's message and anyone else's won't pass through a site that
> *does* strip the 8th bit?
It does work for a fair and ever increasing subset of the Internet.
BITNET doesn't do very well with it. Clearly we need to move towards
8-bit and 16-bit and 32-bit transparent mail-transport mechanisms.
Fortunately there are a number of possible transport mechanisms out
there to choose from, some of which are already 8-bit transparent.
> Also, what's so "standardized" about ISO Latin-1? What makes it more
> standard than, say, Latin-2?
ISO 8859/1 is NOT any "more standard" than ISO 8859/2, however sites
in the US are in fact migrating towards ISO 8859/1 from US ASCII and
most sites in the US are NOT migrating towards ISO 8859/2 (though they
might support it on the side as vendors begin to). The languages that
are most commonly used in the US are in ISO 8859/1 and the languages
supported by ISO 8859/2 are less commonly used (again in the US as a
whole).
Note that ISO Latin-1 is ISO 8859/1 which is the 8-bit character set
used for Western European languages. ISO Latin-2 is ISO 8859/2 which
is the 8-bit character set for Eastern European languages.
Clearly we need to add additional information to the header of mail
messages to indicate which character set to use. I'm not sure of
the current state of the Internet protocols (RFC 822 et al.) with
respect to this. If there isn't the equivalent of a "Character-set:"
header yet, serious consideration should be given to adding one with
clearly defined values for at least existing ANSI and ISO character
sets.
[ARCHIVER'S NOTE: the Multipurpose Internet Mail Extensions (MIME)
protocol defines character-set-selection headers for SMTP e-mail.
See the Internet standards RFC1521, RFC1523, and RFC1425.]
Character sets that should have a defined string to use with such a
header field include at least:
ASCII
ISO 8859/1
...
ISO 8859/N (where N is the last defined set)
ISO 10646 (once it gets completed)
The Internet is the dominant mail transport network at present, partly
because so many other networks gateway with it. Getting the Internet
to convert to supporting such needs would be a big step in the right
direction. Perhaps someone on the IETF can comment on their current
activities in this area ??
Ran Atkinson
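[ARCHIVER'S NOTE: a sketch of both halves of this exchange, using
Python's standard email package. The first part shows what a relay
that strips the 8th bit does to a Latin-1 name. The second builds a
MIME message whose headers declare the character set, with a
quoted-printable body that survives 7-bit transport. The helper name
strip_eighth_bit is invented for illustration, and the exact header
spelling is whatever Python's library emits.]

```python
from email.message import EmailMessage

NAME = "Torbj\u00f6rn"  # o with diaeresis is 0xF6 in iso-8859-1


def strip_eighth_bit(data: bytes) -> bytes:
    """Clear the high bit of every octet, as a 7-bit mail relay would."""
    return bytes(b & 0x7F for b in data)


# Without any labelling or encoding, the letter is silently corrupted.
print(strip_eighth_bit(NAME.encode("iso-8859-1")))

# With MIME, the character set is declared and the body is 7-bit safe.
msg = EmailMessage()
msg.set_content(NAME, charset="iso-8859-1", cte="quoted-printable")
print(msg["Content-Type"])
print(msg["Content-Transfer-Encoding"])
print(msg.get_payload().strip())
```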
..............................................................................
Newsgroups: comp.std.internat,comp.protocols.tcp-ip
Path: utkcs2!emory!swrinde!cs.utexas.edu!sun-barr!newstop!sun!amdcad!dgcad
!dg-rtp!chutney!eliot
Message-ID: <[email protected]>
References: <[email protected]> <[email protected]>
Organization: Data General Corporation, Research Triangle Park, NC
Date: 12 Apr 1991 12:47:41 GMT
From: [email protected] (Topher Eliot)
Subject: Re: universality of Latin-1
In article <[email protected]>,
[email protected] (Randall Atkinson) writes:
|>
|> In article <[email protected]>,
|> Erik M. van der Poel <[email protected]> responded:
|> >Have you ever tried to send yourself a message in Latin-1? Did it
|> >work? And even if *you* have a reasonable version of sendmail (one
|> >that doesn't strip the 8th bit), what makes you so certain that
|> >Torbj|rn's message and anyone else's won't pass through a site that
|> >*does* strip the 8th bit?
|> It does work for a fair and ever increasing subset of the Internet.
|> BITNET doesn't do very well with it. Clearly we need to move towards
|> 8-bit and 16-bit and 32-bit transparent mail transport mechanisms.
I expected to see someone else post a more authoritative answer, but since
none has been forthcoming, I will venture. The folks who work on such things
have been considering the 8-bit, different-codeset issues, as part of a much
larger picture of including such things as graphics and other binary
information in mail. Since those are harder problems, it means that they
won't have solutions all that quickly. There is a mailing list on this
subject; if you really need it I can probably dig out a lead on how to get
onto that mailing list.
|> Fortunately there are a number of possible transport mechanisms out
|> there to choose from, some of which are already 8-bit transparent.
Ack! "Fortunately"? There is an ancient curse: "may you live in interesting
times". I think its modern equivalent is "may you have many standards to
choose from".
--
Topher Eliot Data General DG/UX Internationalization
(919) 248-6371 62 T. W. Alexander Dr., Research Triangle Park, NC 27709
[email protected] {backbone}!mcnc!rti!dg-rtp!eliot
Obviously, I speak for myself, not for DG.
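................................................
EDITOR'S NOTE
A minimal Python sketch of the failure discussed in this thread: a relay
that clears the 8th bit of every byte turns Latin-1 text into different
ASCII text. The function name strip_high_bit and the sample string are
assumptions for illustration; this is not code from sendmail or from
any real mail transport.

```python
def strip_high_bit(data: bytes) -> bytes:
    """Clear bit 7 of every byte, as a 7-bit-only relay would (illustrative)."""
    return bytes(b & 0x7F for b in data)

# "Torbjorn" spelled with o-slash (U+00F8), which Latin-1 encodes as byte 0xF8.
original = "Torbj\u00f8rn"
wire = original.encode("latin-1")
mangled = strip_high_bit(wire)          # 0xF8 & 0x7F == 0x78, the letter "x"

print(mangled.decode("ascii"))          # Torbjxrn
print(strip_high_bit(b"plain ASCII") == b"plain ASCII")   # True
```

Pure 7-bit ASCII passes through unchanged, which is why the damage only
shows up once a message contains characters from the upper half of
Latin-1.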
//////////////////////////////////////////////////////////////////////////////
Newsgroups: alt.folklore.computers,bit.listserv.ibm-main
References:
<Pine.LNX.3.95.1010423022239.22834A-100000@schmooze.hunter.cuny.edu>
NNTP-Posting-Host: user-33qtp48.dialup.mindspring.com [199.174.228.136]
Organization: Wheeler&Wheeler
Message-ID: <[email protected]>
Reply-To: Anne & Lynn Wheeler <[email protected]>
Date: Mon, 14 May 2001 17:40:51 GMT
From: Anne & Lynn Wheeler <[email protected]>
Subject: Re: Pre ARPAnet email?
Anne & Lynn Wheeler <[email protected]> writes:
> new STD1 (2800) is out today with new format for some sections
> ... note verbiage for STD4, STD10 and a couple others.
also showed up today were a number of "old" rfcs recently converted to
machine readable from hardcopy
rfc3, rfc5, rfc6, rfc21, rfc23, rfc24, rfc25, rfc27, rfc28, rfc29,
rfc30, rfc344, rfc567, rfc593
RFC6 ... discussion about BB&N providing character code
conversion. This isn't an easy problem (in many cases). While an
undergraduate in '68 I had put TTY/ASCII support into CP/67 ... which
was incorporated and distributed as part of the standard
release. There were some codes that it was very difficult to provide
symmetric conversion for ... at least in one case, I tried to map
characters in ASCII to valid EBCDIC because I needed some character in
ASCII. On the 2741, "at"-sign and "cent"-sign were on the same key and
CP/67 had a convention that used (lowercase) "at"-sign (in line
editing) for character delete and "cent"-sign for line delete. The TTY
keyboard didn't have cent-sign ... so I mapped it (it has been a number
of years) to "left" bracket.
Then in late '68 because of various deficiencies in the mainframe 2702
terminal controller, four of us started a project to build the first
mainframe PCM control unit using Interdata3s. Had to build our own
channel attach card that attached the Interdata3 to the mainframe I/O
channel. An emulated line-scanner was built in the Interdata3 that was
targeted at supporting both dynamic line-speed recognition as well as
dynamic terminal-type recognition (as part of the original TTY support
in CP/67, I had expanded the existing dynamic terminal type
recognition to TTY ... however 2702 had a deficiency that while the
line-scanner could be changed for each line ... the hardware
oscillator setting the line speed was hard wired).
random refs:
http://www.garlic.com/~lynn/subtopic.html#subtopic
Network Working Note Steve Crocker, UCLA
RFC-6 10 April 1969
CONVERSATION WITH BOB KAHN
I talked with Bob Kahn at BB&N yesterday. We talked about code conversion
in the IMP's, IMP-HOST communication, and HOST software.
BB&N is prepared to convert 6, 7, 8, or 9 bit character codes into 8-bit
ASCII for transmission and convert again upon assembly at the destination
IMP. BB&N plans a one for one conversion scheme with tables unique to the
HOST. I suggested that places with 6-bit codes may also want case shifting.
Bob said this may result in overflow if too many case shifts are necessary.
I suggested that this is rare and we could probably live with an overflow
indication instead of a guarantee.
With respect to HOST-IMP communication, we now have a five bit link field
and a bit to indicate conversion. Also possible is a 2-bit conversion
indicator, one for converting before sending and one for converting after.
This would allow another handle for checking or controlling the system.
--
Anne & Lynn Wheeler | [email protected] - http://www.garlic.com/~lynn/
//////////////////////////////////////////////////////////////////////////////
Newsgroups: comp.misc
Path: utkcs2!emory!sol.ctr.columbia.edu!spool.mu.edu!agate!sunkist.berkeley.edu
Message-ID: <[email protected]>
Date: 29 May 1991 00:04:49 GMT
References: <[email protected]>
Reply-To: [email protected] (Raymond Chen)
In-Reply-To: [email protected] (John Woods)
From: [email protected] (Raymond Chen)
Subject: Re: Name that character! (definitive list)
Why does everyone feel compelled to post their favorite pronunciations?
In article <[email protected]>, eanv20@castle (John Woods) writes:
>I wonder if there is a definitive list...
Indeed there is. It used to be part of the comp.unix.questions
Frequently Asked Questions file, but it has since moved into the
`Jargon File'. Many thanks to Maarten Litmaath for maintaining
the USENET ASCII Pronunciation Guide for many years. (Though the
list below does seem to be missing some of the cleverer names
in Maarten's list. Like `Donald Duck' for `&'.)
<ASCII> [American Standard Code for Information Interchange] /as'kee/
n. Common slang names for ASCII characters are collected here. See
individual entries for <bang>, <close>, <excl>, <open>, <ques>,
<semi>, <shriek>, <splat>, <twiddle>, <what>, <wow>, and <Yu-Shiang
whole fish>. This list derives from revision 2.2 of the USENET
ASCII pronunciation guide. Single characters are listed in ASCII
order, character pairs are sorted by first member. For each
character, "official" names appear first, then others in order of
popularity (more or less).
!
exclamation point, exclamation, bang, factorial, excl,
ball-bat, pling, smash, shriek, cuss, wow, hey, wham
"
double quote, quote, dirk, literal mark, rabbit ears
#
number sign, sharp, crunch, mesh, hex, hash, flash, grid,
pig-pen, tictactoe, scratchmark, octothorpe, thud
$
dollar sign, currency symbol, buck, cash, string (from
BASIC), escape (from <TOPS-10>), ding, big-money, cache
%
percent sign, percent, mod, double-oh-seven
&
ampersand, amper, and, address (from C), andpersand
'
apostrophe, single quote, quote, prime, tick, irk, pop,
spark
()
open/close parenthesis, left/right parenthesis,
paren/thesis, lparen/rparen, parenthisey, unparenthisey,
open/close round bracket, ears, so/already, wax/wane
*
asterisk, star, splat, wildcard, gear, dingle, mult
+
plus sign, plus, add, cross, intersection
,
comma, tail
-
hyphen, dash, minus sign, worm
.
period, dot, decimal point, radix point, point, full stop,
spot
/
virgule, slash, stroke, slant, diagonal, solidus, over, slat
:
colon
;
semicolon, semi
<>
angle brackets, brokets, left/right angle, less/greater
than, read from/write to, from/into, from/toward, in/out,
comesfrom/gozinta (all from UNIX), funnel, crunch/zap,
suck/blow
=
equal sign, equals, quadrathorp, gets, half-mesh
?
question mark, query, whatmark, what, wildchar, ques, huh,
hook
@
at sign, at, each, vortex, whorl, whirlpool, cyclone, snail,
ape, cat
V
vee, book
[]
square brackets, left/right bracket, bracket/unbracket,
bra/ket, square/unsquare, U turns
\
reversed virgule, backslash, bash, backslant, backwhack,
backslat, escape (from UNIX), slosh.
^
circumflex, caret, uparrow, hat, chevron, sharkfin, to ("to
the power of"), fang
_
underscore, underline, underbar, under, score, backarrow
`
grave accent, grave, backquote, left quote, open quote,
backprime, unapostrophe, backspark, birk, blugle, back tick,
push
{}
open/close brace, left/right brace, brace/unbrace, curly
bracket, curly/uncurly, leftit/rytit, embrace/bracelet
|
vertical bar, bar, or, or-bar, v-bar, pipe, gozinta, thru,
pipesinta (last four from UNIX)
~
tilde, squiggle, approx, wiggle, twiddle, swung dash, enyay
Some other common usages cause odd overlaps. The ``$'', ``#'', and ``&''
chars, for example, are all pronounced `hex' in different
communities because various assemblers use them as a prefix tag for
hexadecimal constants (in particular, $ in the 6502 world and & on
the Sinclair and some other Z80 machines).
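................................................
EDITOR'S NOTE
A minimal Python sketch of the hexadecimal-prefix usage described above.
It accepts a constant written with any of the three prefix characters
and returns its value. The names HEX_PREFIXES and parse_hex_constant are
hypothetical, and treating all three prefixes identically is an
assumption for illustration, not the syntax of any particular assembler.

```python
import string

# Prefix characters the text says are read as "hex" in some communities.
HEX_PREFIXES = ("$", "#", "&")

def parse_hex_constant(token: str) -> int:
    """Return the value of a prefixed hex constant such as "$C000"."""
    if len(token) < 2 or token[0] not in HEX_PREFIXES:
        raise ValueError("not a prefixed hex constant: %r" % token)
    digits = token[1:]
    if not all(ch in string.hexdigits for ch in digits):
        raise ValueError("invalid hex digits in %r" % token)
    return int(digits, 16)

print(parse_hex_constant("$C000"))   # 49152
print(parse_hex_constant("&4000"))   # 16384
print(parse_hex_constant("#FF"))     # 255
```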
................................................
ARCHIVER'S NOTE
The jest about Donald Duck comes from the name
used for this Disney character in Denmark:
"Anders And".