Default encoding is UTF-8? #64

InvncibiltyCloak · 2024-01-17T01:46:01Z

First off, thanks for the great Dewesoft reader library.
I was recently using it for my datafiles which are DXD and are created on a Windows x64, en-US machine.

The units had some Unicode characters for the degree symbol and ohms. When I imported the file with this library, the units showed the classic Å symbol, which is the giveaway of UTF-8 binary data being decoded as if it were in a Windows codepage (it looks like you have ISO-8859-1 chosen).
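The symptom can be reproduced without the library at all. A minimal sketch in plain Python, assuming the unit string is stored in the file as UTF-8 bytes (the unit value here is made up for illustration and does not come from a real DXD file):

```python
# Hypothetical unit string, used only to illustrate the decoding mismatch.
unit = "°C"
raw = unit.encode("utf-8")         # the bytes as they would sit in the file

wrong = raw.decode("iso-8859-1")   # every byte is turned into one character
right = raw.decode("utf-8")        # multi-byte sequences are recombined

print(len(raw), repr(wrong), repr(right))
```

Because ISO-8859-1 maps each byte to exactly one character, the two-byte UTF-8 sequence for the degree sign comes out as two characters, which shows up as a stray accented letter in front of the expected symbol.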

A quick peek into the Python code showed that this is extremely easy to fix in this library: just set dwdatareader.encoding = 'utf-8' and it gives the correctly decoded strings.

I just wanted to file an issue to bring up the fact that it appears that DewesoftX is encoding strings in UTF-8 and perhaps this library should change the default encoding to match?

Unfortunately I am only a sample size of one and have not tested other locales or versions of Dewesoft, so I am not sure if this default encoding applies everywhere. Thanks for your time!


costerwi · 2024-01-18T03:02:44Z

Thanks for your comments! I'm glad you found it easy to override the encoding.

I cannot find the encoding documented anywhere. The default was set to ISO-8859-1 a long time ago, probably due to an observation like yours. It may have evolved since then. The fact that your Windows machine seems to be recording in UTF-8 is a good reason to change the assumed default to UTF-8.

fleimgruber · 2024-11-03T13:29:59Z

Thanks @InvncibiltyCloak for bringing this up. Changing the default encoding to UTF-8 seems reasonable. One consideration though would be to give users the option to explicitly set encodings to maintain backwards compatibility with other encodings, e.g. ISO-8859-1, in older files and with older DEWE stacks?
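One way to keep older files readable while preferring UTF-8 is to try encodings in order. The sketch below is only an illustration of that idea; `decode_text` and its `encodings` parameter are hypothetical names and are not part of dwdatareader:

```python
def decode_text(raw: bytes, encodings=("utf-8", "iso-8859-1")) -> str:
    """Return the first successful decode, trying encodings in order."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Only reached when every encoding in a custom list failed
    # (ISO-8859-1 itself accepts any byte sequence).
    return raw.decode(encodings[-1], errors="replace")


print(decode_text("°C".encode("utf-8")))   # UTF-8 input
print(decode_text(b"\xb0C"))               # ISO-8859-1 input
```

The caveat with this approach is that an ISO-8859-1 string whose bytes happen to form valid UTF-8 would be decoded as UTF-8, so an explicit user override is still worth keeping.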

costerwi · 2024-11-03T14:37:39Z

I never had a good example to test the encoding so it is intentionally very easy for the user to specify:

import dwdatareader as dw
dw.encoding = 'utf-8'

Unfortunately, the Dewesoft library sometimes appends junk characters to the end of strings, which cause UTF-8 decoding errors in Python and fail the tests. If we change the default to UTF-8 then we need to either ask Dewesoft to fix their library or have Python ignore these decoding errors.
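For the second option, Python's built-in `errors` argument to `bytes.decode` can drop bytes that do not form valid text. A minimal sketch, assuming the junk sits after the real string content; `decode_lenient` is a hypothetical helper, not something the library provides:

```python
def decode_lenient(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode raw bytes while tolerating trailing junk."""
    # Cut at the first NUL byte, as a C string reader would.
    raw = raw.split(b"\x00", 1)[0]
    # Silently drop any remaining bytes that are not valid in `encoding`.
    return raw.decode(encoding, errors="ignore")


print(decode_lenient(b"\xc2\xb0C\xff\xfe"))
print(decode_lenient(b"ohm\x00\xff garbage"))
```

Note that `errors="ignore"` also hides invalid bytes in the middle of a string, so genuine corruption would go unnoticed; `errors="replace"` is the more visible alternative.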

fleimgruber · 2024-11-03T14:54:40Z

Ah, I should have been more specific. I saw this global option, but wondered whether all of the 10 or so usages of it should use the same encoding, e.g. opening the file in

dwdatareader/dwdatareader/__init__.py

Line 388 in e579a23

    
           stat = DLL.DWOpenDataFile(self.name.encode(encoding=encoding), ctypes.byref(self.info))

vs decoding text values e.g. in

dwdatareader/dwdatareader/__init__.py

Line 88 in e579a23

return self._unit.decode(encoding=encoding)

But it was only guessing on my part without any evidence of different encodings actually occurring.
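If the two uses ever did need different encodings, the split could look roughly like the sketch below. All names here are hypothetical; the library as discussed in this thread has a single `encoding` setting:

```python
# Hypothetical settings, shown only to illustrate separating the two uses.
path_encoding = "utf-8"        # for file names handed to the DLL
text_encoding = "iso-8859-1"   # for names, units and other text read back


def encode_path(path: str) -> bytes:
    """Encode a file name before passing it to the C library."""
    return path.encode(path_encoding)


def decode_value(raw: bytes) -> str:
    """Decode a text value returned by the C library."""
    return raw.decode(text_encoding)


print(encode_path("données.dxd"), decode_value(b"\xb0C"))
```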

> Unfortunately, the Dewesoft library sometimes appends junk characters to the end of strings which cause utf-8 decoding errors in python and fail the tests. If we change the default to utf-8 then we need to either ask Dewesoft fix their library or have python ignore these decoding errors.

That sounds annoying. I would guess that the junk characters are a result of the C lib interpreting parts of the memory as strings when it should not, i.e. string length mismatch at that level?
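The kind of length mismatch guessed at here can be illustrated with a plain `ctypes` buffer. This is only a sketch of the general mechanism with made-up buffer contents, not a diagnosis of what the Dewesoft library actually does:

```python
import ctypes

# A buffer standing in for memory that a C library writes a string into.
buf = ctypes.create_string_buffer(32)
payload = b"\xc2\xb0C\x00\xff\xfeleftover"   # "°C", NUL, then stale bytes
ctypes.memmove(buf, payload, len(payload))

c_string = buf.value        # stops at the first NUL byte
fixed_len = buf.raw[:12]    # reads 12 bytes regardless of the NUL

try:
    fixed_len.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

print(c_string.decode("utf-8"), decoded_ok)
```

Reading up to the NUL terminator yields a clean string, while reading a fixed or over-reported length pulls in whatever bytes follow it, and those need not be valid UTF-8.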
