File format to keep accents and ' within TEXTFILE

Hello ! :slight_smile:

Am I right that accents and some other characters are lost when a Unicode-encoded text file is used with TEXTFILE?
It seems to work with the “Western ISO Latin 1” encoding.

Thanks very much !

O.

Yes, correct!

Best
K
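
For readers wondering why the accents get mangled, here is a minimal sketch of one likely mechanism, assuming the file's bytes are UTF-8 but are decoded as Latin-1. It is written in Python purely to show the byte-level behaviour; it is not OM or LispWorks code, and the sample string is made up for illustration.

```python
# Hypothetical sample text containing accented characters.
text = "café d'été"

# Case 1: the file was saved as UTF-8 but is read back as Latin-1.
# Each accented letter is two bytes in UTF-8, and Latin-1 turns
# every byte into its own character, so the accents are garbled.
utf8_bytes = text.encode("utf-8")
misread = utf8_bytes.decode("latin-1")

# Case 2: the file was saved as Latin-1 and is read back as Latin-1.
# Each accented letter is a single byte, so the text survives.
latin1_bytes = text.encode("latin-1")
roundtrip = latin1_bytes.decode("latin-1")

print(misread)    # cafÃ© d'Ã©tÃ©
print(roundtrip)  # café d'été
```

This matches the observation above that a “Western ISO Latin 1” file works while a Unicode one does not, provided the reader really does assume Latin-1.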

Hello Karim !! :partying_face:

Thank you !!

@haddad, what’s the reasoning behind its being this way? I could maybe see restricting character sets in file|directory names, but wouldn’t, for example, UTF-8 be a better encoding for file contents? I know LispWorks uses :latin-1 by default, but IMHO that’s not a good reason alone.

-Jonathan

Dear Jonathan,

The main reason is that “historically” OM code was written for MCL back in the day (cf. Macintosh Common Lisp - Wikipedia), and as I remember, UTF-8 was not supported in MCL. But you are right, we should upgrade OM to UTF-8. However, this will be a somewhat delicate matter, because we want to keep compatibility with old files, workspaces, etc., and we do not want to break this. So, when we have some time, I will look into it.

Best
Karim

Thanks for the explanation, Karim. I didn’t realize that OM had ever existed in a form not dependent on LispWorks.

You might consider the following idiom, from the LW manual at §26.6.3.5:
For example, the following will cause LispWorks to use UTF-8 if the file begins with valid UTF-8 bytes:

(pushnew :utf-8 system:*specific-valid-file-encodings*)

I think they meant it for situations like this.

-J.
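
To make the idea behind that idiom concrete, here is a rough Python sketch of "use UTF-8 if the bytes are valid UTF-8, otherwise fall back to Latin-1". It is only an illustration of the general technique, not how LispWorks implements it: the function name `decode_with_fallback` is hypothetical, and it validates the whole byte string rather than only the beginning of the file.

```python
from typing import Tuple


def decode_with_fallback(data: bytes) -> Tuple[str, str]:
    """Return (text, encoding_used) for a byte string.

    Tries strict UTF-8 first; if the bytes are not valid UTF-8,
    falls back to Latin-1, which accepts any byte sequence.
    """
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return data.decode("latin-1"), "latin-1"


# A UTF-8 file and an old Latin-1 file containing the same word.
new_file = "été".encode("utf-8")
old_file = "été".encode("latin-1")

print(decode_with_fallback(new_file))  # ('été', 'utf-8')
print(decode_with_fallback(old_file))  # ('été', 'latin-1')
```

One caveat with any detection of this kind: a few Latin-1 byte sequences are also valid UTF-8 (for example the bytes C3 A9 are "Ã©" in Latin-1 but "é" in UTF-8), so the guess can occasionally be wrong for old files.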

Dear Jonathan,

Thank you again for the tip. I will try this. The problem with UTF-8, though, is that Latin-1 encoding is assumed all over the code. I will have to change all of these places in order to test it calmly and make sure it works, particularly with regard to compatibility.

Will keep you informed.
Best
K

You’re welcome. This issue isn’t a big deal to me personally, but it seems like it’d be of long-term benefit for the program.
-J.

UTF-8 shouldn’t be a big problem. I’ve pushed a utf-8 branch of OM to the repo. Things seem OK with relatively normal patch files and workspaces, but there may be (probably are) issues with stranger old encodings, and also across OSes.

Karim will need to test everything and say ‘go’ before this can be put out in the wild.

Thanks a lot Anders.

Testing right now!
Will keep you informed.

Best
K

Hi,

Just to say that, thanks to Jonathan and Anders, OM now supports UTF-8. It is coming in the new 7.2 version, to be released soon in March.

Best
K

Perhaps worth a note: OM now defaults to UTF-8, meaning that if you open any Latin-1 encoded file in OM and save it, it will end up as UTF-8.

Also, UTF-8 should cover most practical uses. But with this change it would be quite easy to add support for other encodings for very special needs.
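
As a rough picture of what "open a Latin-1 file and save it as UTF-8" does to the bytes on disk, here is a small Python sketch. It does not reproduce OM's own load/save code; the helper `resave_as_utf8`, the file name, and the sample text are all hypothetical.

```python
import os
import tempfile


def resave_as_utf8(path, source_encoding="latin-1"):
    """Read a text file in source_encoding and rewrite it in UTF-8."""
    # newline="" keeps line endings exactly as they were in the file.
    with open(path, "r", encoding=source_encoding, newline="") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(text)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "notes.txt")

    # Simulate an old file written with Latin-1 encoding.
    with open(path, "wb") as f:
        f.write("déjà vu".encode("latin-1"))
    before = os.path.getsize(path)

    resave_as_utf8(path)
    after = os.path.getsize(path)

    with open(path, "rb") as f:
        saved = f.read()

print(before, after)          # 7 9
print(saved.decode("utf-8"))  # déjà vu
```

The text is unchanged, but each accented letter now takes two bytes instead of one, so a tool that still assumes Latin-1 would no longer read the file correctly.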

Thanks, Anders, for these notes.

I will add one:
WARNING: if you name patches in UTF-8 (using non-Latin-1 characters), you will not be able to load your workspace correctly with older versions of OM.

Best
K
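
If you need a workspace to stay loadable in older versions, one way to check a patch name in advance is to test whether every character fits in Latin-1. The sketch below is in Python and only checks encodability; the helper name `is_latin1_safe` and the sample names are hypothetical, and it assumes older OM versions read names as Latin-1, as discussed above.

```python
def is_latin1_safe(name):
    """Return True if every character of name exists in Latin-1."""
    try:
        name.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


# Hypothetical patch names.
for name in ["étude-1", "Dvořák-sketch", "patch-音"]:
    print(name, is_latin1_safe(name))
# étude-1 True
# Dvořák-sketch False
# patch-音 False
```

Common Western European accents pass, while characters outside Latin-1 (here "ř" and "音") are flagged.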