Questions tagged [character-encoding]
60 questions
16
votes
4
answers
6k
views
Why should C++ uint8_t data not be printable?
On this GitHub C++-related page, the writer said:
Note that the value_type of those two containers is uint8_t which is not a printable character, make sure to cast it to int before you print.
Why ...
-1
votes
1
answer
461
views
Should a Java project use UTF-16? [closed]
Java, by default, uses UTF-16 to represent characters in the String data type.
I inherited a JavaFX project which currently has some Strings in UTF-8 and others in UTF-16. This is causing bugs (in pop-...
0
votes
1
answer
364
views
Why does Windows need CR LF to advance to next line, but Python does not?
Windows:
Uses CR (\r) in combination with LF (\n) for line endings, facilitating compatibility with legacy systems.
Unix-like systems (including Linux and macOS):
Employs LF (\n) for line endings, ...
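As a rough illustration of the title's point about Python: in text mode, Python's I/O layer translates line endings on read by default, so code usually sees only `\n`. The sketch below uses an in-memory buffer with made-up sample bytes (the names `raw`, `translated`, and `untranslated` are hypothetical), not a real Windows file.

```python
import io

# Hypothetical sample: one CRLF-terminated line and one LF-terminated line.
raw = b"first\r\nsecond\n"

# newline=None (the default) enables universal newlines: "\r\n" is read as "\n".
translated = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8", newline=None).read()

# newline="" turns translation off, so the original "\r\n" is preserved.
untranslated = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8", newline="").read()
```

The bytes on disk still differ between platforms; only the text-mode view is normalized.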
1
vote
1
answer
647
views
Does the SHA256 hashing algorithm change based on the content encoding?
I am starting to look into how to implement SHA256 in JavaScript, and found this, for example. It sounds like it requires UTF-8 encoding. Another one I saw required/supported only ASCII encoding and ...
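A minimal sketch of why the encoding matters here, assuming the usual definition of SHA-256 as a function over bytes rather than characters: the algorithm itself stays the same, but the same text encoded differently gives different input bytes and so different digests. The helper `sha256_hex` and the sample strings are hypothetical.

```python
import hashlib

def sha256_hex(text: str, encoding: str) -> str:
    """Encode the text to bytes with the given encoding, then hash those bytes."""
    return hashlib.sha256(text.encode(encoding)).hexdigest()

sample = "h\u00e9llo"  # hypothetical sample with one non-ASCII character (é)
utf8_digest = sha256_hex(sample, "utf-8")
utf16_digest = sha256_hex(sample, "utf-16-le")

# For pure-ASCII text, ASCII and UTF-8 produce identical bytes,
# so the two digests are identical as well.
ascii_text = "hello"
same_for_ascii = sha256_hex(ascii_text, "ascii") == sha256_hex(ascii_text, "utf-8")
```

This may explain why one implementation mentions UTF-8 and another only ASCII: both just need some agreed way to turn a string into bytes.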
1
vote
1
answer
75
views
Layout Behavior of Characters (question about unicode standard)
I've been reading Unicode's core specification (see https://www.unicode.org/versions/latest/). I mostly understood what the text was explaining in section 2.1 Architectural Context until it started ...
2
votes
2
answers
552
views
Compressing EBCDIC file vs UTF8
Today I came across a weird case for which I have no explanation, so here I am.
I have two files with identical content, but one is encoded in UTF-8 and the other one is in IBM EBCDIC. Both of them ...
5
votes
1
answer
382
views
UTF-8 questions
When you encode a code point to code units using UTF-8, if the code point fits in 7 bits, the most significant bit is set to zero, which tells you it is a character stored in 1 ...
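The lead-byte rule described above can be sketched as follows. This is an illustration based on the standard UTF-8 bit patterns, not code from the question; the function name `utf8_length_from_lead_byte` is hypothetical.

```python
def utf8_length_from_lead_byte(lead: int) -> int:
    """Return how many bytes a UTF-8 sequence occupies, judging by its first byte."""
    if (lead & 0b1000_0000) == 0:
        return 1  # 0xxxxxxx: code point fits in 7 bits
    if (lead & 0b1110_0000) == 0b1100_0000:
        return 2  # 110xxxxx
    if (lead & 0b1111_0000) == 0b1110_0000:
        return 3  # 1110xxxx
    if (lead & 0b1111_1000) == 0b1111_0000:
        return 4  # 11110xxx
    raise ValueError("continuation byte (10xxxxxx) or invalid lead byte")

# Compare the prediction with what Python's own encoder produces.
lengths = {ch: utf8_length_from_lead_byte(ch.encode("utf-8")[0])
           for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]}
```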
1
vote
2
answers
87
views
In python, what or who is character encoding information for?
If you go to www.htmlbasictutor.ca/character-encoding.htm you will find the following description of character encoding.
Character encoding tells the browser and validator what set of characters to ...
2
votes
2
answers
146
views
What's the difference between the range of characters you can use in a script and a script's encoding?
The two concepts seem equal to me, but I'm not really sure I understand encoding well enough to confirm that this is the case.
1
vote
1
answer
326
views
Create and implement new encoding
I'm working on a project with huge files that contain only the set {[0-9],.}.
Encoding in UTF-8 or ASCII makes the files huge.
I wonder if I could find a way to encode in only 4 bits (make those files 16 ...
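One possible way to do this is to pack two symbols per byte. The sketch below assumes the alphabet is the ten digits plus comma and period (the question's set notation is ambiguous on this), and uses the otherwise unused nibble value 0xF as padding for odd-length input. `ALPHABET`, `PAD`, `pack`, and `unpack` are hypothetical names.

```python
ALPHABET = "0123456789,."  # assumption: digits plus ',' and '.'
PAD = 0xF                  # nibble value not used by any symbol

def pack(text: str) -> bytes:
    """Store two 4-bit symbol codes in each output byte."""
    codes = [ALPHABET.index(ch) for ch in text]  # raises ValueError on other characters
    if len(codes) % 2:
        codes.append(PAD)  # pad so the last byte is complete
    return bytes((codes[i] << 4) | codes[i + 1] for i in range(0, len(codes), 2))

def unpack(data: bytes) -> str:
    """Reverse pack(): split each byte into two nibbles and skip padding."""
    out = []
    for byte in data:
        for code in (byte >> 4, byte & 0x0F):
            if code != PAD:
                out.append(ALPHABET[code])
    return "".join(out)

packed = pack("3.14,15")
restored = unpack(packed)
```

Compared with one byte per character, this halves the size; a general-purpose compressor may do as well or better on such data.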
2
votes
1
answer
412
views
Differentiating Between ASCII and Unicode in File Spec
I am developing against a file spec that lists the data type for certain fields as
CHAR(<length>)
The spec is for a fixed width flat file. In most cases, possible values to populate the fields ...
0
votes
2
answers
2k
views
Why Unicode Encoding/Decoding is Necessary in JavaScript
I am wondering why Unicode encoding is necessary in JavaScript. I am looking at utf8.js as an example. I am also looking at the UTF-8 spec, but am not really following the different pieces of data. ...
0
votes
1
answer
1k
views
Java takes 2 bytes to represent character?
In general, a character is represented in 1 byte, i.e. 8 bits. I believe this is true for all text editors, and even for databases like Oracle. 1 byte can represent 2^8 = 256 characters.
My question is when ...
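The arithmetic above, extended to 2-byte units, can be checked quickly. This is a Python sketch rather than Java, on the assumption that counting UTF-16 code units mirrors how Java's 16-bit `char` works; `utf16_code_units` is a hypothetical helper.

```python
one_byte_values = 2 ** 8    # distinct values in 8 bits
two_byte_values = 2 ** 16   # distinct values in 16 bits

def utf16_code_units(text: str) -> int:
    """Count 16-bit code units needed to store the text as UTF-16."""
    return len(text.encode("utf-16-be")) // 2

units_for_a = utf16_code_units("A")               # fits in one 16-bit unit
units_for_emoji = utf16_code_units("\U0001F600")  # outside the 16-bit range: surrogate pair
```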
8
votes
1
answer
4k
views
Is the BOM optional for UTF-16 and UTF-32?
I used to think that the BOM is optional for UTF-8, but mandatory for UTF-16 and UTF-32.
But then I read the following (in this article):
Let's look just at the ones that Notepad supports.
...
0
votes
1
answer
280
views
Barcode that support ~3500 chars [closed]
I can't figure out a barcode that would support ~3500 chars.
The barcode should contain 40 strings separated by carriage returns, each 76 chars long. Each string will look like this:
...