Questions tagged [character-encoding]

16 votes
4 answers
6k views

Why should C++ uint8_t data not be printable?

On this GitHub C++-related page, the writer said: "Note that the value_type of those two containers is uint8_t which is not a printable character, make sure to cast it to int before you print." Why ...
Russell McMahon
-1 votes
1 answer
461 views

Should a Java project use UTF-16? [closed]

Java, by default, uses UTF-16 to represent characters in the String data type. I inherited a JavaFX project which currently has some Strings in UTF-8 and others in UTF-16. This is causing bugs (in pop-...
chilliefiber
0 votes
1 answer
364 views

Why does Windows need CR LF to advance to next line, but Python does not?

Windows: Uses CR (\r) in combination with LF (\n) for line endings, facilitating compatibility with legacy systems. Unix-like systems (including Linux and macOS): Employs LF (\n) for line endings, ...
DevelBase2
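
A small sketch that may help with the question above: in text mode, Python translates "\n" on output according to the stream's newline setting, which is why Python code normally writes only "\n" even on Windows. The helper name written_bytes is hypothetical, and an in-memory buffer stands in for a real file.

```python
import io
from typing import Optional


def written_bytes(text: str, newline: Optional[str]) -> bytes:
    """Write text through a text-mode wrapper and return the raw bytes produced."""
    raw = io.BytesIO()
    wrapper = io.TextIOWrapper(raw, encoding="ascii", newline=newline)
    wrapper.write(text)
    wrapper.flush()
    data = raw.getvalue()
    wrapper.detach()  # release the buffer so closing the wrapper does not close it
    return data


# newline="\r\n" mimics the Windows convention: each "\n" becomes CR LF on output.
windows_style = written_bytes("a\nb\n", "\r\n")
# newline="\n" disables translation: the bytes match the string.
unix_style = written_bytes("a\nb\n", "\n")

print(windows_style)
print(unix_style)
```

With newline=None (the default for open()), the same translation uses os.linesep, so the result depends on the platform.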
1 vote
1 answer
647 views

Does the SHA256 hashing algorithm change based on the content encoding?

I am starting to look into how to implement SHA256 in JavaScript, and found this for example. It requires UTF-8 encoding it sounds like. Another one I saw required/supported only ASCII encoding and ...
Lance Pollard
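
A minimal sketch related to the question above, written in Python rather than JavaScript: SHA-256 is defined over bytes, so the algorithm itself does not change with the encoding, but the same string encoded differently yields different bytes and therefore a different digest. The sample strings are arbitrary.

```python
import hashlib

text = "héllo"

# Same characters, two encodings: the input bytes differ, so the digests differ.
utf8_digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
utf16_digest = hashlib.sha256(text.encode("utf-16-le")).hexdigest()

# Pure-ASCII text is byte-identical in ASCII and UTF-8, so the digests agree.
ascii_digest = hashlib.sha256("hello".encode("ascii")).hexdigest()
ascii_as_utf8_digest = hashlib.sha256("hello".encode("utf-8")).hexdigest()

print(utf8_digest != utf16_digest)
print(ascii_digest == ascii_as_utf8_digest)
```

A library that "requires UTF-8" is most likely just fixing which bytes it feeds to the hash, which is an assumption about those libraries rather than something the excerpt confirms.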
1 vote
1 answer
75 views

Layout Behavior of Characters (question about Unicode standard)

I've been reading Unicode's core specification (see https://www.unicode.org/versions/latest/). I mostly understood what the text was explaining in section 2.1 Architectural Context until it started ...
lonious
  • 121
2 votes
2 answers
552 views

Compressing EBCDIC file vs UTF8

Today I came across a weird case for which I have no explanation, so here I am. I have two files with identical content, but one is encoded in UTF-8 and the other one is in IBM EBCDIC. Both of them ...
rodripf
  • 137
5 votes
1 answer
382 views

UTF-8 questions

When you encode a code point to code units based on UTF-8, then if the code point fits in 7 bits, the most significant bit is set to zero so that it tells you it is a character which is stored in 1 ...
codepersonnel49
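
To make the bit layout described above concrete, here is a small Python sketch that prints the UTF-8 bytes of a character in binary. The helper name utf8_bits is hypothetical, and the sample characters are arbitrary.

```python
def utf8_bits(ch: str) -> list:
    """Return each UTF-8 byte of ch as an 8-character binary string."""
    return [format(byte, "08b") for byte in ch.encode("utf-8")]


# A code point that fits in 7 bits: one byte, most significant bit 0.
print(utf8_bits("A"))
# Larger code points: a lead byte starting 110 or 1110, then bytes starting 10.
print(utf8_bits("é"))
print(utf8_bits("€"))
```

The leading bits of the first byte tell a decoder how many bytes the sequence has, and every continuation byte starts with 10.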
1 vote
2 answers
87 views

In Python, what or who is character encoding information for?

If you go to www.htmlbasictutor.ca/character-encoding.htm you will find the following description of character encoding. Character encoding tells the browser and validator what set of characters to ...
progner
  • 523
2 votes
2 answers
146 views

What's the difference between the range of characters you can use in a script and a script's encoding?

The two concepts seem equal to me, but I'm not really sure I understand encoding well enough to confirm that this is the case.
progner
  • 523
1 vote
1 answer
326 views

Create and implement new encoding

I'm working on a project with huge files that contain only the set {[0-9],.}. Encoding in UTF-8 or ASCII makes huge files. I wonder if I could find a way to encode in only 4 bits (make those files 16 ...
PyThagoras
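
One possible sketch of the 4-bit idea from the question above, assuming the alphabet is the ten digits plus comma and period (12 symbols, which fit in a 4-bit nibble). The names ALPHABET, PAD, pack, and unpack are hypothetical, and the padding scheme is just one choice among several.

```python
ALPHABET = "0123456789,."  # assumed symbol set: 12 values, each fits in 4 bits
PAD = 0xF                  # unused nibble value, marks the tail of odd-length input


def pack(text: str) -> bytes:
    """Pack two symbols per byte; raises ValueError on characters outside ALPHABET."""
    codes = [ALPHABET.index(ch) for ch in text]
    if len(codes) % 2:
        codes.append(PAD)
    return bytes((hi << 4) | lo for hi, lo in zip(codes[0::2], codes[1::2]))


def unpack(data: bytes) -> str:
    """Reverse pack(), skipping the padding nibble."""
    out = []
    for byte in data:
        for code in (byte >> 4, byte & 0x0F):
            if code != PAD:
                out.append(ALPHABET[code])
    return "".join(out)


packed = pack("3.14,15")
print(len(packed), unpack(packed))
```

Compared with one byte per character in ASCII or UTF-8, this layout stores two characters per byte, so it roughly halves the file size. A general-purpose compressor may do as well or better on such data.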
2 votes
1 answer
412 views

Differentiating Between ASCII and Unicode in File Spec

I am developing against a file spec that lists the data type for certain fields as CHAR(<length>). The spec is for a fixed-width flat file. In most cases, possible values to populate the fields ...
mathewb
  • 137
0 votes
2 answers
2k views

Why Unicode Encoding/Decoding is Necessary in JavaScript

I am wondering why Unicode encoding is necessary in JavaScript. I am looking at utf8.js as an example. I am also looking at the UTF-8 spec, but am not really following the different pieces of data. ...
Lance Pollard
0 votes
1 answer
1k views

Java takes 2 bytes to represent character?

In general, a character is represented in 1 byte, i.e. 8 bits. This is, I believe, true for all text editors and even for databases like Oracle. 1 byte can represent 2^8 = 256 characters. My question is when ...
user3198603
  • 1,886
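
As a rough illustration of the question above, sketched in Python rather than Java: the number of bytes per character depends on the encoding, and a Java char is a 16-bit UTF-16 code unit. The helper name encoded_sizes is hypothetical, and the sample characters are arbitrary.

```python
def encoded_sizes(ch: str) -> dict:
    """Byte length of ch in a few encodings; None where the encoding cannot represent it."""
    sizes = {}
    for name in ("latin-1", "utf-8", "utf-16-le"):
        try:
            sizes[name] = len(ch.encode(name))
        except UnicodeEncodeError:
            sizes[name] = None
    return sizes


print(encoded_sizes("A"))   # fits everywhere; UTF-16 still uses 2 bytes
print(encoded_sizes("€"))   # outside Latin-1's 256 characters
print(encoded_sizes("😀"))  # needs two UTF-16 code units (a surrogate pair)
```

One byte covers only 256 characters, so any encoding that handles more than that must use more than one byte for at least some of them.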
8 votes
1 answer
4k views

Is the BOM optional for UTF-16 and UTF-32?

I used to think that the BOM is optional for UTF-8, but mandatory for UTF-16 and UTF-32. But then I have read the following (in this article): Let's look just at the ones that Notepad supports. ...
user9002947
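
A short sketch of how one implementation treats the BOM, using Python's built-in codecs. This shows Python's behavior only and is not a statement of what the Unicode standard requires; the sample text is arbitrary.

```python
import codecs

text = "hi"

# The generic "utf-16" codec prepends a BOM in the platform's native byte order.
with_bom = text.encode("utf-16")
# Codecs that name the byte order explicitly write no BOM.
le_no_bom = text.encode("utf-16-le")
be_no_bom = text.encode("utf-16-be")

starts_with_bom = with_bom.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))

# When decoding with the generic codec, the BOM selects the byte order and is dropped.
decoded = (codecs.BOM_UTF16_BE + be_no_bom).decode("utf-16")

print(starts_with_bom, le_no_bom, be_no_bom, decoded)
```

When the byte order is known out of band, as with "utf-16-le" or "utf-16-be", the data can be decoded without any BOM.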
0 votes
1 answer
280 views

Barcode that supports ~3500 chars [closed]

I can't figure out a barcode that would support ~3500 chars. The barcode should contain 40 strings with carriage returns, each 76 chars long. Each string will look like this: ...
SovereignSun
