Converting Byte Array to String Using NumPy Dtype Only. Python

Question

I'm working on a task where I need to convert a byte array obtained from a hexadecimal string into a string representation, utilizing only the specified data type (dtype) in NumPy. Here's what I've tried:

import numpy as np

# Define the data type as a Unicode string of length 1
dt = np.dtype('<S1')  # Unicode string of length 1

# Sample data
res = '52'

# Convert the hexadecimal string to a bytearray
buf = bytearray.fromhex(res)

# Create a NumPy array that reads the buffer with the dtype above
arr_resp = np.frombuffer(buf, dtype=dt)

print(arr_resp)

The console shows:

[b'R']

and I want to store it as:

['R']

However, instead of the string representation of the byte array, the output shows the elements as bytes objects. How can I adjust this code to get the string representation using only the NumPy dtype? I don't want to simply convert the value (with decode, etc.); I want to store the string back inside a NumPy object. Your insights and suggestions would be invaluable. Thank you.

I don't want to just convert (like decode etc.); I want to store the string back inside a NumPy object. — Ino, Commented Feb 25, 2024 at 14:34
np.dtype('<S1') is not unicode. It's one byte. 'U1' is unicode, 4 bytes (technically unicode is 1-4 bytes, but numpy has to have a consistent itemsize!). np.array([b'R']).astype('U1') should do the desired conversion (but probably using string methods under the covers). — hpaulj, Commented Feb 25, 2024 at 15:42
b'R' is how python3 displays a byte string. That used to be the standard in python2. Now the default is unicode, which is displayed as 'R' (or u'R'). — hpaulj, Commented Feb 25, 2024 at 21:17
I don't understand this: store back the string inside a NUMPY object. The buffer contains one byte, which can be interpreted as 'uint8' or 'S1'. It can't be a 'U1' character without decode. — hpaulj, Commented Feb 25, 2024 at 21:37

hpaulj · Accepted Answer · 2024-02-25 23:16:21Z

In [63]: res = '52'
    ...: 
    ...: # Convert the hexadecimal string to a bytearray
    ...: buf = bytearray.fromhex(res)

So you have created a bytearray with one ASCII character:

In [64]: buf
Out[64]: bytearray(b'R')

That can be read with 'S1':

In [65]: np.frombuffer(buf, dtype='S1')
Out[65]: array([b'R'], dtype='|S1')

but not with 'U1' (unicode):

In [66]: np.frombuffer(buf, dtype='U1')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[66], line 1
----> 1 np.frombuffer(buf, dtype='U1')

ValueError: buffer size must be a multiple of element size
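The error comes down to element sizes: an 'S1' element takes 1 byte, a 'U1' element takes 4, and frombuffer needs the buffer length to be a multiple of the element size. A minimal check, reusing the one-byte buffer from above (the names s1 and u1 are only for illustration):

```python
import numpy as np

s1 = np.dtype('S1')   # byte string, one byte per element
u1 = np.dtype('U1')   # unicode string, four bytes per element
print(s1.itemsize, u1.itemsize)

buf = bytearray.fromhex('52')      # a single byte, b'R'
print(len(buf) % s1.itemsize == 0)  # True: frombuffer accepts 'S1'
print(len(buf) % u1.itemsize == 0)  # False: frombuffer rejects 'U1'
```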

astype can convert the 'S1' to 'U1':

In [67]: np.frombuffer(buf, dtype='S1').astype('U1')
Out[67]: array(['R'], dtype='<U1')

In [68]: _.itemsize
Out[68]: 4

Under the covers I'm pretty sure numpy is using the python string decode. numpy doesn't have much of its own 'raw' string code; it uses python's.

In [70]: np.frombuffer(buf, dtype='S1').item()
Out[70]: b'R'

In [71]: np.frombuffer(buf, dtype='S1').item().decode()
Out[71]: 'R'
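For an array with several elements, the same decode can be applied element-wise with np.char.decode. This is only a sketch: the four-byte hex string is a made-up sample, and ASCII is assumed as the encoding.

```python
import numpy as np

# Hypothetical sample: the ASCII bytes for 'R', 'e', 'a', 'd'
buf = bytearray.fromhex('52656164')

arr_s = np.frombuffer(buf, dtype='S1')    # bytes elements, as in the question
arr_u = np.char.decode(arr_s, 'ascii')    # element-wise bytes.decode
print(arr_s)
print(arr_u)
```

The result is a new unicode array; the original buffer is not reinterpreted in place.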

edit

Let's look at what kinds of bytearrays b'R' and 'R' produce:

In [166]: np.array([b'R']).tobytes()
Out[166]: b'R'

In [167]: bytearray(_)
Out[167]: bytearray(b'R')

In [168]: np.array(['R']).tobytes()
Out[168]: b'R\x00\x00\x00'

In [169]: bytearray(_)
Out[169]: bytearray(b'R\x00\x00\x00')

And applying frombuffer to these bytearrays:

In [170]: np.frombuffer(_,'U1')
Out[170]: array(['R'], dtype='<U1')

In [171]: np.frombuffer(__,'S1')
Out[171]: array([b'R', b'', b'', b''], dtype='|S1')

I suppose you could take the single-byte array, add the 3 padding null bytes, and 'read' that as 'U1'. That would, in a sense, do the decode without using the python method.
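One way to sketch that padding idea is to widen each byte to a 4-byte little-endian integer and then view the result as '<U1'. This is my own illustration, not something tested in the session above. It assumes each byte is meant as a code point of the same value, which holds for ASCII (values above 127 would come out as Latin-1 characters).

```python
import numpy as np

buf = bytearray.fromhex('52')

# Read the raw bytes, then widen each one to 4 bytes (little-endian),
# which is the same as appending three zero bytes after it.
codes = np.frombuffer(buf, dtype=np.uint8).astype('<u4')

# Reinterpret the 4-byte values as one-character unicode strings.
arr = codes.view('<U1')
print(arr, arr.dtype, arr.itemsize)
```

The astype call still copies the data, so this changes how the bytes are stored rather than reading the original one-byte buffer as 'U1' directly.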
