[Break change] Support 64bit length, add various types and typed containers #311
cmpute wants to merge 3 commits into msgpack:master
- Add variable length fixext
- Add complex numbers
- Add bin 64, ext 64
- Add bigint, bigfloat
- Add UUID
- Add typed containers
A possible guideline for parsers to handle the backward compatibility: Serialization
Deserialization
This is really nice. UUIDs and complex types would greatly help for my purposes.
@mincequi I already included the spec for float16 in the bigfloat type, since ideally bigfloat can represent a floating-point number of any size. For a more efficient way to store float16, a modification to the array types could be useful.
I think everything can be described as extensions. Other things (such as deprecation of |
@Saiv46 no, extension types have the length defined in the same way |
This is a proposal for a number of modifications to the current spec. Part 1 will break backward compatibility (specifically, it deprecates the old fixext fields), but it's well worth it. It will drastically improve space efficiency and time efficiency in certain applications. The main modifications include
Top-level types (Part 1)
Add variable length fixext
fixext 1,2,4,8,16 is unified into a more compact format as proposed in #310. Benefits of this include:
For example, to store 3 bytes of ext data, previously we needed
0xc7 + 0x03 + type byte + 3-byte payload = 6 bytes. With the proposed format it needs only 5 bytes (16.67% less).
Add complex numbers
The most common 64-bit and 128-bit complex types are added.
Complex numbers are the only primitive type missing from the msgpack format. They are natively supported by most general-purpose and scientific programming languages. Adding complex numbers as top-level types will help serialize scientific data with the typed containers proposed below.
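As a rough illustration of what a complex payload could look like, here is a minimal Python sketch. It assumes the real part is stored before the imaginary part and that both are big-endian IEEE 754 floats (two 32-bit floats for the 64-bit complex type, two 64-bit floats for the 128-bit one). That layout and the names `pack_complex` and `unpack_complex` are assumptions made for illustration. The leading type code byte is left out because its value is not stated in this text.

```python
import struct


def pack_complex(z: complex, double: bool = True) -> bytes:
    """Pack z as (real, imag) big-endian floats.

    double=True  -> two float64 values, 16 bytes ("128-bit complex")
    double=False -> two float32 values,  8 bytes ("64-bit complex")
    The field order and byte order are assumptions, not taken from the spec.
    """
    fmt = ">dd" if double else ">ff"
    return struct.pack(fmt, z.real, z.imag)


def unpack_complex(payload: bytes) -> complex:
    """Inverse of pack_complex; the width is inferred from the payload length."""
    formats = {8: ">ff", 16: ">dd"}
    if len(payload) not in formats:
        raise ValueError("complex payload must be 8 or 16 bytes")
    real, imag = struct.unpack(formats[len(payload)], payload)
    return complex(real, imag)
```

A round trip such as `unpack_complex(pack_complex(1.5 - 2.25j))` returns the original value, since both parts are exactly representable.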
Add bin 64, ext 64
This is a feature requested by a lot of people (#214 #190 #268). 64-bit length support is added to bin and ext, which will fit most of the demands.
In modern computers, RAM size is usually larger than 4GB (can be up to TB in data centers), so loading all data into memory is very common. Chunking the data is inconvenient and will lead to performance loss if large data is stored. Moreover, there's currently no specification about how to chunk the data in msgpack. With the help of 4 additional type codes freed by variable length fixext, this can be easily added to the specification.
In my opinion, msgpack is very simple and clean, so it can be used to store large data, satisfying more demands than network communication.
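To make the current limit concrete, the sketch below builds ext headers the way the existing msgpack spec defines them (fixext 1/2/4/8/16, ext 8, ext 16, ext 32). The helper name `ext_header` is made up for this example. It does not implement the ext 64 format proposed here, since the new type codes are not listed in this text. It only shows where today's format runs out of length bits.

```python
import struct

# fixext N type bytes from the current msgpack spec, keyed by payload length
FIXEXT = {1: 0xD4, 2: 0xD5, 4: 0xD6, 8: 0xD7, 16: 0xD8}


def ext_header(type_code: int, length: int) -> bytes:
    """Return the header bytes that precede an ext payload of `length` bytes."""
    type_byte = struct.pack("b", type_code)  # ext type is a signed 8-bit integer
    if length in FIXEXT:
        return bytes([FIXEXT[length]]) + type_byte
    if length < 2**8:
        return b"\xc7" + struct.pack(">B", length) + type_byte  # ext 8
    if length < 2**16:
        return b"\xc8" + struct.pack(">H", length) + type_byte  # ext 16
    if length < 2**32:
        return b"\xc9" + struct.pack(">I", length) + type_byte  # ext 32
    # Nothing in the current spec can describe a payload this large.
    raise OverflowError("payload of 2**32 bytes or more does not fit in ext 32")
```

For a 3-byte payload, `ext_header(1, 3)` is 3 bytes long, so the whole object takes 6 bytes. A 4 GiB payload raises `OverflowError`, which is the gap bin 64 and ext 64 are meant to close.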
More ext types (Part 2)
Add bigint, bigfloat
This proposal is modified from #249, fixing #206, #292. Only integer and floating-point numbers are added. Large decimal and fraction types are rarely demanded in my opinion.
int 128, float 16 and float 128 are also proposed with this format, which only requires 2 extra bytes thanks to the variable length fixext.
Add UUID
UUID is widely used nowadays. Officially supporting UUID by assigning an extension type is not a bad idea in my opinion. This will fix #222 #239.
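For illustration only, a UUID carried as a 16-byte ext payload could look like the sketch below. It uses today's fixext 16 framing (0xd8) rather than the variable length fixext proposed in this PR, and `UUID_EXT_TYPE` is a placeholder value chosen for the example, not the type number assigned by the proposal.

```python
import struct
import uuid

UUID_EXT_TYPE = 0x02  # hypothetical ext type number, for illustration only


def pack_uuid(value: uuid.UUID) -> bytes:
    """Frame the 16 raw UUID bytes as fixext 16: 0xd8, type byte, payload."""
    return b"\xd8" + struct.pack("b", UUID_EXT_TYPE) + value.bytes


def unpack_uuid(data: bytes) -> uuid.UUID:
    """Inverse of pack_uuid; rejects anything that is not the expected frame."""
    if len(data) != 18 or data[0] != 0xD8 or data[1] != UUID_EXT_TYPE:
        raise ValueError("not a fixext 16 UUID frame")
    return uuid.UUID(bytes=data[2:])
```

The encoded form is 18 bytes, compared with 38 bytes for the 36-character text form stored as a str 8.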
With UUID, Bigint and Bigfloat supported, there are 4 additional ext types left within fixext capacity, which can be used in the future.
Add typed containers
Motivated by #267 and #268, I added support for typed containers, specifically typed array, typed map and typed n-d array. The benefits of typed containers are less overhead from the per-element type bytes and the possibility of zero-copy access. "Structured array" as proposed in #267 is not added since it's a lot more complicated for parsers to implement than the formats proposed in this PR.
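The overhead in question can be estimated with a small sketch. The first function encodes a list of float64 values the way the current spec requires (an array header, then a 0xcb type byte in front of every element). The second produces only the raw element bytes, which is what a typed container payload would presumably hold. The typed container's own header is not included, because its layout is not given in this text, and the function names are invented for the example.

```python
import struct


def array_header(n: int) -> bytes:
    """Array header from the current msgpack spec (fixarray, array 16, array 32)."""
    if n < 16:
        return bytes([0x90 | n])
    if n < 2**16:
        return b"\xdc" + struct.pack(">H", n)
    if n < 2**32:
        return b"\xdd" + struct.pack(">I", n)
    raise OverflowError("too many elements for array 32")


def pack_float64_array(values) -> bytes:
    """Current encoding: every element carries its own float 64 type byte."""
    body = b"".join(b"\xcb" + struct.pack(">d", v) for v in values)
    return array_header(len(values)) + body


def pack_float64_payload(values) -> bytes:
    """Raw big-endian element bytes only (assumed typed-container payload)."""
    return struct.pack(f">{len(values)}d", *values)


values = [0.5 * i for i in range(1000)]
untyped = pack_float64_array(values)
payload = pack_float64_payload(values)
print(len(untyped), len(payload))  # 9003 8000
print(len(payload) // 8)           # element count recovered from the size: 1000
```

For 1000 doubles the per-element type bytes cost 1000 bytes, and the contiguous payload can be handed to a numeric library without re-parsing each element.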
Note that the size of the containers is not explicitly stored in the proposed format; it should be calculated by
(payload size - overhead size) / (element size)
This is a big proposal; comments, suggestions and modifications are welcome!