Serialization and Unserialization
What’s this “serialization” thing all about? It lets you take an object or group of objects, put them on a disk or send them through a wire or wireless transport mechanism, then later, perhaps on another computer, reverse the process: resurrect the original object(s).
I'm writing some code to serialize some data to send it over the network. Currently, I use this primitive procedure:
- create a void* buffer
- apply any byte ordering operations such as the hton family on the data I want to send over the network
- use memcpy to copy the memory into the buffer
- send the memory over the network
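For one small, fixed-layout message, the four steps above might look like the following. This is only a sketch: struct message, its fields, and pack_message are made-up names for illustration, and htonl/htons are taken from the POSIX header <arpa/inet.h>.

```c
#include <arpa/inet.h>  /* htonl, htons (POSIX) */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical message; the field names are assumptions for illustration. */
struct message {
    uint32_t id;
    uint16_t flags;
};

/* Serializes msg into a freshly allocated buffer in network byte order.
 * Returns the buffer (the caller frees it) and stores its length in *len,
 * or returns NULL if the allocation fails. */
void *pack_message(const struct message *msg, size_t *len)
{
    uint32_t id = htonl(msg->id);        /* byte ordering for each field */
    uint16_t flags = htons(msg->flags);
    unsigned char *buf = malloc(sizeof id + sizeof flags);  /* the buffer */

    if (buf == NULL)
        return NULL;
    memcpy(buf, &id, sizeof id);         /* copy field by field, no padding */
    memcpy(buf + sizeof id, &flags, sizeof flags);
    *len = sizeof id + sizeof flags;
    return buf;                          /* ready to hand to send() */
}
```

Copying field by field rather than copying the whole struct keeps compiler padding out of the wire format, which is also why this code has to be rewritten for every new struct.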
The problem is that with various data structures (which often contain void* data so you don't know whether you need to care about byte ordering) the code becomes really bloated with serialization code that's very specific to each data structure and can't be reused at all.
What are some good serialization techniques for C that make this easier / less ugly?
Note: I'm bound to a specific protocol so I cannot freely choose how to serialize my data.
ryyst
4 Answers
For each data structure, have a serialize_X function (where X is the struct name) which takes a pointer to an X and a pointer to an opaque buffer structure and calls the appropriate serializing functions. You should supply some primitives such as serialize_int which write to the buffer and update the output index.
The primitives will have to call something like reserve_space(N), where N is the number of bytes that are required, before writing any data. reserve_space() will realloc the void* buffer to make it at least as big as its current size plus N bytes.
To make this possible, the buffer structure will need to contain a pointer to the actual data, the index to write the next byte to (output index) and the size that is allocated for the data.
With this system, all of your serialize_X functions should be pretty straightforward, for example:
And the framework code will be something like:
From this, it should be pretty simple to implement all of the serialize_() functions you need.
EDIT: For example:
EDIT: Also note that my code has some potential bugs. The size of the buffer array is stored in a size_t but the index is an int (I'm not sure if size_t is considered a reasonable type for an index). Also, there is no provision for error handling and no function to free the Buffer after you're done, so you'll have to do this yourself. I was just giving a demonstration of the basic architecture that I would use.
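The architecture described in this answer can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the answer's own code: the field names of struct Buffer, the example struct X, and the helpers serialize_u32 and buffer_free are hypothetical. It uses a fixed-width uint32_t primitive in place of serialize_int, a size_t output index, and int return codes for errors, and reserve_space grows the allocation by doubling rather than by exactly N bytes.

```c
#include <arpa/inet.h>  /* htonl (POSIX) */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct Buffer {
    unsigned char *data; /* the serialized bytes */
    size_t next;         /* output index: where the next byte goes */
    size_t size;         /* bytes currently allocated for data */
};

/* Hypothetical struct to serialize. */
struct X {
    uint32_t a;
    uint32_t b;
};

/* Ensures at least n more bytes fit. Returns 0 on success, -1 on failure. */
int reserve_space(struct Buffer *buf, size_t n)
{
    size_t new_size;
    unsigned char *p;

    if (buf->size - buf->next >= n)
        return 0;                       /* already enough room */
    new_size = buf->size ? buf->size : 32;
    while (new_size - buf->next < n)
        new_size *= 2;                  /* grow geometrically */
    p = realloc(buf->data, new_size);   /* realloc(NULL, n) acts like malloc */
    if (p == NULL)
        return -1;
    buf->data = p;
    buf->size = new_size;
    return 0;
}

/* Primitive: appends v in network byte order and advances the index. */
int serialize_u32(uint32_t v, struct Buffer *buf)
{
    if (reserve_space(buf, sizeof v) != 0)
        return -1;
    v = htonl(v);
    memcpy(buf->data + buf->next, &v, sizeof v);
    buf->next += sizeof v;
    return 0;
}

/* Per-struct function: just a sequence of primitive calls. */
int serialize_X(const struct X *x, struct Buffer *buf)
{
    if (serialize_u32(x->a, buf) != 0)
        return -1;
    return serialize_u32(x->b, buf);
}

/* Releases the storage and resets the buffer to its empty state. */
void buffer_free(struct Buffer *buf)
{
    free(buf->data);
    buf->data = NULL;
    buf->next = buf->size = 0;
}
```

A buffer starts out zero-initialized (struct Buffer buf = { NULL, 0, 0 };). After a successful serialize_X call, buf.data holds buf.next bytes ready to be sent.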
jstanley
I suggest using a library.
As I was not happy with the existing ones, I created the Binn library to make our lives easier.
Here is an example of using it:
Bernardo Ramos
I would say definitely don't try to implement serialization yourself. It's been done a zillion times and you should use an existing solution, e.g. protobufs: https://github.com/protobuf-c/protobuf-c
It also has the advantage of being compatible with many other programming languages.
Assaf Lavie
It would help if we knew what the protocol constraints are, but in general your options are really pretty limited. If the data are such that you can make, for each struct, a union of the struct with a byte array of sizeof(struct) bytes, it might simplify things. But from your description it sounds like you have a more essential problem: if you're transferring pointers (you mention void * data) then those pointers are very unlikely to be valid on the receiving machine. Why would the data happen to appear at the same place in memory?
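A minimal sketch of the union idea, assuming a hypothetical struct reading that contains only fixed-width fields and no pointers (the names reading, reading_bytes and reading_to_bytes are made up for illustration):

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-layout record: fixed-width fields, no pointers. */
struct reading {
    uint32_t sensor_id;
    uint32_t value;
};

/* The same storage viewed either as the struct or as raw bytes. */
union reading_bytes {
    struct reading r;
    unsigned char bytes[sizeof(struct reading)];
};

/* Copies the raw bytes of *in into out, which must have room for
 * sizeof(struct reading) bytes. Returns the number of bytes written. */
size_t reading_to_bytes(const struct reading *in, unsigned char *out)
{
    union reading_bytes u;

    u.r = *in;
    memcpy(out, u.bytes, sizeof u.bytes);
    return sizeof u.bytes;
}
```

The bytes produced this way are in host byte order and include whatever padding the compiler inserts, so this only helps when the protocol's layout happens to match the struct's in-memory layout on both ends. A pointer member would be copied as an address, which is the problem raised above.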
Charlie Martin
This is a comparison of data serialization formats.
Name | Creator-maintainer | Based on | Standardized? | Specification | Binary? | Human-readable? | Supports references?[e] | Schema-IDL? | Standard APIs | Supports zero-copy operations |
---|---|---|---|---|---|---|---|---|---|---|
Apache Avro | | N/A | No | Apache Avro™ 1.8.1 Specification | Yes | No | N/A | Yes (built-in) | N/A | N/A |
Apache Parquet | Apache Software Foundation | N/A | No | | Yes | No | No | N/A | Java, Python | No |
ASN.1 | ISO, IEC, ITU-T | N/A | Yes | ISO/IEC 8824; X.680 series of ITU-T Recommendations | Yes (BER, DER, PER, OER, or custom via ECN) | Yes (XER, JER, GSER, or custom via ECN) | Partial[f] | Yes (built-in) | N/A | Yes (OER) |
Bencode | Bram Cohen (creator), BitTorrent, Inc. (maintainer) | N/A | De facto standard via BitTorrent Enhancement Proposal (BEP) | Part of BitTorrent protocol specification | Partially (numbers and delimiters are ASCII) | No | No | No | No | N/A |
Binn | Bernardo Ramos | N/A | No | Binn Specification | Yes | No | No | No | No | Yes |
BSON | MongoDB | JSON | No | BSON Specification | Yes | No | No | No | No | N/A |
CBOR | Carsten Bormann, P. Hoffman | JSON (loosely) | Yes | RFC 7049 | Yes | No | Yes through tagging | Yes (CDDL) | No | Yes |
Comma-separated values (CSV) | RFC author: Yakov Shafranovich | N/A | Partial (myriad informal variants used) | RFC 4180 (among others) | No | Yes | No | No | No | No |
Common Data Representation (CDR) | Object Management Group | N/A | Yes | General Inter-ORB Protocol | Yes | No | Yes | Yes | ADA, C, C++, Java, Cobol, Lisp, Python, Ruby, Smalltalk | N/A |
D-Bus Message Protocol | freedesktop.org | N/A | Yes | D-Bus Specification | Yes | No | No | Partial (Signature strings) | Yes (see D-Bus) | N/A |
Efficient XML Interchange (EXI) | W3C | XML, Efficient XML | Yes | Efficient XML Interchange (EXI) Format 1.0 | Yes | Yes (XML) | Yes (XPointer, XPath) | Yes (XML Schema) | Yes (DOM, SAX, StAX, XQuery, XPath) | N/A |
FlatBuffers | Google | N/A | No | | Yes | Yes (Apache Arrow) | Partial (internal to the buffer) | Yes [2] | C++, Java, C#, Go, Python, Rust, JavaScript, PHP, C, Dart, Lua, TypeScript | Yes |
Fast Infoset | ISO, IEC, ITU-T | XML | Yes | ITU-T X.891 and ISO/IEC 24824-1:2007 | Yes | No | Yes (XPointer, XPath) | Yes (XML schema) | Yes (DOM, SAX, XQuery, XPath) | N/A |
FHIR | Health Level 7 | REST basics | Yes | Fast Healthcare Interoperability Resources | Yes | Yes | Yes | Yes | Hapi for FHIR[1]; JSON, XML, Turtle | No |
Ion | Amazon | JSON | No | The Amazon Ion Specification | Yes | Yes | No | No | No | N/A |
Java serialization | Oracle Corporation | N/A | Yes | Java Object Serialization | Yes | No | Yes | No | Yes | N/A |
JSON | Douglas Crockford | JavaScript syntax | Yes | STD 90/RFC 8259 (ancillary: RFC 6901, RFC 6902), ECMA-404, ISO/IEC 21778:2017 | No, but see BSON, Smile, UBJSON | Yes | Yes (JSON Pointer (RFC 6901); alternately: JSONPath, JPath, JSPON, json:select()), JSON-LD | Partial (JSON Schema Proposal, ASN.1 with JER, Kwalify, Rx, Itemscript Schema), JSON-LD | Partial (Clarinet, JSONQuery, JSONPath), JSON-LD | No |
MessagePack | Sadayuki Furuhashi | JSON (loosely) | No | MessagePack format specification | Yes | No | No | No | No | Yes |
Netstrings | Dan Bernstein | N/A | No | netstrings.txt | Yes | Yes | No | No | No | Yes |
OGDL | Rolf Veen | ? | No | Specification | Yes (Binary Specification) | Yes | Yes (Path Specification) | Yes (Schema WD) | | N/A |
OPC-UA Binary | OPC Foundation | N/A | No | opcfoundation.org | Yes | No | Yes | No | No | N/A |
OpenDDL | Eric Lengyel | C, PHP | No | OpenDDL.org | No | Yes | Yes | No | Yes (OpenDDL Library) | N/A |
Pickle (Python) | Guido van Rossum | Python | De facto standard via Python Enhancement Proposals (PEPs) | [3] PEP 3154 -- Pickle protocol version 4 | Yes | No | No | No | Yes ([4]) | No |
Property list | NeXT (creator), Apple (maintainer) | ? | Partial | Public DTD for XML format | Yes[a] | Yes[b] | No | ? | Cocoa, CoreFoundation, OpenStep, GnuStep | No |
Protocol Buffers (protobuf) | Google | N/A | No | Developer Guide: Encoding | Yes | Partial[d] | No | Yes (built-in) | C++, C#, Java, Python, Javascript, Go | No |
S-expressions | John McCarthy (original), Ron Rivest (internet draft) | Lisp, Netstrings | Partial (largely de facto) | | Yes ('Canonical representation') | Yes ('Advanced transport representation') | No | No | | N/A |
Smile | Tatu Saloranta | JSON | No | Smile Format Specification | Yes | No | No | Partial (JSON Schema Proposal, other JSON schemas/IDLs) | Partial (via JSON APIs implemented with Smile backend, on Jackson, Python) | N/A |
SOAP | W3C | XML | Yes | W3C Recommendations: SOAP/1.1, SOAP/1.2 | Partial (Efficient XML Interchange, Binary XML, Fast Infoset, MTOM, XSD base64 data) | Yes | Yes (built-in id/ref, XPointer, XPath) | Yes (WSDL, XML schema) | Yes (DOM, SAX, XQuery, XPath) | N/A |
Structured Data eXchange Formats | Max Wildgrube | N/A | Yes | RFC 3072 | Yes | No | No | No | | N/A |
Thrift | Facebook (creator), Apache (maintainer) | N/A | No | Original whitepaper | Yes | Partial[c] | No | Yes (built-in) | | N/A |
UBJSON | The Buzz Media, LLC | JSON, BSON | No | [5] | Yes | No | No | No | No | N/A |
eXternal Data Representation (XDR) | Sun Microsystems (creator), IETF (maintainer) | N/A | Yes | STD 67/RFC 4506 | Yes | No | Yes | Yes | Yes | N/A |
XML | W3C | SGML | Yes | W3C Recommendations: 1.0 (Fifth Edition), 1.1 (Second Edition) | Partial (Efficient XML Interchange, Binary XML, Fast Infoset, XSD base64 data) | Yes | Yes (XPointer, XPath) | Yes (XML schema, RELAX NG) | Yes (DOM, SAX, XQuery, XPath) | N/A |
XML-RPC | Dave Winer[2] | XML | No | XML-RPC Specification | No | Yes | No | No | No | N/A |
YAML | Clark Evans, Ingy döt Net, and Oren Ben-Kiki | C, Java, Perl, Python, Ruby, Email, HTML, MIME, URI, XML, SAX, SOAP, JSON[3] | No | Version 1.2 | No | Yes | Yes | Partial (Kwalify, Rx, built-in language type-defs) | No | N/A |
- a. The current default format is binary.
- b. The 'classic' format is plain text, and an XML format is also supported.
- c. Theoretically possible due to abstraction, but no implementation is included.
- d. The primary format is binary, but a text format is available.[4]
- e. Means that generic tools/libraries know how to encode, decode, and dereference a reference to another piece of data in the same document. A tool may require the IDL file, but no more. Excludes custom, non-standardized referencing techniques.
- f. ASN.1 does offer OIDs, a standard format for globally unique identifiers, as well as a standard notation ('absolute reference') for referencing a component of a value. Thus it would be possible to reference a component of an encoded value present in a document by combining an OID (assigned to the document) and an 'absolute reference' to the component of the value. However, there is no standard way to indicate that a field contains such an absolute reference. Therefore, a generic ASN.1 tool/library cannot automatically encode/decode/resolve references within a document without help from custom-written program code.
- g. VelocyPack offers a value type to store pointers to other VPack items. It is allowed if the VPack data resides in memory, but not if stored on disk or sent over a network.
- h. The primary format is binary, but a text format is available.[5][6]
- i. The primary format is binary, but text and json formats are available.[7]
Syntax comparison of human-readable formats
Format | Null | Boolean true | Boolean false | Integer | Floating-point | String | Array | Associative array/Object |
---|---|---|---|---|---|---|---|---|
ASN.1 (XML Encoding Rules) | <foo /> | <foo>true</foo> | <foo>false</foo> | <foo>685230</foo> | <foo>6.8523015e+5</foo> | <foo>A to Z</foo> | An object (the key is a field name): A data mapping (the key is a data value): | |
CSV[b] | null[a] (or an empty element in the row)[a] | 1[a] true[a] | 0[a] false[a] | 685230 -685230[a] | 6.8523015e+5[a] | A to Z 'We said, 'no'.' | true,-42.1e7,'A to Z' | |
Ion | | true | false | 685230 -685230 0xA74AE 0b111010010101110 | 6.8523015e5 | 'A to Z' '' | | |
Netstrings[c] | 0:,[a] 4:null,[a] | 1:1,[a] 4:true,[a] | 1:0,[a] 5:false,[a] | 6:685230,[a] | 9:6.8523e+5,[a] | 6:A to Z, | 29:4:true,0:,7:-42.1e7,6:A to Z, | 41:9:2:42,1:1,25:6:A to Z,12:1:1,1:2,1:3,[a] |
JSON | null | true | false | 685230 -685230 | 6.8523015e+5 | 'A to Z' | | |
OGDL[verification needed] | null[a] | true[a] | false[a] | 685230[a] | 6.8523015e+5[a] | 'A to Z' 'A to Z' NoSpaces | | |
OpenDDL | ref {null} | bool {true} | bool {false} | int32 {685230} int32 {0x74AE} int32 {0b111010010101110} | float {6.8523015e+5} | string {'A to Z'} | Homogeneous array: Heterogeneous array: | |
Pickle (Python) | N. | I01\n. | I00\n. | I685230\n. | F685230.15\n. | S'A to Z'\n. | (lI01\na(laF-421000000.0\naS'A to Z'\na. | (dI42\nI01\nsS'A to Z'\n(lI1\naI2\naI3\nas. |
Property list (plain text format)[8] | N/A | <*BY> | <*BN> | <*I685230> | <*R6.8523015e+5> | 'A to Z' | ( <*BY>, <*R-42.1e7>, 'A to Z' ) | |
Property list (XML format)[9][10] | N/A | <true /> | <false /> | <integer>685230</integer> | <real>6.8523015e+5</real> | <string>A to Z</string> | | |
Protocol Buffers | N/A | true | false | 685230 -685230 | 20.0855369 | 'A to Z' | | |
S-expressions | NIL nil | T #t[f] true | NIL #f[f] false | 685230 | 6.8523015e+5 | abc 'abc' #616263# 3:abc {MzphYmM=} YWJj | (T NIL -42.1e7 'A to Z') | ((42 T) ('A to Z' (1 2 3))) |
YAML | ~ null Null NULL [11] | y Y yes Yes YES on On ON true True TRUE [12] | n N no No NO off Off OFF false False FALSE [12] | 685230 +685_230 -685230 02472256 0x_0A_74_AE 0b1010_0111_0100_1010_1110 190:20:30 [13] | 6.8523015e+5 685.230_15e+03 685_230.15 190:20:30.15 .inf -.inf .Inf .INF .NaN .nan .NAN [14] | A to Z 'A to Z' 'A to Z' | [y, ~, -42.1e7, 'A to Z'] | {'John':3.14, 'Jane':2.718} |
XML[e] and SOAP | <null />[a] | true | false | 685230 | 6.8523015e+5 | A to Z | | |
XML-RPC | | <value><boolean>1</boolean></value> | <value><boolean>0</boolean></value> | <value><int>685230</int></value> | <value><double>6.8523015e+5</double></value> | <value><string>A to Z</string></value> | | |
- a. Omitted XML elements are commonly decoded by XML data binding tools as NULLs. Shown here is another possible encoding; XML schema does not define an encoding for this datatype.
- b. The RFC CSV specification only deals with delimiters, newlines, and quote characters; it does not directly deal with serializing programming data structures.
- c. The netstrings specification only deals with nested byte strings; anything else is outside the scope of the specification.
- d. PHP will unserialize any floating-point number correctly, but will serialize them to their full decimal expansion. For example, 3.14 will be serialized to 3.140000000000000124344978758017532527446746826171875.
- e. XML data bindings and SOAP serialization tools provide type-safe XML serialization of programming data structures into XML. Shown are XML values that can be placed in XML elements and attributes.
- f. This syntax is not compatible with the Internet-Draft, but is used by some dialects of Lisp.
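The Netstrings entries in the table above all follow one pattern: the payload length in ASCII decimal, a colon, the payload bytes, and a trailing comma. As a rough illustration, a minimal encoder might look like this in C (netstring_encode is a made-up helper name, not part of any library):

```c
#include <stdio.h>
#include <string.h>

/* Writes "<len>:<data>," plus a terminating NUL into out.
 * Returns the number of bytes written (excluding the NUL),
 * or -1 if out_size is too small. */
int netstring_encode(const char *data, size_t len, char *out, size_t out_size)
{
    int n = snprintf(out, out_size, "%zu:", len);   /* length prefix */

    /* Need room for the prefix, the payload, the comma and the NUL. */
    if (n < 0 || (size_t)n + len + 1 >= out_size)
        return -1;
    memcpy(out + n, data, len);                      /* payload bytes */
    out[n + len] = ',';                              /* terminator */
    out[n + len + 1] = '\0';
    return n + (int)len + 1;
}
```

Encoding the 6-byte string A to Z this way yields 6:A to Z, as in the String column. Nesting works by encoding the inner items first and then encoding their concatenation as a single payload.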
Comparison of binary formats
Any XML-based representation can be compressed with, or generated directly as, EXI (Efficient XML Interchange), which is a 'Schema Informed' (as opposed to schema-required or schema-less) binary compression standard for XML.
References
- 1. 'HAPI FHIR - The Open Source FHIR API for Java'. hapifhir.io.
- 2. 'A Brief History of SOAP'. www.xml.com.
- 3. Ben-Kiki, Oren; Evans, Clark; Net, Ingy döt (2009-10-01). 'YAML Ain't Markup Language (YAML) Version 1.2'. The Official YAML Web Site. Retrieved 2012-02-10.
- 4. 'text_format.h - Protocol Buffers'. Google Developers.
- 5. 'Cap'n Proto serialization/RPC system: core tools and C++ library - capnproto/capnproto'. 2 April 2019 – via GitHub.
- 6. 'Cap'n Proto: The capnp Tool'. capnproto.org.
- 7. 'Fast Binary Encoding is ultra fast and universal serialization solution for C++, C#, Go, Java, JavaScript, Kotlin, Python, Ruby: chronoxor/FastBinaryEncoding'. 2 April 2019 – via GitHub.
- 8. 'NSPropertyListSerialization class documentation'. www.gnustep.org.
- 9. 'Documentation Archive'. developer.apple.com.
- 10. 'Documentation Archive'. developer.apple.com.
- 11. Oren Ben-Kiki; Clark Evans; Brian Ingerson (2005-01-18). 'Null Language-Independent Type for YAML Version 1.1'. YAML.org. Retrieved 2009-09-12.
- 12. Oren Ben-Kiki; Clark Evans; Brian Ingerson (2005-01-18). 'Boolean Language-Independent Type for YAML Version 1.1'. YAML.org. Clark C. Evans. Retrieved 2009-09-12.
- 13. Oren Ben-Kiki; Clark Evans; Brian Ingerson (2005-02-11). 'Integer Language-Independent Type for YAML Version 1.1'. YAML.org. Clark C. Evans. Retrieved 2009-09-12.
- 14. Oren Ben-Kiki; Clark Evans; Brian Ingerson (2005-01-18). 'Floating-Point Language-Independent Type for YAML Version 1.1'. YAML.org. Clark C. Evans. Retrieved 2009-09-12.
- 15. 'MessagePack is an extremely efficient object serialization library. It's like JSON, but very fast and small.: msgpack/msgpack'. 2 April 2019 – via GitHub.