Is C# Superior to Python in Serializing Data?

Why Python Can’t Deserialize Data Classes and How to Tackle It

Mordechai Alter
8 min read · Feb 27, 2022

When Rachel Chocron asked me “How can I save a Python data class into a file, and load it back later?”, since I had some experience with it in another language, I just said, “Easy, don’t you worry about it”.
After investing a week into the topic, I realized it’s not that simple.

In this post, I’ll show you how it’s done the right way.

To save a class into a file, we’ll first convert the class into a JSON string, which then will be easy to save into a file. Same goes for the way back.

There are intermediate formats other than JSON, such as YAML or binary formats, but I find JSON friendlier.
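Whatever format you pick, the file step itself is small. Here is a minimal Python sketch, assuming you already hold a JSON string. The sample data and the spaceship.json file name are placeholders of mine, not part of the example that follows:

```python
import json
import tempfile
from pathlib import Path

# Hypothetical JSON string standing in for a serialized object.
space_string = json.dumps({"name": "Big", "floor": 1})

# Use a temporary directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "spaceship.json"  # placeholder file name
    path.write_text(space_string, encoding="utf-8")
    loaded_string = path.read_text(encoding="utf-8")

print(loaded_string == space_string)
```

The hard part, as the rest of the post shows, is turning the string back into an object.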

An example of Serialization

Let’s start with a simple example of JSON serialization and deserialization.

I’ll take C# as a reference, since serialization behaves nicely there.

Assume you are the captain of a spaceship, that has lots of rooms for the crew and travelers, and just for fun, let’s say that your spaceship also has a dictionary (you need to translate Alien somehow 😉)

public class Room
{
    public string Name { get; set; }
    public int Floor { get; set; }
}

public class Spaceship
{
    public List<Room> Rooms { get; set; }
    public Dictionary<string, string> Translations { get; set; }
}

And you have the following Spaceship:

var mySpaceship = new Spaceship
{
    Rooms = new()
    {
        new() { Name = "Big", Floor = 1 },
        new() { Name = "Small", Floor = 1 },
        new() { Name = "High", Floor = 42 },
    },
    Translations = new()
    {
        ["Hello"] = "👋",
        ["World"] = "🌎",
    },
};

In C# you can use the built-in library System.Text.Json to serialize the Spaceship into JSON with just a single line of code. There is no need to add anything to the classes; it takes the class data as is.

var spaceString = JsonSerializer.Serialize(mySpaceship);

Which will give the following JSON result:

{
    "Rooms":[
        { "Name":"Big", "Floor":1 },
        { "Name":"Small", "Floor":1 },
        { "Name":"High", "Floor":42 }
    ],
    "Translations":{
        "Hello":"👋",
        "World":"🌎"
    }
}

Now, with a JSON string in your hands, you can save it to a file. Once you need your Spaceship back, you’ll read the file and deserialize it to an object.

var revivedShip = JsonSerializer.Deserialize<Spaceship>(spaceString);

Now, revivedShip holds the same data as mySpaceship (it is a new object with equal contents, not the same reference).

Simple! Isn’t it?

Does it work in Python?

Let’s try to do the same with Python, using the built-in json library.

First, let’s define our classes:

class Room:
    def __init__(self, name, floor):
        self.name = name
        self.floor = floor


class Spaceship:
    def __init__(self, rooms, translations):
        self.rooms = rooms
        self.translations = translations

Then initialize with the data:

my_spaceship = Spaceship(
    rooms=[
        Room(name="Big", floor=1),
        Room(name="Small", floor=1),
        Room(name="High", floor=42),
    ],
    translations={
        "Hello": "👋",
        "World": "🌎",
    }
)

And lastly, convert to JSON:

import json

space_string = json.dumps(my_spaceship)

Oops!

TypeError: Object of type Spaceship is not JSON serializable

This means that Python needs to be told explicitly what data to take from the object into the JSON. Should it take just the attributes? Properties too? Maybe it should even save the methods somehow?

So let’s tell Python to take just the attributes of the object, using the built-in function vars, which takes an object and returns its __dict__, a dict containing only its instance attributes.

space_string = json.dumps(my_spaceship, default=vars)

Finally, we get the desired JSON:

{
    "rooms": [
        {"name": "Big", "floor": 1},
        {"name": "Small", "floor": 1},
        {"name": "High", "floor": 42}
    ],
    "translations": {
        "Hello": "👋",
        "World": "🌎"
    }
}
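To see what default=vars is doing, here is a small standalone sketch. json.dumps calls the default function for any object it cannot encode on its own, and vars(obj) returns that object's attribute dict. The Room class is repeated only so the snippet runs by itself:

```python
import json


class Room:
    def __init__(self, name, floor):
        self.name = name
        self.floor = floor


room = Room(name="Big", floor=1)

# vars() returns the instance's attribute dictionary (its __dict__).
print(vars(room))  # {'name': 'Big', 'floor': 1}

# json.dumps falls back to `default` for objects it cannot encode itself;
# the fallback is applied to nested objects too, e.g. each Room in a list.
rooms_json = json.dumps([room, Room(name="High", floor=42)], default=vars)
print(rooms_json)
```

Note that with its default settings json.dumps escapes non-ASCII characters, so the emoji come out as \uXXXX escape sequences. Passing ensure_ascii=False keeps them readable.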

And then, when we try to convert it back to the original object, all seems to be working just fine:

revived_ship = json.loads(space_string)

Which gives the seemingly good output:

{ 'rooms': [{'name': 'Big', 'floor': 1}, {'name': 'Small', 'floor': 1}, {'name': 'High', 'floor': 42}], 'translations': {'Hello': '👋', 'World': '🌎'}}

But after a deeper look, we can see that it’s just a bunch of dicts and lists of ints and strings. Accessing revived_ship.rooms will fail with:

AttributeError: 'dict' object has no attribute 'rooms'

Where did our classes go? I want my class back!! 😭

Understanding the difference between C# and Python

In one word, the difference is Typing.

If you noticed, when we told C# to deserialize the JSON to an object, we had to specify the object type, Spaceship.

var revivedShip = JsonSerializer.Deserialize<Spaceship>(spaceString);

This way, while deserializing, C# knew which class instance to create. After creating the (empty) object, C# assigned the class properties with the values from the JSON, recursively for all levels.

But wait, C# knew the type of Spaceship, but what about the nested types? What about the List, Room and Dictionary? We didn’t mention them to C#, so how did it know to create the correct instances?

Well, we didn’t need to. They are already defined in the Spaceship class. First, C# creates a Spaceship object. Then it looks at the Spaceship properties and assigns them values from the JSON. The first Spaceship property is Rooms. Since the type of Rooms is declared in the class as List<Room>, C# knows to create that type and fill it with the JSON rooms values (and so on for the rest of the properties).

Python, on the other hand, is a dynamically-typed language.

This is why json.loads(space_string) doesn’t require specifying what type to load into. It also doesn’t care about the types of the object’s inner attributes.

If so, how will it load the JSON into a Spaceship object?
Well, it won’t.
JSON itself defines only a small set of basic types: object, array, string, number, boolean, and null. Python’s json module maps them to dict, list, str, int or float, bool, and None, and uses those as a simple representation of the Spaceship.
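A quick way to check this mapping yourself is to load a JSON string that contains one value of each kind. The sample string is made up for illustration:

```python
import json

# One value of each JSON kind (keys are arbitrary names).
sample = (
    '{"flag": true, "count": 3, "ratio": 0.5, "text": "hi", '
    '"items": [1, 2], "nested": {"k": "v"}, "nothing": null}'
)
loaded = json.loads(sample)

# Print the Python type that each JSON value was loaded into.
for key, value in loaded.items():
    print(key, type(value).__name__)
```

Nothing in that list can hold a Room or a Spaceship, which is why the loader never produces one.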

Now we understand the errors that we got when trying to access revived_ship.rooms:

AttributeError: 'dict' object has no attribute 'rooms'

Python loaded the JSON as a fake Spaceship, which is just a dict representation of the Spaceship, where the dict keys represent the attributes' names, and the values are basic representations of the attributes’ values.

Since our Spaceship suddenly turned into a dict, it will not contain a rooms attribute and will fail when trying to access such. Rather, we can access the rooms list using revived_ship["rooms"].
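The difference between the two access styles is easy to reproduce. This sketch uses a shortened JSON string of my own instead of the full spaceship:

```python
import json

space_string = '{"rooms": [{"name": "Big", "floor": 1}], "translations": {"Hello": "hi"}}'
revived_ship = json.loads(space_string)

# Key access works because the result is a plain dict...
first_room = revived_ship["rooms"][0]
print(first_room["name"])

# ...while attribute access fails, since a dict has no `rooms` attribute.
try:
    revived_ship.rooms
except AttributeError as error:
    print(error)
```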

How to do it in Python?

There are a few Python libraries that aim to solve this issue. It would be remiss not to mention marshmallow, which has been working since Python 2. But I’ll present pydantic’s approach, since to my eyes, it’s more elegant.

Once you understand how deserialization works in C#, pydantic’s approach is fairly simple. It relies on Python 3’s type hints (available since Python 3.5; the list[Room] and dict[str, str] syntax used below needs Python 3.9 or later).

With pydantic, we just need to go over our classes and specify the attributes’ types. This way Python will know how to match a JSON value to the correct object type.

In addition, we need our classes to inherit pydantic’s BaseModel.

So here’s what our new code looks like:

from pydantic import BaseModel


class Room(BaseModel):
    name: str
    floor: int


class Spaceship(BaseModel):
    rooms: list[Room]
    translations: dict[str, str]


my_spaceship = Spaceship(
    rooms=[
        Room(name="Big", floor=1),
        Room(name="Small", floor=1),
        Room(name="High", floor=42),
    ],
    translations={
        "Hello": "👋",
        "World": "🌎",
    }
)

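We will serialize our Spaceship using pydantic’s json() method, which exists on every BaseModel (in pydantic 2.x it is named model_dump_json(), and json() is deprecated):
(In fact, json.dumps(...) with default=vars, as shown earlier, also works on these models and gives equivalent JSON.)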

space_string = my_spaceship.json()

Then we deserialize into a Spaceship object using the parse_raw(...) method, which also exists on every BaseModel (in pydantic 2.x the equivalent is model_validate_json(...)):

revived_ship = Spaceship.parse_raw(space_string)

And there we have our spaceship back, full of rooms, and one small dictionary.

type(revived_ship) # <class '__main__.Spaceship'>
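For intuition about how type hints can drive deserialization, here is a rough sketch using only the standard library. This is not pydantic's actual implementation, and it does no validation. The build helper is a hypothetical name of mine, and the classes are plain dataclasses rather than BaseModel subclasses. It assumes Python 3.9 or later for the list[Room] syntax:

```python
import json
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, get_args, get_origin, get_type_hints


@dataclass
class Room:
    name: str
    floor: int


@dataclass
class Spaceship:
    rooms: list[Room]
    translations: dict[str, str]


def build(target_type: Any, raw: Any) -> Any:
    """Recursively convert parsed JSON (`raw`) into `target_type`."""
    origin = get_origin(target_type)
    if origin is list:
        (item_type,) = get_args(target_type)
        return [build(item_type, item) for item in raw]
    if origin is dict:
        key_type, value_type = get_args(target_type)
        return {build(key_type, k): build(value_type, v) for k, v in raw.items()}
    if is_dataclass(target_type):
        # Read the declared type of each field and recurse into it.
        hints = get_type_hints(target_type)
        kwargs = {f.name: build(hints[f.name], raw[f.name]) for f in fields(target_type)}
        return target_type(**kwargs)
    return raw  # primitives (str, int, ...) are passed through unchecked


space_string = '{"rooms": [{"name": "Big", "floor": 1}], "translations": {"Hello": "hi"}}'
revived_ship = build(Spaceship, json.loads(space_string))
print(type(revived_ship).__name__, revived_ship.rooms[0])
```

Just as in the C# case, only the top-level type is passed in. The nested types are discovered from the declared hints.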

How to Serialize and Deserialize child classes with Pydantic? Read all about it in this great article by Bat-El Ziony Sabati.

Conclusion

In statically-typed languages, deserialization of data is not an issue. Variable types are defined explicitly, which tells the serializer how to match the raw data into the class structure.

Dynamically-typed languages, on the other hand, can’t load the raw data into objects since they don’t know what type the target class should be.

As a mitigation, you can load the data into general, catch-all types such as dicts and lists, which is what Python’s json.loads(...) does.

An improved option is to hint the serializer about the data types.

😉 Now you know that even though Python is a dynamically-typed language, there are creative solutions so that it need not be inferior to C#.

Bonus: insufficient solutions

There is more than one alternative for making deserialization work, and some are easier than others. I will mention one briefly, then focus on others that are allegedly easier than the solution I proposed above but can actually cause problems.

Writing custom JSONDecoder

You can use the json library with a custom JSONDecoder that knows how to load your types. But this requires a specific implementation that reads parts of the JSON values and generates the objects accordingly. We want to avoid specific handling, we want magic!
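To give a feel for the "specific handling" involved, here is one possible sketch of such a decoder. SpaceshipDecoder and its key-matching rule are illustrative choices of mine, not a recommended design:

```python
import json


class Room:
    def __init__(self, name, floor):
        self.name = name
        self.floor = floor


class Spaceship:
    def __init__(self, rooms, translations):
        self.rooms = rooms
        self.translations = translations


class SpaceshipDecoder(json.JSONDecoder):
    """Hypothetical decoder that recognizes objects by their keys."""

    def __init__(self, **kwargs):
        super().__init__(object_hook=self._to_object, **kwargs)

    @staticmethod
    def _to_object(obj):
        # object_hook is called for every JSON object, innermost first.
        if obj.keys() == {"name", "floor"}:
            return Room(**obj)
        if obj.keys() == {"rooms", "translations"}:
            return Spaceship(**obj)
        return obj  # e.g. the translations dict stays a plain dict


space_string = '{"rooms": [{"name": "Big", "floor": 1}], "translations": {"Hello": "hi"}}'
revived_ship = json.loads(space_string, cls=SpaceshipDecoder)
print(type(revived_ship).__name__, revived_ship.rooms[0].name)
```

The fragility is visible right away: every new class needs another branch, and a translations dict that happened to have exactly the keys name and floor would be mistaken for a Room.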

Other alternatives

Let’s investigate some other serialization options that supposedly work, but are not recommended.

In Python, there is the built-in library pickle, which defines itself as “implements binary protocols for serializing and de-serializing a Python object structure”. That sounds just like what we need, right?

Let’s use it:

import pickle

binary_ship = pickle.dumps(my_spaceship)
revived_ship = pickle.loads(binary_ship)
type(revived_ship) # <class '__main__.Spaceship'>

We can do the same with the wonderful PyYAML library, which is a “YAML parser and emitter for Python”.

import yaml

yaml_ship = yaml.dump(my_spaceship, Dumper=yaml.Dumper)
revived_ship = yaml.load(yaml_ship, Loader=yaml.Loader)
type(revived_ship) # <class '__main__.Spaceship'>

(A similar example can be used with the jsonpickle library)

Seems to be working just fine!!? Why not go with this?

How does it really work?

Now that we are familiar with Python’s difficulty in deserializing objects, there is an even better question: how can pickle and PyYAML deserialize the objects? Where do they get the information about the target object types?

Well, when taking a closer look at the serialized object, we see something that didn’t exist before. Let’s look at the YAML serialization.

>>> print(yaml_ship)
!!python/object:__main__.Spaceship
rooms:
- !!python/object:__main__.Room
  floor: 1
  name: Big
- !!python/object:__main__.Room
  floor: 1
  name: Small
- !!python/object:__main__.Room
  floor: 42
  name: High
translations:
  Hello: "👋"
  World: "🌎"

See these lines with !!? Whenever we use our own defined classes, their full name, including namespace, is saved as part of the YAML dump.

Similarly, for binary serialization, a lot of typing information is embedded into the binary dump.

(Similarly, the jsonpickle library encodes typing information into the encoded JSON under the py/object key)
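The pickle case can be checked directly by searching the dump for the class's module and name. This sketch pickles a standard-library Fraction instead of our Spaceship, purely so the snippet runs anywhere without extra definitions:

```python
import pickle
from fractions import Fraction

binary_dump = pickle.dumps(Fraction(1, 2))

# The dump records where to find the class: its module and its name.
print(b"fractions" in binary_dump)
print(b"Fraction" in binary_dump)
```

On load, pickle imports that module and looks the class up by that name.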

First, this answers our question of how these libraries can deserialize: they have all the type information as part of the serialized dump.

But what’s the problem with that? Why is embedding type information not recommended?

Why are the alternatives not recommended?

Generally speaking, when it comes to handling data, we must define some structure, such as splitting into objects, having a hierarchy, defining relations between the objects, deciding where to use lists, and so on. This is what we use classes for, to structure our data. And all this structure must be represented along with the serialized data.

Having said that, there are things that we do not want as part of the serialized data since they are not a vital part.

One such thing is the class namespace. If today our Spaceship lives in main.py (which defines its namespace) and we move it to garage.py, there’s no reason to also update our serialized objects, since the Spaceship remains the same.

If we use these alternatives, then after moving a class, the target type information stored in the serialized dumps is no longer valid. The library will look for the Spaceship class in its old location, main.py, since that is what’s written in the dump, but won’t find it there because it has been moved in the code:

yaml.constructor.ConstructorError: while constructing a Python object
cannot find 'Spaceship' in the module '__main__'
  in "<unicode string>", line 1, column 1:
    !!python/object:__main__.Spaceship
    ^
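The same kind of breakage can be reproduced with pickle using only the standard library. In this sketch the module name old_location is made up, and the "move" is simulated by removing the class from that module instead of editing real files:

```python
import pickle
import sys
import types


class Spaceship:
    def __init__(self, name):
        self.name = name


# Pretend Spaceship lives in a module named "old_location" (a made-up name).
old_location = types.ModuleType("old_location")
old_location.Spaceship = Spaceship
Spaceship.__module__ = "old_location"
sys.modules["old_location"] = old_location

binary_ship = pickle.dumps(Spaceship("Enterprise"))

# Loading works while the class is still where the dump says it is.
print(pickle.loads(binary_ship).name)

# Simulate moving the class elsewhere: it is no longer in the old module.
del old_location.Spaceship

try:
    pickle.loads(binary_ship)
    load_failed = False
except (AttributeError, ImportError) as error:
    load_failed = True
    print(type(error).__name__, error)
```

The dump itself is unchanged. Only the code moved, and that alone is enough to make the saved data unreadable.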

Mordechai Alter

A Senior Software Engineer. Loves people and gadgets, including how they work. Find me on https://bit.ly/3Hom9kk