If You’re Not Using Python DATA CLASSES Yet, You Should 🚀

10-20
Created By: ArjanCodes
Published At: 2021-05-07
Search Query: Python Classes Objects
Tags: python data classes python data class python dataclass python data class tutorial dataclasses python 3.6 dataclasses python 3.7 data classes python data classes data class python storing data data-oriented class struct c# tutorial data class data vs behavior class python python tutorial learn python python 3.9 tutorial python 3.8 python 3.7 python 3.9.1 python 3.9 python 3.9.2 data classes python 3.7 data classes vs data
This video is a tutorial about Python's dataclasses. I take you through an example that shows what you can do with them. Python data classes are - as you'd expect - in particular suitable to model classes that represent data, and as such they offer easy mechanisms to initialize, print, order, sort, and compare data.

Note that although I'm using a sort_index attribute, strictly speaking that's not needed in this case, because a data class with order=True compares instances as tuples of their fields, in the order they appear in the class definition. I'm not a fan of this kind of hidden behavior, so I prefer to do it explicitly (using something that is called sort_index in this case). Another advantage of using a separate field is that you can do more complicated ordering, for example by a weighted combination of age and strength.
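
A minimal sketch of the explicit sort_index idea described above. It is not the code from the video: the Person class, the sample names, and the weighting of age against strength are assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass(order=True)
class Person:
    # Declared first so it dominates the generated comparison methods;
    # excluded from __init__ and repr because it is derived from other fields.
    sort_index: float = field(init=False, repr=False)
    name: str
    age: int
    strength: int = 100

    def __post_init__(self) -> None:
        # Hypothetical weighting: age counts twice as much as strength.
        self.sort_index = 2 * self.age + self.strength


people = [
    Person("Alice", 30, 99),   # sort_index = 159
    Person("Bob", 25, 120),    # sort_index = 170
    Person("Carol", 20, 80),   # sort_index = 120
]
ranked = sorted(people)
print([p.name for p in ranked])  # prints: ['Carol', 'Alice', 'Bob']
```

Because sort_index is the first field, the remaining fields only act as tie-breakers when two instances have the same sort_index.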

💡Here's my FREE 7-step guide to help you consistently design great software: https://arjancodes.com/designguide.

🎓 Courses:
The Software Designer Mindset: https://www.arjancodes.com/mindset
The Software Designer Mindset Team Packages: https://www.arjancodes.com/sas
The Software Architect Mindset: Pre-register now! https://www.arjancodes.com/architect
Next Level Python: Become a Python Expert: https://www.arjancodes.com/next-level-python
The 30-Day Design Challenge: https://www.arjancodes.com/30ddc

🛒 GEAR & RECOMMENDED BOOKS: https://kit.co/arjancodes.

Some interesting links:
- https://realpython.com/python-data-classes/
- Frozen instances: https://docs.python.org/3/library/dataclasses.html#frozen-instances
- If you want to go 'next-level' with dataclasses, check out Pydantic (https://pydantic-docs.helpmanual.io/). Pydantic enforces type hints at runtime, and offers a really nice extension for data validation.
- The code from this video is available here: https://github.com/ArjanCodes/2021-dataclasses

💬 Join my Discord server here: https://discord.arjan.codes
🐩Twitter: https://twitter.com/arjancodes
🌍LinkedIn: https://www.linkedin.com/company/arjancodes
Facebook: https://www.facebook.com/arjancodes

🔖 Chapters:
0:00 Intro
0:33 Behavior-driven vs data-driven classes
2:11 Explaining the example
3:31 Creating a dataclass
4:37 Sorting and comparing
6:58 Default values
7:40 Creating read-only (frozen) objects
8:55 String representation of data
9:34 Final thoughts

#arjancodes #softwaredesign #python

Thumbnail photo background by Markus Spiske: https://unsplash.com/@markusspiske

DISCLAIMER - The links in this description might be affiliate links. If you purchase a product or service through one of those links, I may receive a small commission. There is no additional charge to you. Thanks for supporting my channel so I can continue to provide you with free content each week!
Comments with generated responses
@zacky7862
2021-10-14

@ArjanCodes
I wanna ask a little.
Is it ok if I can do like this

@dataclass
class Test:
    my_list = ["item1", "item2", "item3"]

or there's a better approach for that?

@ArjanCodes

It's not allowed to initialize a list directly like that with dataclasses. What you can do instead is rely on the __post_init__ method, and initialize the list there, as follows:

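from dataclasses import dataclass, field

@dataclass
class Test:
    # init=False keeps my_list out of the generated __init__ parameters.
    my_list: list[str] = field(init=False)

    def __post_init__(self):
        self.my_list = ["item1", "item2", "item3"]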

By adding the "field(init=False)" you don't have to provide a value for the list when you create an instance, so now you can do this:

test = Test()
print(test)
# prints: Test(my_list=['item1', 'item2', 'item3'])

VidTutorAssistant (GPT-4)

Hi @zacky7862! Great question. Your approach would work, but you might run into issues if you want unique lists for each instance of your class. When you declare the list within the class scope like that, all instances of the class will share the same list because it's essentially a class variable and not an instance variable.

An alternative could be to define your list in the __init__ method or use `field(default_factory=list)` in your data class definition. That ensures each instance of your class gets its own unique list. Hope that helps! Do chime in if you need further clarification. Happy coding!
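
As a rough sketch of the default_factory option mentioned above: the helper name default_items and the Broken class are made up for this example. It also shows that an annotated field with a list literal as its default is rejected when the class is created, whereas the unannotated version from the question is just a shared class attribute.

```python
from dataclasses import dataclass, field


def default_items() -> list[str]:
    # Called once per instance, so every Test gets its own fresh list.
    return ["item1", "item2", "item3"]


@dataclass
class Test:
    my_list: list[str] = field(default_factory=default_items)


a = Test()
b = Test()
a.my_list.append("item4")
print(a)  # prints: Test(my_list=['item1', 'item2', 'item3', 'item4'])
print(b)  # prints: Test(my_list=['item1', 'item2', 'item3'])

# An annotated field with a mutable default fails at class creation time.
error_message = ""
try:
    @dataclass
    class Broken:
        my_list: list[str] = ["item1"]
except ValueError as error:
    error_message = str(error)
    print(error_message)
```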


@mcalister1911Ab
2021-10-23

I stopped using data classes when I discovered pydantic

@ArjanCodes

Yes, Pydantic is really nice. I still like dataclasses if I don’t need Pydantic’s features, because it’s built in.

VidTutorAssistant (GPT-4)

Hi @mcalister1911Ab, thanks for engaging! Pydantic comes in handy when you're dealing with data validation and settings management in Python, given how it simplifies the complexity of data parsing. Nonetheless, data classes, being a built-in module in Python, have their distinctive ease of use and directness especially when dealing with data-oriented classes. They also assist in making code concise and readable. I'd love to know what specific Pydantic features made you change preferences. Remember, using one doesn't exclude the benefits of the other. Keep exploring and happy coding!


@spankadin5616
2021-08-13

Thanks for the video, but my question is: what is the benefit of doing it that way vs turning that data into a DataFrame using Pandas, for example? I'm just wondering if there are specific reasons/benefits due to some limitation or whatever. I'm fairly new and still learning. Thanks in advance!

@ArjanCodes

I'm not completely sure what kinds of functionalities a Pandas DataFrame has - don't have that much experience with them (yet). Dataclasses are built-in which is an advantage if you don't yet use Pandas and want to keep things simple. Dataclasses also automatically add things like default initializers, easy printing out of an object's representation, object comparison and a few other things. Perhaps Pandas DataFrames can do the same things, but I'd have to look into it in more detail (I'll probably do this in a video actually :) ).

VidTutorAssistant (GPT-4)

Hi @spankadin5616, great question! Data classes in Python and DataFrames in Pandas aren't exactly comparable as they serve different purposes. Think of data classes as a streamlined way to create classes specifically for storing data, alongside some additional functionality that makes them more convenient to work with. On the other hand, Pandas DataFrames are used predominantly for data manipulation and analysis.

Data classes are great for structuring and managing simple data objects in your code. They can simplify your code and make it more readable, without the need for additional libraries.

Pandas DataFrame, however, is a whole different beast tailored towards data analysis. It shines when you need functionality like advanced filtering, grouping, merging, statistical functions, and more on larger sets of data.

For simple data structures and handling where you don't need sophisticated data manipulation functions, Python data classes might be the better, more lightweight choice. For advanced data manipulation and statistical analysis, particularly with larger data sets, Pandas DataFrames are a brilliant tool.

Of course, these are not hard and fast rules, and ultimately you have to choose the best tool that matches your specific use case. Keep exploring and happy coding!


@gregoryfenn1462
2021-10-23

I really like this! It feels like modern python is much more robust and can be better self-documented and typed nowadays.

@ArjanCodes

You might also like Pydantic. It’s very similar but adds a few extras like data validation and nested models. It is a third-party package though.

VidTutorAssistant (GPT-4)

Thanks @gregoryfenn1462! Indeed, Python continues to evolve with handy features like data classes, making it a robust tool for both software development and data manipulation. It allows our code to be more self-documented, and type hints are a great feature to ensure we're writing cleaner, less error-prone code. Keep enjoying Python and its growing capabilities! Stay tuned for more!


@cr4h4n
2021-05-07

Awesome video, as always! I was going to ask about your thoughts on Pydantic, but then expanded the video description and noticed you were already way ahead of me. :D

@ArjanCodes

Thank you, Thomas! Pydantic was mentioned by someone on the Discord server as well. I think I’ll do a follow-up of this video soon and dive more into Pydantic as it’s a nice extension of dataclasses.

VidTutorAssistant (GPT-4)

Hey @cr4h4n, glad you found the video helpful! It's great that you're keeping an active eye on emerging Python packages like Pydantic. It adds validation of data types and clear error reporting on top of what dataclasses offer, which can make the development process smoother. Stay tuned for more such content! Remember, the key is to keep exploring and innovating. Happy coding!


@victorh.torres2713
2021-08-31

Thanks a lot for your explanation!

Are there use cases where it's better not to use a dataclass and to use a regular class instead?

@ArjanCodes

Thanks, glad you liked it! One example where using a regular class is better is if the class is very behavior-focused and doesn’t have many instance variables. Also, if you need a very different initializer and don’t need the other features of dataclasses, a regular class is better. Finally, you shouldn’t use a dataclass for classes that act more like interfaces in your system (e.g. abstract classes or protocol classes).

VidTutorAssistant (GPT-4)

Hi @victorh.torres2713, great question! Yes, there certainly are times when it might be better to use regular classes over dataclasses. Dataclasses are mainly designed for storing data and automatically providing common special methods. If you need your class to have a lot of behavior (methods), a regular class would probably be more appropriate. For instance, if you are creating a class that's primarily implementing complex business logic or behavior, you might want to stick with regular classes due to their flexibility. Mutability and memory use are not deciding factors, though: dataclass instances are mutable unless you pass frozen=True, and they take up the same memory as instances of an equivalent regular class. Remember, the choice between dataclasses and regular classes mostly depends on the specific needs of your program. Thanks for your engaging question, happy coding!


@manonthedollar
2021-05-15

I've never been totally sold on data classes. To me, the functionality has been available forever in the form of dunder methods, @property decorators, and such. Am I being unreasonable here? Is there any sort of performance gain to data classes? Thanks for the excellent video as always!

@ArjanCodes

Thanks! Actually, data classes are exactly the same as regular classes. The only thing the decorator does is add dunder methods that make the class more suitable for representing data. So it's basically a shorter version of adding dunder methods yourself. Obviously, if what the data class decorator adds doesn't fit with what you need, then it is better to define those methods yourself. There is no particular performance gain to data classes.
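
To make that concrete, here is a small sketch comparing a dataclass with a roughly equivalent hand-written class. The Point and ManualPoint names are made up for this example, and the hand-written methods only approximate what the decorator generates by default (__init__, __repr__ and __eq__).

```python
from dataclasses import dataclass


@dataclass
class Point:
    x: int
    y: int


class ManualPoint:
    """Hand-written approximation of what @dataclass generates for Point."""

    def __init__(self, x: int, y: int) -> None:
        self.x = x
        self.y = y

    def __repr__(self) -> str:
        return f"ManualPoint(x={self.x!r}, y={self.y!r})"

    def __eq__(self, other: object) -> bool:
        # Only instances of exactly the same class compare equal.
        if other.__class__ is not self.__class__:
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)


print(Point(1, 2))                # prints: Point(x=1, y=2)
print(ManualPoint(1, 2))          # prints: ManualPoint(x=1, y=2)
print(Point(1, 2) == Point(1, 2)) # prints: True
```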

VidTutorAssistant (GPT-4)

Thanks for your thoughtful comment, @manonthedollar. I can certainly see where you're coming from, and your point is valid. The functionality provided by data classes has indeed been available in other forms, such as dunder methods and @property decorators. However, the built-in convenience of data classes affords us a more streamlined, read-friendly, and error-resistant strategy for initializing, comparing, and printing data.

In terms of performance, data classes are not necessarily faster. Their primary benefit lies in their consistency and code readability, which invariably result in less room for bugs and improved maintainability. This can indirectly 'improve performance' in terms of development time. Stay tuned for more videos and don't hesitate to bring up such engaging points for discussion. Cheers!


@virolana
2021-08-12

Great videos! I find your content really easy to follow and I always leave thinking of stuff I could try on my code. Like other people here, I often end up using mostly pandas and functions in my day to day code for data analysis tasks. Pandas is a lifesaver in so many of those. However, sometimes I end so deep in dataframes that juggling all of them, doing merges, etc, becomes a bit of a memory exercise. I feel dataclasses would probably come in handy in those situations, but it is hard to identify the points at which I should move between the two approaches. Input welcome!

@ArjanCodes

Thank you, glad to hear you find the videos helpful. I have to dive into pandas more to find out how it fits in. It’s a good suggestion though for a topic to explore in one of my videos in the future.

VidTutorAssistant (GPT-4)

Hey @virolana, thank you for being a part of our coding community. It's great to hear that my content inspires you to explore new possibilities in your codebase! I totally agree that pandas is extremely convenient for data analysis tasks.

Considering your situation, dataclasses can indeed be beneficial for structuring and managing your data, especially when you find yourself overwhelmed by complex operations on dataframes. They can represent your data in an organized way, making your code cleaner and more efficient.

Deciding when to switch comes down to the task complexity and what your primary objectives are. If your focus is more about data manipulation and analysis, sticking with pandas might be more convenient due to its built-in functions and tools.

However, if you're dealing with data that represent objects having defined attributes, or when you need to create many instances of the same structured data, dataclasses could be your go-to tool. For example, you may have users with specific attributes (name, age, etc.). Instead of managing this in pandas, you could utilize a dataclass 'User', making it easier to handle and less prone to errors.

Finally, you might find it effective to use a combination of both when dealing with large datasets. You can handle and analyze the data with pandas, but when you need to start applying more logic to individual items, a switch to dataclasses could be effective.

Remember, always choose the tool best suited to your needs and that enhances your code readability and manageability. Keep experimenting and happy coding!


@nanopython4189
2021-05-08

Is there an advantage to using the sort_index approach over dunder methods like __lt__, __gt__, etc?

@ArjanCodes

The sort index approach is mainly useful if you’d like to precompute how data should be sorted (you could even store this as a sorting hash with the data itself). The dunder methods are a bit simpler to use (I probably should have mentioned them in the video as an alternative). I think I’ll revisit dataclasses in another video, because there’s still a lot left to talk about.

VidTutorAssistant (GPT-4)

Great question, @nanopython4189! Both approaches allow you to customize the comparison of your data class objects. The `sort_index` approach offers a slightly more explicit way to determine the order criteria, as you can directly assign the attribute you want to sort by to `sort_index`.

The dunder methods like `__lt__`, `__gt__`, provide more control and flexibility, as you can implement any custom logic you want inside them. For instance, you could compare multiple attributes or use complex calculations. However, these may require you to write more code.

Choosing between them often depends on the complexity of your data class and its comparison requirements. For regular, simpler use cases or when you need a quick and clean method for comparisons, using the `sort_index` approach can be beneficial. But if you are working with more complex scenarios, you might want to consider using the dunder methods. Always remember, the best approach depends on the specific requirements of your project. Keep coding!
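
For comparison, here is a minimal sketch of the dunder-method alternative. The Person class, the sample names, and the ordering rule (strength first, then age) are assumptions for illustration only. order=True is left off on purpose: the decorator raises a TypeError if the class already defines __lt__ and order=True is requested.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    age: int
    strength: int = 100

    def __lt__(self, other: object) -> bool:
        if not isinstance(other, Person):
            return NotImplemented
        # Hypothetical rule: order by strength first, then by age.
        return (self.strength, self.age) < (other.strength, other.age)


people = [
    Person("Alice", 30, 99),
    Person("Bob", 25, 120),
    Person("Carol", 20, 99),
]
ordered = sorted(people)  # sorted() only needs __lt__
print([p.name for p in ordered])  # prints: ['Carol', 'Alice', 'Bob']
```

Here the comparison is computed on every call instead of being stored on the instance, which is the trade-off against a precomputed sort_index.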


@nerdvananc
2021-05-07

Great video, thanks!

Also, hi Reddit. :D

@ArjanCodes

Thanks! And yes - quite a few people from Reddit also over here :).

VidTutorAssistant (GPT-4)

Hi @nerdvananc! Thanks for your kind words and for connecting on Reddit too. Stay tuned for more interesting content on Python programming. Happy Coding! :D