Get to know the data
This is part 2 of a 3-part series that may help some of our students take their first steps in data visualization. Make sure to also check out the other two parts: "Get to know the user" and "Explore visual designs".
There is no data visualization without data, and before visualizing your data you should understand the shape of the data itself. So get to grips with R or Python's pandas…
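As a minimal sketch of a first look in pandas, the snippet below loads a tiny table and prints its size, the type pandas inferred for each column, and the number of distinct values per column. The inline CSV and its column names are hypothetical toy data; with real data you would read your own file instead.

```python
import io

import pandas as pd

# Hypothetical toy data; with real data use e.g. pd.read_csv("my_data.csv").
csv_text = """id,group,height_cm,weight_kg,lat,lon
1,a,172,68,51.05,3.72
2,b,165,59,50.88,4.70
3,a,181,80,52.09,5.12
4,c,158,52,50.63,3.06
"""
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)      # (number of rows, number of columns)
print(df.dtypes)     # the type pandas inferred for each column
print(df.nunique())  # distinct values per column; few distinct values hint at a categorical
print(df.head())     # the first few rows
```

Note that pandas only sees integers, floats and text here: it cannot tell that `lat` and `lon` are geospatial, or that `id` is an identifier rather than a quantity. That judgment is yours. In R, `str(df)` and `summary(df)` give a similar overview.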
Some things you want to find out:
- How many dimensions are there? What type is each dimension: categorical, numeric, geospatial, …?
- For each of these dimensions (or at least the most important ones), how are the data distributed?
- Are there any correlations between the dimensions?
- What does a principal component analysis, independent component analysis, or singular value decomposition reveal?
- What does hierarchical clustering show?
- Are there any local clusters? Have a look at topological data analysis (perhaps using the R TDA package), which can reveal such local clusters in a global context.
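For the questions about distributions and correlations, a minimal pandas sketch might look like this. The synthetic columns are hypothetical: `weight_kg` is deliberately built from `height_cm` so that a correlation shows up, while `noise` is independent of both.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic data: two related numeric columns, one independent
# numeric column, and one categorical column.
rng = np.random.default_rng(seed=0)
n = 200
height = rng.normal(170, 10, size=n)
df = pd.DataFrame({
    "height_cm": height,
    "weight_kg": 0.9 * height - 90 + rng.normal(0, 5, size=n),  # depends on height
    "noise": rng.normal(0, 1, size=n),                          # independent
    "group": rng.choice(["a", "b", "c"], size=n),
})

numeric = df.select_dtypes(include="number")

# Distributions: summary statistics and skewness of each numeric column.
print(numeric.describe())
print(numeric.skew())

# Categorical columns: how often does each level occur?
print(df["group"].value_counts())

# Correlations: Pearson measures linear association,
# Spearman (rank-based) picks up any monotone relationship.
pearson = numeric.corr(method="pearson")
spearman = numeric.corr(method="spearman")
print(pearson.round(2))
print(spearman.round(2))
```

Comparing the two correlation matrices is a cheap check: a pair that scores much higher on Spearman than on Pearson is probably related in a monotone but non-linear way.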
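Principal component analysis and the singular value decomposition are closely linked: the principal axes of a data matrix are the right singular vectors of its mean-centered version. A minimal numpy sketch, where the function name `pca_via_svd` and the synthetic data are hypothetical (independent component analysis is not covered here; scikit-learn's `FastICA` is one option for that):

```python
import numpy as np


def pca_via_svd(X: np.ndarray, n_components: int = 2):
    """PCA of the rows of X, computed with a singular value decomposition."""
    # Center each column; PCA is defined on mean-centered data.
    Xc = X - X.mean(axis=0)
    # Thin SVD: Xc = U @ diag(S) @ Vt; the rows of Vt are the principal axes.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Fraction of the total variance explained by each component.
    explained = S**2 / np.sum(S**2)
    # Coordinates of each observation along the first n_components axes
    # (equivalent to Xc @ Vt[:n_components].T).
    scores = U[:, :n_components] * S[:n_components]
    return scores, Vt[:n_components], explained


# Hypothetical data: 3 columns, but almost all variation lies along one direction.
rng = np.random.default_rng(seed=1)
latent = rng.normal(size=(300, 1))
X = np.hstack([latent, 2 * latent, -latent]) + 0.1 * rng.normal(size=(300, 3))

scores, axes, explained = pca_via_svd(X)
print(explained.round(3))  # the first component should dominate
print(axes[0].round(2))    # roughly proportional to (1, 2, -1), up to sign
```

If your columns are measured on very different scales, divide each by its standard deviation before the SVD; otherwise the components mostly reflect whichever column has the largest numbers.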
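Hierarchical clustering is available in SciPy through `scipy.cluster.hierarchy`. The sketch below builds the full merge tree and then cuts it into flat clusters; the three synthetic blobs, the choice of Ward linkage and the cut at three clusters are all assumptions made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical data: three well-separated 2-D blobs of 20 points each.
rng = np.random.default_rng(seed=2)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + 0.3 * rng.normal(size=(20, 2)) for c in centers])

# Agglomerative clustering with Ward linkage; each row of Z records one merge.
Z = linkage(X, method="ward")

# Cut the tree into at most three flat clusters (labels start at 1).
labels = fcluster(Z, t=3, criterion="maxclust")
print(np.bincount(labels)[1:])  # cluster sizes
```

With a plotting backend available, `scipy.cluster.hierarchy.dendrogram(Z)` draws the whole tree, which usually tells you more than any single cut: long vertical branches suggest well-separated groups.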
Create lots of simple plots, and really take your time over this step.
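A few one-liners in pandas already produce many of these simple plots. This is a sketch on hypothetical synthetic data; the output file names are placeholders, and in an interactive session you would drop the `matplotlib.use("Agg")` line and call `plt.show()` instead of saving.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; omit in an interactive session
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Hypothetical synthetic data: y depends linearly on x, z is skewed, group is categorical.
rng = np.random.default_rng(seed=3)
n = 200
x = rng.normal(size=n)
df = pd.DataFrame({
    "x": x,
    "y": 2 * x + rng.normal(scale=0.5, size=n),
    "z": rng.exponential(size=n),
    "group": rng.choice(["a", "b"], size=n),
})

# One histogram per numeric column (non-numeric columns are skipped).
df.hist(bins=20, figsize=(8, 3), layout=(1, 3))
plt.savefig("histograms.png")

# Every pair of numeric columns as a scatter plot, with histograms on the diagonal.
axes = scatter_matrix(df.select_dtypes(include="number"), figsize=(6, 6), diagonal="hist")
plt.savefig("scatter_matrix.png")

# Distribution of one numeric column per category.
df.boxplot(column="z", by="group")
plt.savefig("boxplot_by_group.png")
plt.close("all")
```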