This course is an introduction to data cleaning, analysis, and visualization.
We will teach the basics of data analysis through concrete examples.
You will learn how to take raw data, extract meaningful information, use statistical tools, and make visualizations.
Day 0 (today): Setup.
Day 1: An end-to-end example that takes you from a dataset found online to several plots of campaign contributions.
Day 2: Lots of visualization examples, and practice going from data to chart.
Day 3: Statistics basics, including t-tests, linear regression, and statistical significance. We'll use campaign finance and per-county health rankings.
Day 4: Text processing on a large text corpus (the Enron email dataset) using tf-idf and cosine similarity.
Day 5: Scaling up to process large datasets using Hadoop/MapReduce on a larger copy of the Enron dataset.
Day 6: You tell us! Get into groups or work on your own to analyze a dataset of your choosing, and tell us a story!
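As a small preview of the Day 4 technique, here is a minimal sketch of tf-idf weighting and cosine similarity in plain Python. It is not the course's own code. The tokenizer (lowercase plus whitespace split), the unsmoothed idf = log(N / df), the function names tf_idf and cosine_similarity, and the three sample strings standing in for Enron emails are all illustrative assumptions.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Illustrative assumptions: whitespace tokenization, raw term counts
    as tf, and idf = log(N / df) with no smoothing.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Weight = term count in this document * inverse document frequency.
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in counts.items()})
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical stand-ins for email bodies.
emails = [
    "meeting about the energy trading report",
    "energy trading report attached",
    "lunch on friday",
]
vecs = tf_idf(emails)
print(round(cosine_similarity(vecs[0], vecs[1]), 3))
print(cosine_similarity(vecs[0], vecs[2]))
```

Documents that share terms get a positive score, and documents with no terms in common score 0. A real corpus like Enron would likely need more careful tokenization and a smoothed idf.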