TC Sociology Worksheet




You are required to work in a group to analyze real-world data using Python. In this project you will read a real-life dataset from a national or international data source in a suitable domain. Set seven objectives for analyzing the chosen dataset using Python libraries such as pandas, NumPy, and scikit-learn. Write a well-structured report that contains an executive summary and recommendations based on your findings.

Skills to be demonstrated: The selected dataset and the derived ML model should challenge students to demonstrate the following skills:
• Read data from external files and store it in a pandas DataFrame
• Analyze and preprocess data by sorting, filtering, grouping, and cleaning it
• Apply fundamental machine learning techniques, supervised or unsupervised (curve fitting, classification, clustering, and deep learning)
• Train, optimize, and test your model
• Visualize results with appropriate plots and charts

Data Sources: Students are encouraged to select an appropriate dataset from any of the open data projects, including:
• USA Open Data Project:
• European Open Data Project:

An appropriate dataset has at least 1000 rows and 10 to 20 “relevant” columns.
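As a rough illustration of the first few skills listed above, the sketch below reads a small table into a pandas DataFrame, cleans, filters, sorts, and groups it, and then fits a straight line with NumPy as a simple form of curve fitting. The data, the column names (`region`, `year`, `population`, `income`), and the in-memory CSV text are all hypothetical placeholders; a real project would load a file from a chosen open data source, and the dataset would need to be far larger to meet the size requirement.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for a downloaded open-data file. In a real project,
# pass a file path to pd.read_csv instead of this in-memory text.
csv_text = """region,year,population,income
North,2019,1200,31000
North,2020,1250,32500
South,2019,900,27000
South,2020,,28000
East,2019,1500,35000
East,2020,1550,36500
"""

# Read external data into a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

# Clean: drop rows that contain missing values.
clean = df.dropna().copy()

# Filter to one year, then sort by income from highest to lowest.
recent = clean[clean["year"] == 2020].sort_values("income", ascending=False)

# Group by region and compute the mean income per region.
mean_income = clean.groupby("region")["income"].mean()

# Curve fitting: least-squares straight line of income against population.
slope, intercept = np.polyfit(
    clean["population"].to_numpy(), clean["income"].to_numpy(), deg=1
)

print(recent)
print(mean_income)
print(f"income is approximately {slope:.2f} * population + {intercept:.2f}")
```

Dropping incomplete rows is only one possible cleaning choice; imputing missing values may suit some datasets better. The model training, optimization, testing, and plotting steps are left out here and would typically build on a cleaned DataFrame like `clean`.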


