Objective

This project uses K-Means Clustering to find the best accommodation for students in Bangalore (or any other city of your choice) by classifying accommodation options according to the students' preferences for amenities, budget and proximity to the location.

Project Context

Implementing the project will take you through the daily life of a data science engineer - from data preparation on real-life datasets, to visualising the data and running machine learning algorithms, to presenting the results.


In the fast-moving, effort-intensive environment that the average person inhabits, it’s a frequent occurrence that one is too tired to fix oneself a home-cooked meal. And of course, even if one gets home-cooked meals every day, it is not unusual to want to go out for a good meal every once in a while for social or recreational purposes. Either way, it’s commonly understood that regardless of where one lives, the food one eats is an important aspect of the lifestyle one leads.


Now, imagine a scenario where a student has just moved to a new location. They already have certain preferences and tastes. It would save both the student and the food providers a lot of hassle if the student lived close to their preferred outlets. Convenience means better sales for the providers and saved time for the customer.


Food delivery apps aside, managers of restaurant chains and hotels can also leverage this information. For example, if the manager of a restaurant already knows the demographic of their current customers, they’d ideally want to open at a location where this demographic is at its highest concentration, ensuring short commute times to the location and more customers served. If potential hotel locations are being evaluated, a site that caters to a wide variety of tastes would be ideal, since one would want every guest to have something to their liking.


This project is a good start for beginners and a refresher for professionals who have dabbled in Python / ML before. The methodology can be applied to any location of one's choosing, so feel free to innovate!

Project Stages

The project consists of the following stages:

Project Steps

High-Level Approach

  • Fetch Datasets from the relevant locations (Data Collection)
  • Clean the Datasets to prepare them for analysis. (Data Cleaning via Pandas)
  • Visualise the data using boxplots. (Using Matplotlib/Seaborn/Pandas)
  • Fetch Geolocational Data from the Foursquare API. (REST APIs)
  • Use K-Means Clustering to cluster the locations. (Using scikit-learn)
  • Present findings on a map. (Using Folium/Seaborn)
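To make the clustering step more concrete, here is a minimal sketch of what K-Means does, written in plain Python so that it runs without any extra libraries. It is only an illustration: in the project itself the clustering would be done with scikit-learn, and the feature values, the helper names (kmeans, nearest) and the choice of starting centroids below are all made up for this example.

```python
def nearest(point, centroids):
    """Return the index of the centroid closest to the point (squared distance)."""
    return min(
        range(len(centroids)),
        key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
    )


def kmeans(points, initial_centroids, max_iterations=100):
    """Plain K-Means (Lloyd's algorithm); returns (centroids, labels)."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = []
    for _ in range(max_iterations):
        # Assignment step: attach every point to its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:  # nothing moved, so we have converged
            break
        centroids = new_centroids
    return centroids, labels


# Hypothetical data: one (restaurants nearby, gyms nearby) pair per location.
locations = [
    (1, 2), (2, 1), (1, 1),  # few restaurants, few gyms
    (8, 9), (9, 8), (9, 9),  # many restaurants, many gyms
    (1, 9), (2, 8), (1, 8),  # few restaurants, many gyms
]

# Assumed starting centroids: one location picked from each visible group.
start = [locations[0], locations[3], locations[6]]

centroids, labels = kmeans(locations, start)
for location, label in zip(locations, labels):
    print(location, "-> cluster", label)
```

Each cluster then stands for one kind of neighbourhood, which a student can match against their own preferences. Real runs usually pick the starting centroids at random and try several values of K, so results can vary from run to run.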

The desired end result of this project is something like this:

Final Output

Applications

K-Means clustering is used in a variety of real-life business cases, such as:

  • Academic performance (grouping students by their learning rate)
  • Diagnostic systems (grouping system faults under various reasons)
  • Search engines (grouping search results)
  • Wireless sensor networks (mapping networks)

The Foursquare API data can be used for:

  • Building a restaurant review app like Swiggy, Zomato, etc.
  • Supporting a ride-sharing service like Uber Pool

OVERVIEW

Exploratory Analysis of Geolocational Data
