You will be able to automate general activities like following, liking, commenting, and exploring on Instagram using Python and Selenium automation.
Instagram is one of the leading social media apps today, and you have probably used it yourself. But following, liking, and commenting on posts every now and then can get tedious. So why not automate the process with simple Selenium automation techniques? Using Selenium WebDriver, we can interact with a web page like a real user and perform actions such as clicking, scrolling, and typing to follow, like, and comment.
Web automation today is a go-to solution for testing applications, but it has various other use cases, like automating redundant processes for digital marketers and SEO specialists. We can also use automation to gather data for a particular business page and improve its user engagement, for example by figuring out the audience's sentiment through NLP analysis of comments (challenge yourself by trying this out). Computer vision models require datasets, and a good way to gather data specific to your use case is automation rather than relying on the generic datasets on the web. This project can be a head start for your data-extraction journey: use the skills acquired here to build scripts for other websites as well.
Modern websites load data dynamically, which makes it hard to extract anything with plain curl requests; instead, we need to interact with the page in order to extract the data. Apart from that, it is also really fun to build automation scripts for your daily web chores.
This project is a good start for beginners and a refresher for professionals who have dabbled in Python scripts, Selenium, or web crawlers before. The experience of implementing this basic automation will be helpful in learning about web crawlers and more, so feel free to innovate and explore!
This product architecture consists of 5 stages, as follows:
The final automated process will be like this -
First, we set up the environment, install the dependencies, and do a low-level implementation (proof of concept) of the components involved in the project. To get started, we are going to automate the login process.
You may go through this tutorial for dependency installation as well. In this example Firefox is used as the browser, but you are free to use any browser of your choice; the only thing that varies is the web driver package for each browser.
There are some nice-to-have settings, like profiles and options, that you can go through in the documentation.
Example:
from selenium.webdriver.firefox.options import Options

options = Options()
options.add_argument("--allow-notifications")
options.add_argument("--allow-geolocation")
options.add_argument("--start-maximized")
Install Geckodriver and all other necessary packages
Explore how Selenium WebDriver and geckodriver work.
Import all necessary libraries in the script.
Open the Instagram login page using the driver: use driver.get(url) to open the page.
Check out the unique identifiers for the login and password input fields.
Explore how XPaths help in locating various elements on a web page.
Examples:
//input[@name='username']
//span[@class="wmtNn"]/div/div/button
Discover the login, password, and submit elements and interact with them to sign in. Use driver.find_element_by_xpath(xpath) to find the elements, input_element.send_keys("text") to type into the input boxes, and submit_element.click() to perform the click action, as in the sketch below.
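A minimal end-to-end login sketch, assuming geckodriver is on your PATH and using the Selenium 3 style API shown above (in Selenium 4 these calls become driver.find_element(By.XPATH, ...)). The password and submit XPaths, and the placeholder credentials, are assumptions you should verify by inspecting the page:

from selenium import webdriver
from selenium.webdriver.firefox.options import Options
import time

options = Options()
options.add_argument("--start-maximized")
driver = webdriver.Firefox(options=options)

# Open the Instagram login page.
driver.get("https://www.instagram.com/accounts/login/")
time.sleep(5)  # wait for the page to render

# Locate the input fields (the password XPath is an assumption).
username_input = driver.find_element_by_xpath("//input[@name='username']")
password_input = driver.find_element_by_xpath("//input[@name='password']")

username_input.send_keys("your_username")  # placeholder credentials
password_input.send_keys("your_password")

# Submit the form (the XPath is an assumption; verify it on the page).
driver.find_element_by_xpath("//button[@type='submit']").click()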
The idea of this milestone is to explore how a web page is structured and how to inspect its various components, like input fields, text, and buttons, before interacting with them, and then to achieve automated login with just a few lines of code.
When we open the explore page, we need to click on the first post to get started with the crawling. After that, we click the like button to like the post and the next icon to advance to further posts. At this stage we store the URL of the image/video in a metadata CSV, along with other attributes like profile name, number of likes, comments, etc., so that we can process it later. If we want to comment as well, we can use Selenium's send_keys function to simulate typing by a user. Here we can store some hard-coded messages in an array in a JSON file, load it at run time, and publish comments picked at random from this array.
Alternatively, we could use ML to detect the sentiment of the post from its description text or other comments and improve the replies further, but for a basic proof of concept we will stick to either hard-coded responses or duplicating other comments. We can also follow a profile using the follow button, and after following a page we can redirect to it and like a set of posts there as well.
Now, to avoid overloading and getting throttled by Instagram's servers, we need to limit our requests; for that we put sleeps of random duration at various stages of the script. Also, instead of crawling just the generic explore page, we can crawl particular tags as well.
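A hedged sketch of the like, record, comment, and advance loop described above; the XPaths (other than the like-button example given earlier), the comments.json and metadata.csv file names, and the loop limit are all illustrative assumptions:

import csv
import json
import random
import time

# Load hard-coded comments (assumed format: {"comments": [...]}).
with open("comments.json") as f:
    comments = json.load(f)["comments"]

for _ in range(20):  # assumed per-run limit to stay polite
    # Like the current post (XPath from the earlier example).
    driver.find_element_by_xpath("//span[@class='wmtNn']/div/div/button").click()
    time.sleep(random.uniform(2, 6))  # random sleep to avoid throttling

    # Record metadata for later processing (fields are illustrative).
    with open("metadata.csv", "a", newline="") as f:
        csv.writer(f).writerow([driver.current_url])

    # Occasionally type a random comment (XPath is an assumption;
    # actually publishing it would also require clicking the Post button).
    if random.random() < 0.3:
        box = driver.find_element_by_xpath("//textarea")
        box.send_keys(random.choice(comments))
        time.sleep(random.uniform(1, 3))

    # Advance to the next post (XPath is an assumption).
    driver.find_element_by_xpath("//a[contains(text(), 'Next')]").click()
    time.sleep(random.uniform(3, 8))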
Open the various starter links for the explore page, tags page, or profile page using driver.get. Note that when using the driver.get function the page reloads, and we can lose references to previously stored elements in variables.
Explore the element scroll and click functionality to interact with buttons. This might be needed for two reasons: first, Selenium can't interact with an element if it's not in view; second, in modern web pages content loads dynamically, so scrolling down may trigger loading of additional content. We can also achieve scrolling by running JavaScript functions from Python, as shown below.
# Scroll to the bottom of the page to trigger loading of more content.
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

# Small helper that finds the follow button and clicks it (xpath defined earlier).
follow_button = lambda: driver.find_element_by_xpath(xpath).click()
follow_button()
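If a specific element is outside the viewport, one option is to scroll it into view before clicking; a small sketch using the standard scrollIntoView DOM call:

element = driver.find_element_by_xpath(xpath)
driver.execute_script("arguments[0].scrollIntoView(true);", element)
element.click()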
The idea of this milestone is to explore how to interact with the web page and automate interactions like clicking and typing using Selenium. It will also help you grasp the concepts of rate limiting and throttling and how to avoid them.
The automation process should be like this -
In this task we will use the urlretrieve function from the urllib.request library to fetch posts from Instagram's servers. While the automation is running it can encounter various exceptions, which we need to handle using Python's exception handling. We also need to log these errors to develop the script further.
The metadata (mainly pictures, videos, and comments) is collected so that it can be fed to ML/DL models, from which you can further advance this project with analysis reports, like sentiment/emotion analysis, auto-reply, and much more.
NOTE: This is a completely optional task, as it is just an idea for you to work on. If you are up for a challenge then you should surely try this out; otherwise you can simply skip this milestone.
urllib.request.urlretrieve(source_url, target_name)
urlretrieve may not fetch any result, which can happen for several reasons such as URL expiration, throttling, or unavailability. For these cases we need to structure our code to handle the failure. In case of expiration or unavailability on the servers, we can store/mark those URLs with a flag. If the count of consecutively flagged URLs exceeds a threshold, we should stop the script, because this is likely a case of throttling. To prevent this from happening in the first place, we add sleep statements between requests. The basic structure looks like this:
try:
    ...  # code which might throw an exception
except:
    ...  # what to do if an exception occurs; hint - log it
finally:
    ...  # this code always runs
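A sketch tying these pieces together with logging and a sleep between requests; the crawler.log and post file names, the media_urls list, and the failure threshold are assumptions for illustration:

import logging
import time
import urllib.request

logging.basicConfig(filename="crawler.log", level=logging.INFO)

consecutive_failures = 0
for i, source_url in enumerate(media_urls):  # media_urls collected during crawling
    try:
        urllib.request.urlretrieve(source_url, f"post_{i}.jpg")
        consecutive_failures = 0  # reset the counter on success
    except Exception as exc:
        logging.error("Failed to fetch %s: %s", source_url, exc)
        consecutive_failures += 1
        if consecutive_failures > 5:  # assumed threshold: likely throttled
            logging.warning("Too many consecutive failures; stopping the script.")
            break
    finally:
        time.sleep(2)  # pause between requests to limit load on the servers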
The idea of this milestone is to explore how to make requests to a server to fetch data, and how to handle and log exceptions in Python.
Publish your project by making a new GitHub repository and have some green goodness!
You should be able to deploy the application on a cloud platform.
Now that your application is complete, it's ready to be deployed! Go on and deploy your application on the Google Cloud Platform in a Docker container.
If you are new to cloud services, you can go through the QPrep - System Design micro-experience available on the platform before proceeding. Also, if you’re new to Docker, kindly go through the Docker Introduction and Docker Advanced Bytes.
Clone your repository on the cloud instance with git clone, and remember to run the browser in headless mode inside the container (there is no display available), e.g. by setting options.headless = True.
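A minimal headless setup sketch for the container, using the same Firefox options object as before:

from selenium import webdriver
from selenium.webdriver.firefox.options import Options

options = Options()
options.headless = True  # no GUI is available inside the Docker container
driver = webdriver.Firefox(options=options)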