You will be creating an application which will perform web scraping of hot posts from a subreddit and automatically publish them in a FB group/page periodically.
Web scraping, also called web data extraction, is the process of collecting structured data in an automated way. Businesses generally use web scraping to draw on the vast amount of publicly available information and make smarter decisions. In our project though, we are going to have some fun with it by web scraping popular posts from a subreddit. If you don’t know what a subreddit is: subreddits are like groups on Reddit, one of the internet’s most popular websites!
Facebook is an all time favourite social media platform and most of us are a part of it. In this project, we will be automating the process of sharing a popular post from a subreddit in a dedicated FB group or page.
Disclaimer: Web scraping should be used only for learning purposes. Any other use of the scraped data might result in legal action, or your IP address might get blocked.
The project consists of the following stages:
Web scrape content which you want to post on FB, for example memes, from a subreddit.
Perform Selenium Web Automation for automatically sharing hot posts from a subreddit in a dedicated Facebook group or page.
Come up with a script that performs the aforementioned tasks periodically.
Deploy your application on a cloud platform.
First, we need to fetch the hot posts from a subreddit at any given time. But from which subreddit? Well, r/ProgrammerHumor is a good candidate for coders, because it is about memes and jokes related to coding and software development in general.
You will need to have a Reddit account. So create one if you don’t have it already.
Create an app by going to the App Preferences.
Import the praw module in your Python program.
Set up the Reddit API by adding the client_id, client_secret and user_agent information.
Download the necessary information from the subreddit you desire (r/ProgrammerHumor in our case).
Store the information in CSV files. The CSV files for memes will contain the URLs of the images. Download the image files using these URLs.
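The steps above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the environment variable names, the user_agent string, the CSV column names and the helper function names are all assumptions made for illustration, and praw needs to be installed separately (pip install praw).

```python
import csv
import os
import urllib.request
from urllib.parse import urlparse

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif")


def is_image_url(url):
    """Return True if the URL path ends in a common image extension."""
    return urlparse(url).path.lower().endswith(IMAGE_EXTENSIONS)


def fetch_hot_posts(subreddit_name, limit=10):
    """Return (title, url) pairs for the current hot posts of a subreddit."""
    # Imported here so the other helpers work even without praw installed.
    import praw

    reddit = praw.Reddit(
        # Assumption: credentials are kept in these (hypothetical) variables.
        client_id=os.environ["REDDIT_CLIENT_ID"],
        client_secret=os.environ["REDDIT_CLIENT_SECRET"],
        user_agent="meme-fetcher (learning project)",  # placeholder string
    )
    hot_posts = reddit.subreddit(subreddit_name).hot(limit=limit)
    return [(post.title, post.url) for post in hot_posts]


def save_posts_csv(posts, csv_path):
    """Write (title, url) pairs to a CSV file with a header row."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["title", "url"])
        writer.writerows(posts)


def download_images(csv_path, out_dir):
    """Download every image URL listed in the CSV; return the saved paths."""
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if not is_image_url(row["url"]):
                continue  # skip text posts, videos and external links
            filename = os.path.basename(urlparse(row["url"]).path)
            target = os.path.join(out_dir, filename)
            urllib.request.urlretrieve(row["url"], target)
            saved.append(target)
    return saved
```

A typical run would call save_posts_csv(fetch_hot_posts("ProgrammerHumor"), "posts.csv") and then download_images("posts.csv", "memes"). Some image hosts may reject plain scripted downloads, so treat the download step as a starting point.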
Can you retrieve the links of posts from the subreddit which are in the form of text? We can use these links for updates in a Facebook group/page.
Can you perform web scraping to fetch information from other subreddits? For example, the subreddit r/Coronavirus provides updates related to COVID-19. You can use this information to provide regular updates on the novel coronavirus.
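As a hint for the text-post question, here is a small sketch that keeps only the text posts and builds their links. It assumes the submissions come from praw, whose submission objects expose is_self (True for text posts) and permalink (a path relative to reddit.com); the function name is made up for illustration.

```python
def text_post_links(submissions):
    """Return full Reddit links for the submissions that are text posts."""
    return [
        "https://www.reddit.com" + submission.permalink
        for submission in submissions
        if submission.is_self
    ]


# Possible usage with praw (not executed here):
# links = text_post_links(reddit.subreddit("Coronavirus").hot(limit=10))
```

Because the function only reads two attributes, it works for any subreddit you pass in.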
You should be able to fetch image files of memes from the subreddit r/ProgrammerHumor, which we will later share in FB groups/pages.
[Image: a sample meme]
Whenever we see a nice post on Reddit, we want to share it with the world. We generally download the file or take a screenshot of the post and share the image. In the earlier milestone, we fetched popular posts from a subreddit which we want to post on FB. In this milestone, we’ll share those popular posts on FB by running a script.
We’ll be making use of Selenium Web Driver, which works on the browser directly and uses the browser’s in-built features to trigger the automation test written by the tester. You’ll be writing a script that fetches and interacts with the web elements.
For example, suppose you need to log in to your FB account. For doing so, you need to fill in your username and password in the browser and press the login button or press the enter key on your keyboard. For achieving the same using Selenium Web Driver, you’ll have to select the text box elements of username and password, send the respective keys, which are basically your username and password and then send the enter key command.
All of the necessary commands need to be written using a Selenium supported programming language.
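The login example above might look like this in Python with Selenium 4. Treat it as a sketch under assumptions: the element ids "email" and "pass" are what the Facebook login form has commonly used, but they may change; the environment variable names and function names are hypothetical.

```python
import os


def load_credentials(env=os.environ):
    """Read the login details from environment variables (hypothetical names)."""
    return env["FB_EMAIL"], env["FB_PASSWORD"]


def log_in(driver, email, password, timeout=15):
    """Fill in the login form and submit it with the Enter key."""
    # Imported here so load_credentials works without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver.get("https://www.facebook.com")
    wait = WebDriverWait(driver, timeout)

    # Assumption: the username box has id "email", the password box id "pass".
    email_box = wait.until(EC.presence_of_element_located((By.ID, "email")))
    email_box.send_keys(email)

    password_box = driver.find_element(By.ID, "pass")
    password_box.send_keys(password)
    password_box.send_keys(Keys.ENTER)  # same as pressing Enter on the keyboard


# Possible usage (not executed here):
# from selenium import webdriver
# driver = webdriver.Chrome()
# log_in(driver, *load_credentials())
```

Keeping the credentials out of the script means you can publish the code to GitHub later without leaking your password.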
[Note: The preferred way to create applications for Facebook is to create an app using their developer portal. However, once all your configurations are in place, you need to submit your app for review on the Facebook platform, which might take several days. Only after a successful review will you be able to use your app for real. So, as a workaround, we can use web automation to do the job, since we only need to automate a simple feature for learning purposes.]
Your Selenium based web automation script should be able to do the following:
Open facebook.com and log in to your account.
Open the URL for the Facebook group or page you’re interested in.
Upload a meme image file (which you obtained in the first milestone) to the group or page and post the same.
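The upload step could be sketched as below. The general Selenium technique is to send the file's absolute path to a file input element instead of clicking through the operating system's file dialog. Both selectors here are assumptions about Facebook's page and will very likely need adjusting after you inspect the page yourself; the function names are hypothetical too.

```python
import os


def absolute_image_path(path):
    """File inputs need an absolute path; fail early if the file is missing."""
    full_path = os.path.abspath(path)
    if not os.path.isfile(full_path):
        raise FileNotFoundError(full_path)
    return full_path


def post_image(driver, target_url, image_path, timeout=20):
    """Open a group or page, attach an image and submit the post."""
    # Imported here so absolute_image_path works without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver.get(target_url)
    wait = WebDriverWait(driver, timeout)

    # Assumption: the post composer exposes a file input for photos.
    file_input = wait.until(
        EC.presence_of_element_located((By.CSS_SELECTOR, "input[type='file']"))
    )
    file_input.send_keys(absolute_image_path(image_path))

    # Assumption: the submit button is a div labelled "Post".
    post_button = wait.until(
        EC.element_to_be_clickable((By.XPATH, "//div[@aria-label='Post']"))
    )
    post_button.click()
```

You would call post_image(driver, group_url, "memes/some_meme.png") after logging in with the same driver.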
You may face some challenges when trying to fetch certain web elements. Kindly keep the following in mind:
Always try first to find an element using its xpath.
If you face issues with xpath, then try to find the element using its id.
If a web element has no id, it may have a class. The catch with classes is that multiple web elements can share the same class, whereas ids are conventionally unique. In such a case, you can fetch all the elements of that class and run a brute-force test to find the right web element.
Another way to find a web element is to search for text, if the web element has some.
When searching by class, you can narrow the search for your web element by combining the class parameter with some text, provided the web element contains some text.
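Two of the tips above, the brute-force search over a class and the combined class-plus-text search, can be sketched like this. The helper names are made up for illustration, and the XPath builder assumes the text contains no single quotes.

```python
def xpath_for_class_and_text(class_name, text):
    """Build an XPath for elements that have the class and contain the text."""
    return (
        f"//*[contains(concat(' ', normalize-space(@class), ' '), ' {class_name} ')"
        f" and contains(normalize-space(.), '{text}')]"
    )


def first_with_text(elements, text):
    """Brute-force search: return the first element whose text contains `text`."""
    for element in elements:
        if text in element.text:
            return element
    return None


# Possible usage with Selenium 4 (not executed here):
# from selenium.webdriver.common.by import By
# candidates = driver.find_elements(By.CLASS_NAME, "some-class")
# button = first_with_text(candidates, "Photo/video")
# same_button = driver.find_element(
#     By.XPATH, xpath_for_class_and_text("some-class", "Photo/video")
# )
```

The concat/normalize-space part matches the class as a whole word, so searching for "btn" does not accidentally match an element whose class is "btn-large".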
By the end of this milestone, you should be able to publish an image post in a Facebook group or page by just running a script.
[Image: a sample post published by the script]
We need to publish in our FB group/page periodically to keep it lively.
You need to come up with a script which performs the following actions periodically:
Web scraping the required data from Reddit.
Downloading the meme image files using the same data.
Publishing the obtained images in a group/page.
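One simple way to repeat these actions is a loop that sleeps between runs. The sketch below is one possible approach, not the only one (a cron job would work as well); the function name and parameters are illustrative, and job stands for whatever function performs the scraping, downloading and publishing.

```python
import time


def run_periodically(job, interval_seconds, max_runs=None, sleep=time.sleep):
    """Call job() repeatedly, pausing between runs; return the number of attempts.

    With max_runs=None the loop never ends, which suits a deployed script.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        try:
            job()
        except Exception as error:
            # One failed run (e.g. a missing web element) should not stop the rest.
            print(f"Run {runs + 1} failed: {error}")
        runs += 1
        if max_runs is None or runs < max_runs:
            sleep(interval_seconds)
    return runs


# Possible usage: publish once every hour, forever (not executed here).
# run_periodically(scrape_and_publish, interval_seconds=3600)
```

The sleep parameter exists only so the loop can be tried out quickly without actually waiting.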
Can you come up with a script which fetches the link of written posts from a subreddit and shares the same in the FB group/page as regular updates?
Can you come up with a script which takes a subreddit name and a Facebook group/page as inputs and provides regular updates of popular posts from the subreddit on the respective group/page in the form of text? With a few tweaks, you’ll be able to have an application in place which is easily configurable for any community requesting updates of posts from a subreddit in their FB group/page.
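To make the application configurable, the subreddit name and the group/page URL can be taken as command-line inputs. Here is a minimal sketch using Python's argparse module; the argument names and default values are assumptions chosen for illustration.

```python
import argparse


def parse_args(argv=None):
    """Parse the command-line inputs; argv=None means read from sys.argv."""
    parser = argparse.ArgumentParser(
        description="Share hot posts from a subreddit in a Facebook group or page."
    )
    parser.add_argument("subreddit", help="subreddit name without the r/ prefix")
    parser.add_argument("target_url", help="URL of the Facebook group or page")
    parser.add_argument(
        "--limit", type=int, default=5, help="number of hot posts to fetch per run"
    )
    parser.add_argument(
        "--interval-minutes", type=int, default=60, help="pause between two runs"
    )
    return parser.parse_args(argv)
```

A run could then look like: python app.py ProgrammerHumor https://www.facebook.com/groups/your-group --limit 3, where app.py and the group URL are placeholders.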
By the end of this milestone, you’ll have an application which will be able to provide periodic updates to FB groups/pages by fetching necessary information from a subreddit.
Publish your project in a GitHub repository and have some green goodness!
[Note: Kindly go through this Byte if you’re unfamiliar with Git.]
Now that your application is complete, it’s ready to be deployed! Go on and deploy your application on the Google Cloud Platform in a Docker Container.
[Note: You are free to use any Selenium supported cloud provider.]
[PS: If you are new to cloud services, you can go through the QPrep - System Design micro-experience available on the platform before proceeding. Also, if you’re new to Docker, kindly go through the Docker Introduction and Docker Advanced Bytes.]
Create a Docker container for your application. It will make the deployment easier.
Set up a cloud instance on GCP and activate it.
Upload your files to the platform. You can simply use your GitHub repository here, since a simple git clone will do the job.
Run your application on the platform.
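One thing to keep in mind when running the application on a cloud instance or in a container: there is usually no display, so the browser has to run headless. The sketch below assumes Chrome and Selenium 4; the flag list reflects commonly used settings for containers rather than a required set, and the names are hypothetical.

```python
CONTAINER_CHROME_ARGS = [
    "--headless=new",           # no display is available on the server
    "--no-sandbox",             # often needed when Chrome runs as root in a container
    "--disable-dev-shm-usage",  # /dev/shm is small in many containers
    "--window-size=1920,1080",  # give the page a realistic viewport
]


def chrome_args(extra_args=()):
    """Return the container flags plus any extra ones, without duplicates."""
    merged = []
    for arg in [*CONTAINER_CHROME_ARGS, *extra_args]:
        if arg not in merged:
            merged.append(arg)
    return merged


def build_container_driver(extra_args=()):
    """Create a Chrome driver configured for a headless environment."""
    # Imported here so chrome_args works without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in chrome_args(extra_args):
        options.add_argument(arg)
    return webdriver.Chrome(options=options)
```

The container image also needs Chrome itself installed, which is something to handle when you write the Dockerfile.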
You should be able to deploy the application on a cloud platform.