Scrape GitHub Repositories

CONCEPT

This is a web scraping project that scrapes the top topics on GitHub and then, for each topic, collects details about its top repositories, such as the username, repository name, and number of stars. Then, we will convert the collected data into a pandas DataFrame and finally save it to a CSV file.
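A minimal sketch of the scraping step, assuming `requests` and Beautiful Soup (`bs4`) are installed. The helper names (`fetch_html`, `parse_topics`, `parse_topic_repos`, `parse_star_count`) and the CSS selectors (`a[href^="/topics/"]`, `article`, `h3 a`, `span.Counter`) are hypothetical. GitHub's markup changes over time, so the selectors should be checked against the live pages before use. Parsing is kept separate from downloading so it can be tried on saved HTML, and star labels such as `12.3k` are converted to integers so repositories can be sorted by stars.

```python
import requests
from bs4 import BeautifulSoup

BASE_URL = "https://github.com"


def fetch_html(url: str) -> str:
    """Download a page and return its HTML, raising on HTTP errors."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


def parse_star_count(label: str) -> int:
    """Convert star labels such as '950', '1,234', '12.3k' or '1.1m' to an int."""
    label = label.strip().lower().replace(",", "")
    multipliers = {"k": 1_000, "m": 1_000_000}
    if label and label[-1] in multipliers:
        return round(float(label[:-1]) * multipliers[label[-1]])
    return int(label)


def parse_topics(html: str) -> list[dict]:
    """Return the title and absolute URL of each topic on the topics page.

    ASSUMPTION: each topic is an <a href="/topics/<slug>"> that wraps a <p>
    holding the topic's display name.
    """
    soup = BeautifulSoup(html, "html.parser")
    topics = []
    for link in soup.select('a[href^="/topics/"]'):
        title = link.find("p")
        if title is None:
            continue  # skip links that don't match the assumed layout
        topics.append({"title": title.get_text(strip=True),
                       "url": BASE_URL + link["href"]})
    return topics


def parse_topic_repos(html: str) -> list[dict]:
    """Return username, repository name, stars and URL for each repo on a topic page.

    ASSUMPTION: each repository card is an <article> whose <h3> holds two
    links (owner first, then repository) and whose star label sits in a
    <span class="Counter">.
    """
    soup = BeautifulSoup(html, "html.parser")
    repos = []
    for card in soup.find_all("article"):
        links = card.select("h3 a")
        stars = card.select_one("span.Counter")
        if len(links) < 2 or stars is None:
            continue  # skip cards that don't match the assumed layout
        username = links[0].get_text(strip=True)
        repo_name = links[1].get_text(strip=True)
        repos.append({
            "username": username,
            "repo_name": repo_name,
            "stars": parse_star_count(stars.get_text()),
            "repo_url": f"{BASE_URL}/{username}/{repo_name}",
        })
    return repos
```

For example, `parse_topics(fetch_html("https://github.com/topics"))` would list the topics, and passing each topic's `url` through `fetch_html` and then `parse_topic_repos` would collect its repositories, provided the assumed selectors match the real pages.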

About

Web scraping is the process of gathering information from websites in an automated fashion with the help of a computer program and presenting it in a meaningful way. It's a useful technique for creating datasets for research and learning. For this project, we will scrape repositories of the top topics available on GitHub. GitHub is a platform that lets us host our code in the cloud for collaboration and version control. In essence, GitHub lets people work together on projects, and it hosts both public and private repositories. To build the scraper, we will write code in Python using libraries such as requests, bs4 (Beautiful Soup), and pandas, and then save the generated output to a CSV file for each topic.
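One way to handle the saving step with pandas, as a sketch: each topic's repository records (a list of dicts) become a DataFrame that is written to its own CSV file. The function name `save_topic_csv`, the column names, the `data` output folder, and the file-naming rule are illustrative assumptions, not part of the original project.

```python
import os
import re

import pandas as pd

COLUMNS = ["username", "repo_name", "stars", "repo_url"]  # assumed record fields


def save_topic_csv(topic_title: str, repos: list[dict], out_dir: str = "data") -> str:
    """Write one topic's repositories to <out_dir>/<topic>.csv and return the path."""
    os.makedirs(out_dir, exist_ok=True)
    # Replace runs of characters that are awkward in file names with "_".
    safe_name = re.sub(r"[^\w.-]+", "_", topic_title)
    df = pd.DataFrame(repos, columns=COLUMNS)
    path = os.path.join(out_dir, f"{safe_name}.csv")
    df.to_csv(path, index=False)  # omit pandas' row index from the file
    return path
```

With this naming rule a topic such as `Node.js` is saved as `Node.js.csv`, while `C++` becomes `C_.csv`. Passing `columns=` keeps the CSV column order fixed even if the dicts were built in a different order.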


Top GitHub Topics
