On the journey to becoming a data scientist, one must know how to do data analytics. I have just learnt data analysis with Python (Pandas, NumPy) extensively, and I decided to practice and showcase the skills I have learnt by doing this little project.
Data source
I used the Top US Songs dataset from Kaggle to perform this analysis. It has 37,146 rows of data and 16 columns: 'Album_Type', 'Artist_ID', 'Artist_Name', 'Artist_Song_Rank', 'Track_Name', 'Is_Playable', 'Album_Name', 'Release_Date', 'Total_Album_Tracks', 'Is_Explicit', 'ISRC', 'Song_Duration', 'Track_Number', 'Popularity', 'Track_Id', 'Track_URI'.
Core Objective
Note: The analysis is on the artists and their songs.
How Many of Drake's Albums Are in the Top 50 of Top US Songs
Total Number of Drake Songs
How Many Other Artists Are on the List Apart from Drake
How Many Cardi B Songs of 2021 Are on the List
Data Preparation
Not much preparation was needed for this dataset. I did some basic checks and saw that the data was ready for analysis, but I did change one thing.
Column Headings
I had to add underscores between the words in the column headings to make them easier to access.
I used the set_axis() method with a list of new column headings to replace the old ones, and after the list I added the parameter inplace=True.
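Here is a minimal sketch of that renaming step. The tiny DataFrame and its space-separated headings are made up for illustration and are not the Kaggle data. One caveat: as far as I know, the inplace parameter of set_axis() was removed in pandas 2.0, so this sketch assigns the result back instead, which also works on older versions.

```python
import pandas as pd

# Made-up stand-in for the dataset; the headings with spaces are an assumption.
df = pd.DataFrame(
    {"Album Type": ["Album", "Single"], "Artist Name": ["Drake", "Cardi B"]}
)

# New headings, in the same order as the existing columns.
new_headings = ["Album_Type", "Artist_Name"]

# set_axis() returns a new DataFrame, so the result is assigned back to df.
df = df.set_axis(new_headings, axis="columns")

print(list(df.columns))
```

With underscores in place, a column can also be reached with dot notation, for example df.Album_Type.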
Data Analysis
How Many of Drake's Albums Are in the Top 50 of Top US Songs
iloc[], loc[], len()
iloc[]
The iloc[] indexer, short for integer location, selects rows and columns by their integer position.
In this case I used iloc[] to select the rows from 0 to 50 and then columns 0 and 2, which are Album_Type and Artist_Name.
With the help of iloc[] I was able to narrow the DataFrame down to just Album_Type and Artist_Name over the index range 0 to 50, and I saved the result to the variable Top2.
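A small sketch of this selection, on made-up data with the same first three columns as the dataset. It assumes the slice was written as 0:50; in an iloc[] slice the end position is excluded, so 0:50 returns the rows at positions 0 through 49, which is exactly 50 rows.

```python
import pandas as pd

# Made-up data: 60 rows, with columns in the same order as the first
# three columns of the dataset (Album_Type, Artist_ID, Artist_Name).
df = pd.DataFrame(
    {
        "Album_Type": ["Album", "Single"] * 30,
        "Artist_ID": [f"id{i}" for i in range(60)],
        "Artist_Name": ["Drake", "Cardi B", "Drake"] * 20,
    }
)

# Rows at positions 0..49 (end excluded), columns at positions 0 and 2.
Top2 = df.iloc[0:50, [0, 2]]

print(Top2.shape)          # (50, 2)
print(list(Top2.columns))  # ['Album_Type', 'Artist_Name']
```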
loc[]
The loc[] indexer selects rows and columns by label, using the row index and the column names.
iloc[] did its job; now it is time for me to do mine.
Top2 narrowed down the columns, but there is still more to do. With loc[] I was able to select the rows where Album_Type is "Album" and Artist_Name is "Drake", and I saved the result to the variable Check.
Note: Album_Type may have other values, such as Single and Compilation.
Now print out Check.
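The filter can be sketched like this. The five rows below are invented for illustration, and the capitalised values ("Album", "Single", "Compilation") follow the spelling used in this post; string comparison is case-sensitive, so the real data may need a different casing.

```python
import pandas as pd

# Made-up stand-in for Top2 (not the real Top 50).
Top2 = pd.DataFrame(
    {
        "Album_Type": ["Album", "Single", "Album", "Compilation", "Album"],
        "Artist_Name": ["Drake", "Drake", "Cardi B", "Drake", "Drake"],
    }
)

# Build one boolean mask per condition, then combine them with &.
is_album = Top2["Album_Type"] == "Album"
is_drake = Top2["Artist_Name"] == "Drake"

# loc[] keeps only the rows where both conditions are True.
Check = Top2.loc[is_album & is_drake]

print(Check)
```

Each condition needs its own parentheses if you write them inline, for example Top2.loc[(Top2["Album_Type"] == "Album") & (Top2["Artist_Name"] == "Drake")].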
len()
len() is a built-in Python function that returns an integer indicating the number of items in an object.
In cases where the object is too big to count by hand, you can use len() to get its length within seconds instead of counting manually.
I used len() to check how many Drake albums are in the Top 50. As you can see from the image, the result printed is 10. You can also use the shape attribute to check the number of rows.
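Both ways of counting rows look like this on a made-up three-row frame (so the count here is 3, not the 10 from the real data). Note that shape is an attribute, not a method, so it is written without parentheses.

```python
import pandas as pd

# Made-up stand-in for Check with three rows.
Check = pd.DataFrame(
    {"Album_Type": ["Album"] * 3, "Artist_Name": ["Drake"] * 3}
)

# len() on a DataFrame returns the number of rows.
rows_from_len = len(Check)

# shape is a (rows, columns) tuple; index 0 is the row count.
rows_from_shape = Check.shape[0]

print(rows_from_len, rows_from_shape)  # 3 3
```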
Total Number of Drake Songs
Drake has 100 songs in total on the Top US Songs list.
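One possible way to get such a count is sketched below. This is not necessarily the code used for the result above, and the data is made up, so the number it prints is not the 100 from the real dataset.

```python
import pandas as pd

# Made-up data standing in for the full dataset.
df = pd.DataFrame(
    {
        "Artist_Name": ["Drake", "Cardi B", "Drake", "Drake", "Doja Cat"],
        "Track_Name": ["Song A", "Song B", "Song C", "Song D", "Song E"],
    }
)

# Comparing the column to "Drake" gives True/False per row;
# summing counts the True values.
drake_total = (df["Artist_Name"] == "Drake").sum()

print(drake_total)  # 3
```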
How Many Other Artists Are on the List Apart from Drake
Using the loc[] indexer and the not-equal operator (!=), I excluded the rows where Drake appears in the Artist_Name column.
Then the drop_duplicates() method drops the duplicate artist names that may appear in the column.
By default the keep parameter is set to "first"; this keeps the first occurrence of each value and drops all other duplicates of the same value.
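The two steps can be sketched together as follows. The artist names are made-up sample rows, and keep="first" is written out only to make the default visible.

```python
import pandas as pd

# Made-up data standing in for the full dataset.
df = pd.DataFrame(
    {"Artist_Name": ["Drake", "Cardi B", "Drake", "Cardi B", "Doja Cat"]}
)

# Step 1: keep the Artist_Name values of every row that is not Drake.
others = df.loc[df["Artist_Name"] != "Drake", "Artist_Name"]

# Step 2: drop repeated names, keeping the first occurrence of each.
unique_others = others.drop_duplicates(keep="first")

print(unique_others.tolist())  # ['Cardi B', 'Doja Cat']
print(len(unique_others))      # 2
```

The length of the de-duplicated result is the number of other artists on the list.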
How Many Cardi B Songs of 2021 Are on the List
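A possible approach to this question is sketched below, reusing the same loc[] filtering idea. Everything in it is an assumption: the sample rows are made up, the artist is assumed to be spelled "Cardi B" (compared in lower case to be safe), and Release_Date is assumed to be text that starts with the year, such as "2021-02-05" or just "2021".

```python
import pandas as pd

# Made-up data; the real Release_Date format may differ.
df = pd.DataFrame(
    {
        "Artist_Name": ["Cardi B", "Cardi B", "Drake", "Cardi B"],
        "Track_Name": ["Song A", "Song B", "Song C", "Song D"],
        "Release_Date": ["2021-02-05", "2018-04-06", "2021-09-03", "2021"],
    }
)

# Case-insensitive match on the artist name.
is_cardi = df["Artist_Name"].str.lower() == "cardi b"

# Treat the date as text and keep values that start with the year 2021.
in_2021 = df["Release_Date"].astype(str).str.startswith("2021")

cardi_2021 = df.loc[is_cardi & in_2021]

print(len(cardi_2021))  # 2
```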
THE END