Data Analysis with Python on Jupyter Notebook

Data Analysis with Python on Jupyter Notebook

ยท

3 min read

On the journey of becoming a data scientist one must know how to do data Analytics, I just extensively learnt data analysis with python[Pandas, Numpy] and I decided to practice and showcase what I can do with the skills I have learnt by doing this little project.

Data source

I used Top US Songs from Kaggle to perform this analysis. It has 37,146 rows of data and 16 columns: 'Album_Type', 'Artist_ID', 'Artist_Name', 'Artist_Song_Rank', 'Track_Name', 'Is_Playable', 'Album_Name', 'Release_Date','Total_Album_Tracks', 'Is_Explicit', 'ISRC', 'Song_Duration','Track_Number', 'Popularity', 'Track_Id', 'Track_URI'

Core Objective

Note : Analysis are on the Artist and there Songs

  • How Many of Drake Album are in the Top 50 of Top US Songs

  • Total Number of Drake Songs

  • How many Other Artist are on the List Apart from Drake

  • How many Cardi b Song of 2021 are on the list

Data Preparation

No much preparation for this Data_Set was Needed, i did some basic checking and saw that data is ready for analysis.. But i did change somthing though

Columns Heading

i had to add underscore in-between the heading columns, to make it easily accessable

I used the set_axis() Method with a list of new colums heading to replace the old one, at the end of the list i added the parameter inplace = True

Data Analysis

Jupyter Notebook DATA_Frame

iloc[],loc[],len()

iloc[]

The iloc[] method also know as integer location is use to select row and column using the index method..

in this case i used the iloc method to select the row from 0 to 50, then column 0 and 2 which are (Album_Type & Artist_Name)

With the help of the iloc[] method i was abel to narrow down the data_frame set to just Album_Type and Artist_Name with the index range of 0 to 50 and saved it to the variable Top2

loc[]

The loc[] method select column based on labels and drown index

The iloc[] did it job now it time for me to do mine ๐Ÿ˜Š..

with reference to Top2 narrowing down the column there is still more to do, with the loc[] i was able to choose within the column value of Album_Type for "Album" and Artist_Name for "Drake" and saved it to the variable Check

Note: The Album_Type may have other value like Single and Complication

Now Print out Check output.

len()

The len() is a python function that is used to return an integer value which indicates the number of items in an object...

insome case when the object is just big for you to count you can use the len() to check for the object length within few seconds intend of counting manually

The use the len () method to check the amount of Drake album that are in Top 50.. As you can see from the image the result Print Out 10 also use the Shape() Method to check the amount of rows

Total Number of Drake Songs

Drake Total Songs on the List of Top Us song are 100 in Total

How many Other Artist are on the List Apart from Drake

Using the loc[] method and not equal to operator (!=) to not select Drake name that appear in the Artist_Name Column.

Then the drop_duplicates() method is to drop duplicates of artist name that may appear in the column.

My default the keep parameter is set to "First", this keep the first value then drop all other duplicate of the same value

How many Cardi b Song of 2021 are on the list

THE END

Please follow me for more article as i adventure into this journey of becoming a Data Scientist ๐Ÿ˜

ย