Farzeen Arshad Ghuman | Freelancer Bird Call Recognition

Bird Call Recognition

🦉Cornell BirdSong Recognition🦉 1. Introduction Finally, some cutie cute competition involving animals, sounds, nature, earth and all that goodness 🌍💚. This is a very new different challenge for me, and not just because it's mainly based on audio files, but because the rules are a bit different than what I'm used to. When I joined, I had (still have) some very big issues in understanding not only the rules of submission, but also the data and ... what all means? So, as I go along I will try to bring some clear understanding and also point to some fruitful discussions. Ok, here we go! Libraries 📚⬇ 🦅 In [1]: import os import pandas as pd import numpy as np import seaborn as sns import matplotlib.pyplot as plt %matplotlib inline import matplotlib.image as mpimg from matplotlib.offsetbox import AnnotationBbox, OffsetImage # Map 1 library import plotly.express as px # Map 2 libraries import descartes import geopandas as gpd from shapely.geometry import Point, Polygon # Librosa Libraries import librosa import librosa.display import IPython.display as ipd import sklearn import warnings warnings.filterwarnings('ignore') 2. The .csv files 📌Note: 📁 train.csv contains information about the audio files available in train_audio . It contains 21,375 datapoints in 35 unique columns. test.csv contains only 3 observations (the rest are available in the hidden test set). Note: The TRAIN data has 1 labeled bird species per recording. However, in nature usually you can hear tens (even hundreds) of birds in one go, so in TEST set we need to predict 0, 1 or multiple species for one recording. Because of this, in TRAIN we have species column - or primary_label (main bird), secondary label (other birds heard) and background (background noises, other birds etc.) 💬 Discussions Few questions about test data Confusion about test set In [2]: # Import data train_csv = pd.read_csv("../input/birdsong-recognition/train.csv") test_csv = pd.read_csv("../input/birdsong-recognition/test.csv") # Create some time features train_csv['year'] = train_csv['date'].apply(lambda x: x.split('-')[0]) train_csv['month'] = train_csv['date'].apply(lambda x: x.split('-')[1]) train_csv['day_of_month'] = train_csv['date'].apply(lambda x: x.split('-')[2 print("There are {:,} unique bird species in the dataset.".format(len(train_ There are 264 unique bird species in the dataset. TEST.csv - let's take a look here as well before going further 📌Note: only 3 rows available (rest are in the hidden set) site : there are 3 sites in total, with first 2 having labeles every 5 seconds, while site_3 has labels at file level. row_id : this is the unique ID that will be used for the submission seconds : how long the clip is audio_id : row_id without site PS: "nocall" can be also one of the labels (hearing no bird). In [3]: # Inspect text_csv before checking train data test_csv site row_id seconds 0 site_1 site_1_0a997dff022e3ad9744d4e7bbf923288_5 5 0a997dff022e3ad 1 site_1 site_1_0a997dff022e3ad9744d4e7bbf923288_10 10 0a997dff022e3ad 2 site_1 site_1_0a997dff022e3ad9744d4e7bbf923288_15 15 0a997dff022e3ad Out[3]: 2.1 Time of the Recording ⏰ 📌Note: Majority of the data was registered between 2013 and 2019, during Spring and Summer months ( 00 is for the dates-, which are unknown). In [4]: bird = mpimg.imread('../input/birdcall-recognition-data/pink bird.jpg') imagebox = OffsetImage(bird, zoom=0.5) xy = (0.5, 0.7) ab = AnnotationBbox(imagebox, xy, frameon=False, pad=1, xybox=(6.5, 2000)) plt.figure(figsize=(16, 6)) ax = sns.countplot(train_csv['year'], palette="hls") ax.add_artist(ab) plt.title("Audio Files Registration per Year Made", fontsize=16) plt.xticks(rotation=90, fontsize=13) plt.yticks(fontsize=13) plt.ylabel("Frequency", fontsize=14) plt.xlabel(""); In [5]: bird = mpimg.imread('../input/birdcall-recognition-data/green bird.jpg') imagebox = OffsetImage(bird, zoom=0.3) xy = (0.5, 0.7) ab = AnnotationBbox(imagebox, xy, frameon=False, pad=1, xybox=(11, 3000)) plt.figure(figsize=(16, 6)) ax = sns.countplot(train_csv['month'], palette="hls") ax.add_artist(ab) plt.title("Audio Files Registration per Month Made", fontsize=16) plt.xticks(fontsize=13) plt.yticks(fontsize=13) plt.ylabel("Frequency", fontsize=14) plt.xlabel(""); 2.2 The Songs 🎼 📌Note: Pitch is usually unspecified. This is one of the more miscellaneous columns, that we need to be careful how we interpret. Most Song Types are call, song or flight. In [6]: bird = mpimg.imread('../input/birdcall-recognition-data/orangebird.jpeg') imagebox = OffsetImage(bird, zoom=0.12) xy = (0.5, 0.7) ab = AnnotationBbox(imagebox, xy, frameon=False, pad=1, xybox=(3.9, 8600)) plt.figure(figsize=(16, 6)) ax = sns.countplot(train_csv['pitch'], palette="hls", order = train_csv['pit ax.add_artist(ab) plt.title("Pitch (quality of sound - how high/low the tone is)", fontsize=16 plt.xticks(fontsize=13) plt.yticks(fontsize=13) plt.ylabel("Frequency", fontsize=14) plt.xlabel(""); Type Column: 📌Note: This column is a bit messy, as the same description can be found under multiple names. Also, there can be multiple descriptions for multiple sounds (one bird song can mean a different thing from another one in the same recording). Some examples are: alarm call is: alarm call | alarm call, call flight call is: flight call | call, flight call etc. In [7]: # Create a new variable type by exploding all the values adjusted_type = train_csv['type'].apply(lambda x: x.split(',')).reset_index( # Strip of white spaces and convert to lower chars adjusted_type = adjusted_type['type'].apply(lambda x: x.strip().lower()).res adjusted_type['type'] = adjusted_type['type'].replace('calls', 'call') # Create Top 15 list with song types top_15 = list(adjusted_type['type'].value_counts().head(15).reset_index()['i data = adjusted_type[adjusted_type['type'].isin(top_15)] # === PLOT === bird = mpimg.imread('../input/birdcall-recognition-data/Eastern Meadowlark.j imagebox = OffsetImage(bird, zoom=0.43) xy = (0.5, 0.7) ab = AnnotationBbox(imagebox, xy, frameon=False, pad=1, xybox=(12.4, 5700)) plt.figure(figsize=(16, 6)) ax = sns.countplot(data['type'], palette="hls", order = data['type'].value_c ax.add_artist(ab) plt.title("Top 15 Song Types", fontsize=16) plt.ylabel("Frequency", fontsize=14) plt.yticks(fontsize=13) plt.xticks(rotation=45, fontsize=13) plt.xlabel(""); 2.3 Where is the bird? 📸🔭 📌Note: In most recordings the birds were seen, usually at an altitude between 0m and 10m. In [8]: # Top 15 most common elevations top_15 = list(train_csv['elevation'].value_counts().head(15).reset_index()[' data = train_csv[train_csv['elevation'].isin(top_15)] # === PLOT === bird = mpimg.imread('../input/birdcall-recognition-data/blue bird.jpg') imagebox = OffsetImage(bird, zoom=0.43) xy = (0.5, 0.7) ab = AnnotationBbox(imagebox, xy, frameon=False, pad=1, xybox=(12.4, 1450)) plt.figure(figsize=(16, 6)) ax = sns.countplot(data['elevation'], palette="hls", order = data['elevation ax.add_artist(ab) plt.title("Top 15 Elevation Types", fontsize=16) plt.ylabel("Frequency", fontsize=14) plt.yticks(fontsize=13) plt.xticks(rotation=45, fontsize=13) plt.xlabel(""); In [9]: # Create data data = train_csv['bird_seen'].value_counts().reset_index() # === PLOT === bird = mpimg.imread('../input/birdcall-recognition-data/black bird.jpg') imagebox = OffsetImage(bird, zoom=0.22) xy = (0.5, 0.7) ab = AnnotationBbox(imagebox, xy, frameon=False, pad=1, xybox=(15300, 0.95)) plt.figure(figsize=(16, 6)) ax = sns.barplot(x = 'bird_seen', y = 'index', data = data, palette="hls") ax.add_artist(ab) plt.title("Song was Heard, but was Bird Seen?", fontsize=16) plt.ylabel("Frequency", fontsize=14) plt.yticks(fontsize=13) plt.xticks(rotation=45, fontsize=13) plt.xlabel(""); 2.4 World View of the Species 🧭🌏 #1. Countries 🚩 📌Note: Let's look at top 15 countries with most recordings. The majority of recordings are located in the US, followed by Canada and Mexico. In [10]: # Top 15 most common elevations top_15 = list(train_csv['country'].value_counts().head(15).reset_index()['in data = train_csv[train_csv['country'].isin(top_15)] # === PLOT === bird = mpimg.imread('../input/birdcall-recognition-data/fluff ball.jpg') imagebox = OffsetImage(bird, zoom=0.6) xy = (0.5, 0.7) ab = AnnotationBbox(imagebox, xy, frameon=False, pad=1, xybox=(12.2, 7000)) plt.figure(figsize=(16, 6)) ax = sns.countplot(data['country'], palette='hls', order = data['country'].v ax.add_artist(ab) plt.title("Top 15 Countries with most Recordings", fontsize=16) plt.ylabel("Frequency", fontsize=14) plt.yticks(fontsize=13) plt.xticks(rotation=45, fontsize=13) plt.xlabel(""); #2. Map View 🧭 In [11]: # Import gapminder data, where we have country and iso ALPHA codes df = px.data.gapminder().query("year==2007")[["country", "iso_alpha"]] # Merge the tables together (we lose a fiew rows, but not many) data = pd.merge(left=train_csv, right=df, how="inner", on="country") # Group by country and count how many species can be found in each data = data.groupby(by=["country", "iso_alpha"]).count()["species"].reset_in fig = px.choropleth(data, locations="iso_alpha", color="species", hover_name color_continuous_scale=px.colors.sequential.Teal, title = "World Map: Recordings per Country") fig.show() World Map: Recordings per Country #3. Another Map! 😄 Where are our birds? 🦜 In [12]: # SHP file world_map = gpd.read_file("../input/world-shapefile/world_shapefile.shp") # Coordinate reference system crs = {"init" : "epsg:4326"} # Lat and Long need to be of type float, not object data = train_csv[train_csv["latitude"] != "Not specified"] data["latitude"] = data["latitude"].astype(float) data["longitude"] = data["longitude"].astype(float) # Create geometry geometry = [Point(xy) for xy in zip(data["longitude"], data["latitude"])] # Geo Dataframe geo_df = gpd.GeoDataFrame(data, crs=crs, geometry=geometry) # Create ID for species species_id = geo_df["species"].value_counts().reset_index() species_id.insert(0, 'ID', range(0, 0 + len(species_id))) species_id.columns = ["ID", "species", "count"] # Add ID to geo_df geo_df = pd.merge(geo_df, species_id, how="left", on="species") # === PLOT === fig, ax = plt.subplots(figsize = (16, 10)) world_map.plot(ax=ax, alpha=0.4, color="grey") palette = iter(sns.hls_palette(len(species_id))) for i in range(264): geo_df[geo_df["ID"] == i].plot(ax=ax, markersize=20, color=next(palette) 🔈🔉🔊 3. The Audio Files 3.1 Description train_audio: short recording (majority in mp3 format) of INDIVIDUAL birds. test_audio: recordings took in 3 locations: Site 1 and Site 2: recordings 10 mins long (mp3) that have labeled a bird every 5 seconds. This is meant to mimic the real life scenario, when you would usually have more than 1 bird (or no bird) singing. Site 3: recordings labeled at file level (because it is especially hard to have coders trained to label these kind of files) 📌Note: 3.2 Duration and File Types 📁 In [13]: # Creating Interval for *duration* variable train_csv['duration_interval'] = ">500" train_csv.loc[train_csv['duration'] <= 100, train_csv.loc[(train_csv['duration'] > 100) train_csv.loc[(train_csv['duration'] > 200) train_csv.loc[(train_csv['duration'] > 300) train_csv.loc[(train_csv['duration'] > 400) 'duration_interval'] = "<=100" & (train_csv['duration'] <= 200) & (train_csv['duration'] <= 300) & (train_csv['duration'] <= 400) & (train_csv['duration'] <= 500) bird = mpimg.imread('../input/birdcall-recognition-data/multicolor bird.jpg' imagebox = OffsetImage(bird, zoom=0.4) xy = (0.5, 0.7) ab = AnnotationBbox(imagebox, xy, frameon=False, pad=1, xybox=(4.4, 12000)) plt.figure(figsize=(16, 6)) ax = sns.countplot(train_csv['duration_interval'], palette="hls") ax.add_artist(ab) plt.title("Distribution of Recordings Duration", fontsize=16) plt.ylabel("Frequency", fontsize=14) plt.yticks(fontsize=13) plt.xticks(rotation=45, fontsize=13) plt.xlabel(""); In [14]: def show_values_on_bars(axs, h_v="v", space=0.4): def _show_on_single_plot(ax): if h_v == "v": for p in ax.patches: _x = p.get_x() + p.get_width() / 2 _y = p.get_y() + p.get_height() value = int(p.get_height()) ax.text(_x, _y, value, ha="center") elif h_v == "h": for p in ax.patches: _x = p.get_x() + p.get_width() + float(space) _y = p.get_y() + p.get_height() value = int(p.get_width()) ax.text(_x, _y, value, ha="left") if isinstance(axs, np.ndarray): for idx, ax in np.ndenumerate(axs): _show_on_single_plot(ax) else: _show_on_single_plot(axs) In [15]: bird = mpimg.imread('../input/birdcall-recognition-data/yellow birds.jpg') imagebox = OffsetImage(bird, zoom=0.6) xy = (0.5, 0.7) ab = AnnotationBbox(imagebox, xy, frameon=False, pad=1, xybox=(2.7, 12000)) plt.figure(figsize=(16, 6)) ax = sns.countplot(train_csv['file_type'], palette = "hls", order = train_cs ax.add_artist(ab) show_values_on_bars(ax, "v", 0) plt.title("Recording File Types", fontsize=16) plt.ylabel("Frequency", fontsize=14) plt.yticks(fontsize=13) plt.xticks(rotation=45, fontsize=13) plt.xlabel(""); 3.3 Listening to some Recordings 📌Note: What is sound? In physics, sound is a vibration that propagates as an acoustic wave, through a transmission medium such as a gas, liquid or solid. In [16]: # Create Full Path so we can access data more easily base_dir = '../input/birdsong-recognition/train_audio/' train_csv['full_path'] = base_dir + train_csv['ebird_code'] + '/' + train_cs # Now let's sample a fiew audio files amered = train_csv[train_csv['ebird_code'] cangoo = train_csv[train_csv['ebird_code'] haiwoo = train_csv[train_csv['ebird_code'] pingro = train_csv[train_csv['ebird_code'] vesspa = train_csv[train_csv['ebird_code'] == == == == == "amered"].sample(1, "cangoo"].sample(1, "haiwoo"].sample(1, "pingro"].sample(1, "vesspa"].sample(1, random_sta random_sta random_sta random_sta random_sta bird_sample_list = ["amered", "cangoo", "haiwoo", "pingro", "vesspa"] 🎶 Ok, let's hear some songs! 🕊 In [17]: # Amered ipd.Audio(amered) Out[17]: 0:00 / 0:05 In [18]: # Cangoo ipd.Audio(cangoo) Out[18]: 0:00 / 0:37 In [19]: # Haiwoo ipd.Audio(haiwoo) Out[19]: 0:00 / 1:06 In [20]: # Pingro ipd.Audio(pingro) Out[20]: 0:00 / 1:37 In [21]: # Vesspa ipd.Audio(vesspa) Out[21]: 0:00 / 1:10 3.4 Extracting Features from Sounds 📌The audio data is composed by: 🔓 1. Sound: sequence of vibrations in varying pressure strengths ( y ) 2. Sample Rate: ( sr ) is the number of samples of audio carried per second, measured in Hz or kHz In [22]: # Importing 1 file y, sr = librosa.load(vesspa) print('y:', y, '\n') print('y shape:', np.shape(y), '\n') print('Sample Rate (KHz):', sr, '\n') # Verify length of the audio print('Check Len of Audio:', np.shape(y)[0]/sr) y: [- -. ] - ... - - y shape: -,) Sample Rate (KHz): 22050 Check Len of Audio:- In [23]: # Trim leading and trailing silence from an audio signal (silence before and audio_file, _ = librosa.effects.trim(y) # the result is an numpy ndarray print('Audio File:', audio_file, '\n') print('Audio File shape:', np.shape(audio_file)) Audio File: [- -. ] Audio File shape: -,) - ... - - In [24]: # Importing the 5 files y_amered, sr_amered = librosa.load(amered) audio_amered, _ = librosa.effects.trim(y_amered) y_cangoo, sr_cangoo = librosa.load(cangoo) audio_cangoo, _ = librosa.effects.trim(y_cangoo) y_haiwoo, sr_haiwoo = librosa.load(haiwoo) audio_haiwoo, _ = librosa.effects.trim(y_haiwoo) y_pingro, sr_pingro = librosa.load(pingro) audio_pingro, _ = librosa.effects.trim(y_pingro) y_vesspa, sr_vesspa = librosa.load(vesspa) audio_vesspa, _ = librosa.effects.trim(y_vesspa) 🌊 (2D Representation) #1. Sound Waves In [25]: fig, ax = plt.subplots(5, figsize = (16, 9)) fig.suptitle('Sound Waves', fontsize=16) librosa.display.waveplot(y librosa.display.waveplot(y librosa.display.waveplot(y librosa.display.waveplot(y librosa.display.waveplot(y = = = = = audio_amered, audio_cangoo, audio_haiwoo, audio_pingro, audio_vesspa, sr sr sr sr sr = = = = = for i, name in zip(range(5), bird_sample_list): ax[i].set_ylabel(name, fontsize=13) #2. Fourier Transform 🥁 sr_amered, sr_cangoo, sr_haiwoo, sr_pingro, sr_vesspa, color color color color color = = = = = "#A300F9" "#4300FF" "#009DFF" "#00FFB0" "#D9FF00" 📌Note: Function that gets a signal in the time domain as input, and outputs its decomposition into frequencies. Transform both the yaxis (frequency) to log scale, and the “color” axis (amplitude) to Decibels, which is approx. the log scale of amplitudes. In [26]: # Default FFT window size n_fft = 2048 # FFT window size hop_length = 512 # number audio of frames between STFT columns (looks like a # Short-time Fourier transform (STFT) D_amered = np.abs(librosa.stft(audio_amered, D_cangoo = np.abs(librosa.stft(audio_cangoo, D_haiwoo = np.abs(librosa.stft(audio_haiwoo, D_pingro = np.abs(librosa.stft(audio_pingro, D_vesspa = np.abs(librosa.stft(audio_vesspa, n_fft n_fft n_fft n_fft n_fft = = = = = n_fft, n_fft, n_fft, n_fft, n_fft, hop_length hop_length hop_length hop_length hop_length = = = = = hop hop hop hop hop In [27]: print('Shape of D object:', np.shape(D_amered)) Shape of D object: (1025, 222) #3. Spectrogram 📌Note: 🎷 What is a spectrogram? A spectrogram is a visual representation of the spectrum of frequencies of a signal as it varies with time. When applied to an audio signal, spectrograms are sometimes called sonographs, voiceprints, or voicegrams (wiki). Here we convert the frequency axis to a logarithmic one. In [28]: # Convert DB_amered DB_cangoo DB_haiwoo DB_pingro DB_vesspa an amplitude spectrogram to Decibels-scaled spectrogram. = librosa.amplitude_to_db(D_amered, ref = np.max) = librosa.amplitude_to_db(D_cangoo, ref = np.max) = librosa.amplitude_to_db(D_haiwoo, ref = np.max) = librosa.amplitude_to_db(D_pingro, ref = np.max) = librosa.amplitude_to_db(D_vesspa, ref = np.max) # === PLOT === fig, ax = plt.subplots(2, 3, figsize=(16, 9)) fig.suptitle('Spectrogram', fontsize=16) fig.delaxes(ax[1, 2]) librosa.display.specshow(DB_amered, sr = y_axis = 'log', librosa.display.specshow(DB_cangoo, sr = y_axis = 'log', librosa.display.specshow(DB_haiwoo, sr = y_axis = 'log', librosa.display.specshow(DB_pingro, sr = y_axis = 'log', librosa.display.specshow(DB_vesspa, sr = sr_amered, hop_length = cmap = 'cool', ax=ax[0, sr_cangoo, hop_length = cmap = 'cool', ax=ax[0, sr_haiwoo, hop_length = cmap = 'cool', ax=ax[0, sr_pingro, hop_length = cmap = 'cool', ax=ax[1, sr_vesspa, hop_length = hop_length, 0]) hop_length, 1]) hop_length, 2]) hop_length, 0]) hop_length, y_axis = 'log', cmap = 'cool', ax=ax[1, 1]); for i, name in zip(range(0, 2*3), bird_sample_list): x = i // 3 y = i % 3 ax[x, y].set_title(name, fontsize=13) #4. Mel Spectrogram 🎷 📌Note: The Mel Scale, mathematically speaking, is the result of some non-linear transformation of the frequency scale. The Mel Spectrogram is a normal Spectrogram, but with a Mel Scale on the y axis. In [29]: # Create the Mel Spectrograms S_amered = librosa.feature.melspectrogram(y_amered, sr=sr_amered) S_DB_amered = librosa.amplitude_to_db(S_amered, ref=np.max) S_cangoo = librosa.feature.melspectrogram(y_cangoo, sr=sr_cangoo) S_DB_cangoo = librosa.amplitude_to_db(S_cangoo, ref=np.max) S_haiwoo = librosa.feature.melspectrogram(y_haiwoo, sr=sr_haiwoo) S_DB_haiwoo = librosa.amplitude_to_db(S_haiwoo, ref=np.max) S_pingro = librosa.feature.melspectrogram(y_pingro, sr=sr_pingro) S_DB_pingro = librosa.amplitude_to_db(S_pingro, ref=np.max) S_vesspa = librosa.feature.melspectrogram(y_vesspa, sr=sr_vesspa) S_DB_vesspa = librosa.amplitude_to_db(S_vesspa, ref=np.max) # === PLOT ==== fig, ax = plt.subplots(2, 3, figsize=(16, 9)) fig.suptitle('Mel Spectrogram', fontsize=16) fig.delaxes(ax[1, 2]) librosa.display.specshow(S_DB_amered, sr y_axis = 'log', librosa.display.specshow(S_DB_cangoo, sr y_axis = 'log', librosa.display.specshow(S_DB_haiwoo, sr y_axis = 'log', librosa.display.specshow(S_DB_pingro, sr y_axis = 'log', librosa.display.specshow(S_DB_vesspa, sr y_axis = 'log', = sr_amered, hop_length = hop_lengt cmap = 'rainbow', ax=ax[0, 0]) = sr_cangoo, hop_length = hop_lengt cmap = 'rainbow', ax=ax[0, 1]) = sr_haiwoo, hop_length = hop_lengt cmap = 'rainbow', ax=ax[0, 2]) = sr_pingro, hop_length = hop_lengt cmap = 'rainbow', ax=ax[1, 0]) = sr_vesspa, hop_length = hop_lengt cmap = 'rainbow', ax=ax[1, 1]); for i, name in zip(range(0, 2*3), bird_sample_list): x = i // 3 y = i % 3 ax[x, y].set_title(name, fontsize=13) #5. Zero Crossing Rate 🚷 📌Note: the rate at which the signal changes from positive to negative or back. In [30]: # Total zero_crossings in our 1 song zero_amered = librosa.zero_crossings(audio_amered, zero_cangoo = librosa.zero_crossings(audio_cangoo, zero_haiwoo = librosa.zero_crossings(audio_haiwoo, zero_pingro = librosa.zero_crossings(audio_pingro, pad=False) pad=False) pad=False) pad=False) zero_vesspa = librosa.zero_crossings(audio_vesspa, pad=False) zero_birds_list = [zero_amered, zero_cangoo, zero_haiwoo, zero_pingro, zero_ for bird, name in zip(zero_birds_list, bird_sample_list): print("{} change rate is {:,}".format(name, sum(bird))) amered cangoo haiwoo pingro vesspa change change change change change rate rate rate rate rate is is is is is 51,379 92,089 100,198 706,094 678,007 #6. Harmonics and Perceptrual 📌Note: 🎹 Harmonics are characteristichs that represent the sound color Perceptrual shock wave represents the sound rhythm and emotion In [31]: y_harm_haiwoo, y_perc_haiwoo = librosa.effects.hpss(audio_haiwoo) plt.figure(figsize = (16, 6)) plt.plot(y_perc_haiwoo, color = '#FFB100') plt.plot(y_harm_haiwoo, color = '#A300F9') plt.legend(("Perceptrual", "Harmonics")) plt.title("Harmonics and Perceptrual : Haiwoo Bird", fontsize=16); #7. Spectral Centroid 🎯 📌Note: Indicates where the ”centre of mass” for a sound is located and is calculated as the weighted mean of the frequencies present in the sound. In [32]: # Calculate the Spectral Centroids spectral_centroids = librosa.feature.spectral_centroid(audio_cangoo, sr=sr)[ # Shape is a vector print('Centroids:', spectral_centroids, '\n') print('Shape of Spectral Centroids:', spectral_centroids.shape, '\n') # Computing the time variable for visualization frames = range(len(spectral_centroids)) # Converts frame counts to time (seconds) t = librosa.frames_to_time(frames) print('frames:', frames, '\n') print('t:', t) # Function that normalizes the Sound Data def normalize(x, axis=0): return sklearn.preprocessing.minmax_scale(x, axis=axis) Centroids: [- ..-] Shape of Spectral Centroids: (1629,) frames: range(0, 1629) t: [-e-e-e-02 ..-e-e-e+01] In [33]: #Plotting the Spectral Centroid along the waveform plt.figure(figsize = (16, 6)) librosa.display.waveplot(audio_cangoo, sr=sr, alpha=0.4, color = '#A300F9', plt.plot(t, normalize(spectral_centroids), color='#FFB100', lw=2) plt.legend(["Spectral Centroid", "Wave"]) plt.title("Spectral Centroid: Cangoo Bird", fontsize=16); #8. Chroma Frequencies 📌Note: Chroma features are an interesting and powerful representation for music audio in which the entire spectrum is projected onto 12 bins representing the 12 distinct semitones (or chromas) of the musical octave. In [34]: # Increase or decrease hop_length to change how granular you want your data hop_length = 5000 # Chromogram Vesspa chromagram = librosa.feature.chroma_stft(audio_vesspa, sr=sr_vesspa, hop_len print('Chromogram Vesspa shape:', chromagram.shape) plt.figure(figsize=(16, 6)) librosa.display.specshow(chromagram, x_axis='time', y_axis='chroma', hop_len plt.title("Chromogram: Vesspa", fontsize=16); Chromogram Vesspa shape: (12, 309) 🎤 #9. Tempo BPM (beats per minute) 📌Note: Dynamic programming beat tracker. In [35]: # Create Tempo BPM variable tempo_amered, _ = librosa.beat.beat_track(y_amered, tempo_cangoo, _ = librosa.beat.beat_track(y_cangoo, tempo_haiwoo, _ = librosa.beat.beat_track(y_haiwoo, tempo_pingro, _ = librosa.beat.beat_track(y_pingro, tempo_vesspa, _ = librosa.beat.beat_track(y_vesspa, sr sr sr sr sr = = = = = sr_amered) sr_cangoo) sr_haiwoo) sr_pingro) sr_vesspa) data = pd.DataFrame({"Type": bird_sample_list , "BPM": [tempo_amered, tempo_cangoo, tempo_haiwoo, tempo # Image bird = mpimg.imread('../input/birdcall-recognition-data/violet bird.jpg') imagebox = OffsetImage(bird, zoom=0.34) xy = (0.5, 0.7) ab = AnnotationBbox(imagebox, xy, frameon=False, pad=1, xybox=(0.05, 158)) # Plot plt.figure(figsize = (16, 6)) ax = sns.barplot(y = data["BPM"], x = data["Type"], palette="hls") ax.add_artist(ab) plt.ylabel("BPM", fontsize=14) plt.yticks(fontsize=13) plt.xticks(fontsize=13) plt.xlabel("") plt.title("BPM for 5 Different Bird Species", fontsize=16); #10. Spectral Rolloff 🥏 📌Note: Is a measure of the shape of the signal. It represents the frequency below which a specified percentage of the total spectral energy (e.g. 85%) lies. In [36]: # Spectral RollOff Vector spectral_rolloff = librosa.feature.spectral_rolloff(audio_amered, sr=sr_amer # Computing the time variable for visualization frames = range(len(spectral_rolloff)) # Converts frame counts to time (seconds) t = librosa.frames_to_time(frames) # The plot plt.figure(figsize = (16, 6)) librosa.display.waveplot(audio_amered, sr=sr_amered, alpha=0.4, color = '#A3 plt.plot(t, normalize(spectral_rolloff), color='#FFB100', lw=3) plt.legend(["Spectral Rolloff", "Wave"]) plt.title("Spectral Rolloff: Amered Bird", fontsize=16); There are many more features that librosa can extract from sound. Check them all here. 4. Additional Data ➕➕➕ Why stop at 100? thread here @Vopani kindly scraped the remaining of the available bird audios from Xeno-Canto site. The data: doesn't contain already available train audios from competition data only MP3 format same license as the original data no corrupt audio present However, the Competition Hosts replied with: limiting to 100 audio files had the reason to not overload the memory only 100 top rated audios were used excluded videos that did not alow derivatives They also recommend the following: the upper limit is usually 500 recordings per species how many recordings are needed to train? (maybe even less than 100?) 📌Note: So, should you use the extended data? Should you stick only with original data? I have no clue, try multiple ideas and see what performs better In [37]: # Import the .csv files (corresponding with the extended data) train_extended_A_Z = pd.read_csv("../input/xeno-canto-bird-recordings-extend # Create base directory base_dir_A_M = "../input/xeno-canto-bird-recordings-extended-a-m" base_dir_N_Z = "../input/xeno-canto-bird-recordings-extended-n-z" # Create Full Path column to the audio files train_extended_A_Z['full_path'] = base_dir_A_M + train_extended_A_Z['ebird_c Sanity Check: are all the files the same length? In [38]: def count_files_dir(dir_name = "Default", pref = "Def"): birds_names = list(os.listdir(dir_name + "/" + pref)) total_len = 0 for bird in birds_names: total_len += len(os.listdir(dir_name +"/" + pref + "/" + bird)) return total_len In [39]: A_M = count_files_dir(base_dir_A_M, pref = "A-M") N_Z = count_files_dir(base_dir_N_Z, pref = "N-Z") print("There are {:,} birds in A-Z .csv file".format(len(train_extended_A_Z) "\n" + "and there are {:,} audio recs.".format(A_M + N_Z)) There are 23,041 birds in A-Z .csv file and there are 23,041 audio recs. Work in Progress ... ⏳ Just a Kind Reminder to always be Mindful and Better 😊 I would like to take this opportunity to end with a personal note. I joined this competition by having in mind nature and the well being of birds (and all life) on this beautiful planet. This competition suprizes 264 total unique beautiful bird species, but our kind Earth has more than 10,000 unique colorful birds that are singing out there at the moment. But we are losing species every year due to Global Warming, Polution, and whatever else you would like to name. So, although our input to this community is quite small and there is not much we can do, we can still be mindful with the effect we have every day to this process and try as much as possible to diminuish it. Some ideas are very simple, like: buying clothes when needed, not only for fashion sake using reusable water bottles drinking the morning coffee home and not on-the-go every day keeping the outdoors clean by keeping the garbage in a bag until you find a trash walking, biking, using public transport more than the car etc. Let's be mindful and better! Thank you for reading this rant. This notebook was converted to PDF with convert.ploomber.io