Investigation of Bank Failures

Data provided from Kaggle: https://www.kaggle.com/fdic/bank-failures

import numpy as np
import pandas as pd 
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv("bank-failures/banks.csv")
df.head()
df.count()

Financial Institution Number    2883
Institution Name                3484
Institution Type                3484
Charter Type                    3484
Headquarters                    3484
Failure Date                    3484
Insurance Fund                  3484
Certificate Number              2999
Transaction Type                3484
Total Deposits                  3484
Total Assets                    3333
Estimated Loss (2015)           2509
dtype: int64

Data Cleaning

df = df.join( df["Headquarters"].str.split(",", expand = True))
df["D/A"] = df["Total Deposits"]/df["Total Assets"]
df["Failure Date"] = pd.to_datetime(df["Failure Date"])
df = df.rename(columns = {0:"City", 1:"State", 2:"Del"})
df = df.drop("Del", 1)
df.count()

Financial Institution Number    2883
Institution Name                3484
Institution Type                3484
Charter Type                    3484
Headquarters                    3484
Failure Date                    3484
Insurance Fund                  3484
Certificate Number              2999
Transaction Type                3484
Total Deposits                  3484
Total Assets                    3333
Estimated Loss (2015)           2509
City                            3484
State                           3484
D/A                             3333
dtype: int64

Time Plots

Not unexpectedly, we see the largest peaks in the lates 80’s and early 90’s corresponding to the Savings and Loan crisis, 2009/2010 corresponding to the GFC and the late 30’s with The Great Depression. What would be more interesting would be to look at the percentage failures of banks (to do.)

fails_by_year = df['Failure Date'].groupby([df["Failure Date"].dt.year]).agg('count')
plt.figure(figsize=(15,10))
sns.barplot(fails_by_year.index, fails_by_year)
plt.xticks(rotation="vertical")
plt.ylabel("Bank Failures")
plt.show()

png

Geographical Plots

df.groupby("State").count()["Failure Date"].sort_values(ascending=False)[0:25].plot(kind="bar")
plt.ylabel("Total Failures")
plt.show()

png

We see that TX has both the largest number of failures and the largest total estimated losses. Let’s see whose contributing to this.

df.groupby("State").sum()["Estimated Loss (2015)"].sort_values(ascending=False)[0:25].plot(kind="bar")
plt.ylabel("Total Estimated Losses")
plt.show()

png

df[df["State"] == " TX"].sort_values(by = "Estimated Loss (2015)", ascending = False)[0:10]

	Financial Institution Number	Institution Name	Institution Type	Charter Type	Headquarters	Failure Date	Insurance Fund	Certificate Number	Transaction Type	Total Deposits	Total Assets	Estimated Loss (2015)	City	State	D/A
1520	6938.0	UNIVERSITY FEDERAL SAVINGS	SAVINGS ASSOCIATION	FEDERAL/STATE	HOUSTON, TX	1989-02-14	RTC	30685.0	ACQUISITION	3776427	4480389.0	2177985.0	HOUSTON	TX	0.842879
1377	2846.0	FIRST REPUBLICBANK-DALLAS, N.A.	COMMERCIAL BANK	FEDERAL	DALLAS, TX	1988-07-29	FDIC	3165.0	ACQUISITION	7680063	17085655.0	2017459.0	DALLAS	TX	0.449504
2363	2124.0	SAN JACINTO SAVINGS	SAVINGS ASSOCIATION	FEDERAL/STATE	HOUSTON, TX	1990-11-30	RTC	31058.0	ACQUISITION	2894745	2869629.0	1700654.0	HOUSTON	TX	1.008752
1506	7070.0	GILL SA	SAVINGS ASSOCIATION	FEDERAL/STATE	SAN ANTONIO, TX	1989-02-07	RTC	31503.0	ACQUISITION	1448432	1207294.0	1659803.0	SAN ANTONIO	TX	1.199734
1618	7335.0	COMMONWEALTH SAVINGS ASSOC.	SAVINGS ASSOCIATION	FEDERAL/STATE	HOUSTON, TX	1989-03-09	RTC	31896.0	TRANSFER	1608452	1647893.0	1613353.0	HOUSTON	TX	0.976066
1714	2985.0	MBANK DALLAS, NATIONAL ASSOCIATION	COMMERCIAL BANK	FEDERAL	DALLAS, TX	1989-03-28	FDIC	3163.0	ACQUISITION	4033803	6556056.0	1610809.0	DALLAS	TX	0.615279
1517	6952.0	BRIGHT BANC	SAVINGS ASSOCIATION	FEDERAL/STATE	DALLAS, TX	1989-02-10	RTC	31095.0	ACQUISITION	3004443	4388466.0	1307798.0	DALLAS	TX	0.684623
1823	2100.0	VICTORIA SA	SAVINGS ASSOCIATION	FEDERAL/STATE	SAN ANTONIO, TX	1989-06-29	RTC	29378.0	PAYOUT	855717	882849.0	968972.0	SAN ANTONIO	TX	0.969268
1627	2104.0	BANCPLUS SAVINGS ASSOCIATION	SAVINGS ASSOCIATION	FEDERAL/STATE	PASADENA, TX	1989-03-09	RTC	31128.0	ACQUISITION	923026	751461.0	964160.0	PASADENA	TX	1.228309
1619	7429.0	BENJAMIN FRANKLIN SA	SAVINGS ASSOCIATION	FEDERAL/STATE	HOUSTON, TX	1989-03-09	RTC	30761.0	ACQUISITION	2004722	2641392.0	882240.0	HOUSTON	TX	0.758964

df.groupby("City").count()["Failure Date"].sort_values(ascending=False)[0:25].plot(kind="bar")
plt.ylabel("Total Failures")
plt.show()

png

df.groupby("State").mean()["D/A"].sort_values(ascending=False)[0:50].plot(kind="bar")
plt.ylabel("Mean Deposit to Asset Ratio")
plt.show()

png

Deposit to Assets

Ignoring the possibility that this is not real and hence not inflation adjusted. Interesting to see whether there is any relationship between D/A and the losses suffered. It looks like there is no strong correlation at all, which would indicate that it’s not necessarily a mismanagement of deposits/assets which have driven bank failures, but other factors. Typically it’s liquidity which brings the banks down, after sustaining a systemic asset shock (housing/mortgage exposures in the GFC).

plt.scatter(x = df["D/A"], y = df["Estimated Loss (2015)"])
plt.xlabel("Deposits to Asset Ratio")
plt.ylabel("Estimated Loss")
plt.show()

png

plt.plot_date(x = df["Failure Date"], y = df["D/A"])
plt.ylabel("Deposits to Asset Ratio")
plt.show()

png