Japan is no stranger to earthquakes. The country sits at the convergence of four tectonic plates and, as a result, experiences around 1,000 earthquakes a year, averaging about 3 a day. Most of these quakes pass without notice, the majority being below magnitude 5, but there is always the chance of a much bigger quake.

Interactive Map of Earthquakes near Japan. Made with Folium
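The plotting code for the map isn't shown here, but a map like this can be sketched in a few lines of Folium. The quakes DataFrame and its latitude, longitude, and mag columns below are assumptions about the data behind the figure, not the original code.

import folium

# Minimal sketch of the interactive map; `quakes` is assumed to be a DataFrame
# of USGS events with 'latitude', 'longitude', and 'mag' columns
m = folium.Map(location=[38.32, 142.37], zoom_start=5)
for _, row in quakes.iterrows():
    folium.CircleMarker(
        location=[row["latitude"], row["longitude"]],
        radius=row["mag"],            # scale marker size by magnitude
        popup=f"M{row['mag']}",
        color="crimson",
        fill=True,
    ).add_to(m)
m.save("japan_quakes.html")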

A magnitude 9.1 earthquake is an incredibly rare event anywhere in the world and, according to the official position of the US Geological Survey, it is impossible to predict the time and location of such an event before it happens. That doesn't mean the probability of such an event can't be estimated, though. The Gutenberg-Richter law states that the relationship between the magnitude of an earthquake and the logarithm of the frequency with which earthquakes of at least that magnitude occur is linear, so we can estimate the likelihood of such an event by fitting a linear model.
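Concretely, if N(M) is the number of earthquakes per year of at least magnitude M, the law says log10 N(M) = a - b*M for constants a and b. A quick illustration, using made-up placeholder values for a and b rather than anything fitted to real data:

# Gutenberg-Richter: log10(N) = a - b * M, i.e. N = 10 ** (a - b * M)
# a and b are placeholder values purely for illustration
a, b = 7.0, 1.0
annual_rate_m9 = 10 ** (a - b * 9.0)  # expected M>=9 quakes per year
return_period = 1 / annual_rate_m9    # ~100 years with these placeholder values
print(annual_rate_m9, return_period)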

The Fukushima Daiichi Nuclear Accident

I've read online that during the safety analysis, the engineers at Fukushima used historical earthquake data dating back to 1600 CE to build a regression model for the likelihood of significant earthquakes. Instead of using the accepted Gutenberg-Richter model, they saw a kink in the data and assumed the appropriate regression was not linear but polynomial. The resulting model implied that a magnitude 9 or above earthquake was essentially impossible, occurring once within a 19,000 year period.

I haven't found much evidence to support this other than a few sources, such as Berkeley's Machine Learning Crash Course, which cites a paper by Brian Stacey titled Fukushima: The Failure of Predictive Models, which in turn cites Nate Silver's book, The Signal and the Noise: Why So Many Predictions Fail — but Some Don't, and a report by M. Ragheb titled FUKUSHIMA EARTHQUAKE AND TSUNAMI STATION BLACKOUT ACCIDENT.

What I can tell is that the station was not designed to withstand a magnitude 9 earthquake. The 3/11 earthquake produced a tsunami towering more than 14 meters (46 feet). The safety design team designed the plant to withstand a magnitude 8.6 earthquake, which would have generated a tsunami 5.7 meters tall, so the actual tsunami easily breached the 6 meter tall seawall surrounding the plant.

The assumption that these types of quakes were impossible wasn't uncommon. Emile Okal, an earthquake and tsunami expert, explains some of the rationale:

Essentially, the feeling was that — if you had a very, very old ocean floor — when it was eventually recycled into the mantle of the Earth, you wouldn’t get these mega-earthquakes.

In his book, Nate Silver uses data from January 1st, 1964 to March 11th, 2011 to recreate a polynomial model similar to the one the safety engineers are assumed to have used. Nate uses data from the US Geological Survey, pulling a 1 degree by 1 degree box of data around the epicenter of the 3/11 earthquake. We can use the USGS API to do the same.

import datetime
import pandas as pd

# 1 degree by 1 degree box centered on the 3/11 epicenter
min_lat, max_lat = 38.32 - 0.5, 38.32 + 0.5
min_long, max_long = 142.37 - 0.5, 142.37 + 0.5

# Same date range Nate Silver used: January 1, 1964 to March 11, 2011
start_time = datetime.datetime(1964, 1, 1)
end_time = datetime.datetime(2011, 3, 11)
url = f"""\
https://earthquake.usgs.gov/fdsnws/event/1/\
query.csv?\
starttime={start_time.strftime("%Y-%m-%d")}&\
endtime={end_time.strftime("%Y-%m-%d")}&\
maxlatitude={max_lat}&\
minlatitude={min_lat}&\
maxlongitude={max_long}&\
minlongitude={min_long}&\
minmagnitude=4.5&\
eventtype=earthquake&\
orderby=magnitude&\
limit=20000"""

df = pd.read_csv(url)
df.describe().T
count mean std min 25% 50% 75% max
latitude 149.0 38.363785 0.281879 37.822 38.149 38.387 38.578 38.819
longitude 149.0 142.335181 0.304166 141.884 142.075 142.273 142.623 142.842
depth 149.0 43.355034 17.329300 5.000 33.000 42.000 51.400 102.800
mag 149.0 5.109060 0.648253 4.500 4.600 4.900 5.300 7.700
nst 39.0 181.948718 171.303048 9.000 40.000 130.000 296.000 698.000
gap 35.0 96.257143 44.071208 17.400 64.600 105.000 129.400 212.900
dmin 0.0 NaN NaN NaN NaN NaN NaN NaN
rms 93.0 0.953978 0.231602 0.500 0.800 0.930 1.100 1.600
horizontalError 0.0 NaN NaN NaN NaN NaN NaN NaN
depthError 45.0 7.373333 4.072212 2.000 5.100 5.900 7.600 20.600
magError 8.0 0.346250 0.155466 0.200 0.230 0.275 0.485 0.580
magNst 61.0 17.131148 32.103725 1.000 3.000 6.000 20.000 225.000

This data pull includes only 149 earthquakes, which doesn't seem robust enough for a proper analysis. Instead, I'll use this data source from Kaggle and filter it by the same criteria.
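A sketch of that step might look like the following; the file name and column names are assumptions about the Kaggle export rather than code from the original notebook, and the file is assumed to already be limited to the region around Japan.

# Assumed layout: a CSV with 'time', 'latitude', 'longitude', and 'mag'
# columns, similar to the USGS catalog format
df = pd.read_csv("japan_earthquakes.csv", parse_dates=["time"])
df = df[
    (df["time"] >= "1964-01-01")
    & (df["time"] < "2011-03-11")
    & (df["mag"] >= 4.5)
]
df.describe().T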

count mean std min 25% 50% 75% max
latitude 5483.0 38.331719 8.039854 23.667 30.1765 39.298 46.3065 50.789
longitude 5483.0 144.591140 7.748972 124.293 141.0780 143.441 152.6375 158.670
depth 5483.0 59.023582 92.162826 0.000 10.0000 33.000 54.5500 631.900
mag 5483.0 4.849389 0.410098 4.500 4.6000 4.700 5.0000 8.300
nst 5482.0 116.054360 123.964728 5.000 32.0000 70.000 151.0000 929.000
gap 4702.0 106.654126 37.773677 9.900 82.6250 113.000 133.6000 306.600
dmin 0.0 NaN NaN NaN NaN NaN NaN NaN
rms 5405.0 0.904420 0.191105 0.120 0.7800 0.880 1.0100 1.870
horizontalError 0.0 NaN NaN NaN NaN NaN NaN NaN
depthError 1808.0 10.173507 7.412728 0.000 5.5000 8.100 12.8000 70.700
magError 0.0 NaN NaN NaN NaN NaN NaN NaN
magNst 3501.0 25.744073 30.298433 1.000 5.0000 15.000 36.0000 274.000

We will want to convert the data to a frequency-per-year basis, so some data manipulation is required. We can group by mag, count the events at each magnitude, and convert the counts to a per-year rate.
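Something along these lines produces the table below; dividing by the full time span of the data is my assumption, not necessarily what the original analysis did.

# Count events at each magnitude and divide by the length of the observation
# window in years to get an annual frequency
years = (df["time"].max() - df["time"].min()).days / 365.25  # assumed divisor
df_freq = (
    df.groupby("mag")
      .size()
      .rename("Yearly Frequency")
      .div(years)
      .to_frame()
)
df_freq.index.name = "Magnitude"
df_freq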

Magnitude  Yearly Frequency
4.5 114.054666
4.6 93.817577
4.7 77.117066
4.8 63.854895
4.9 46.663193
5.0 33.499260
5.1 24.952528
5.2 16.405796
5.3 14.342792
5.4 9.922068
5.5 8.939685
5.6 6.287251
5.7 6.581966
5.8 3.340102
5.9 4.420724
6.0 3.241864
6.1 2.455958
6.2 1.277098
6.3 1.277098
6.4 1.080621
6.5 0.884145
6.6 0.785906
6.7 0.392953
6.8 0.785906
6.9 0.392953
7.0 0.589430
7.1 0.098238
7.2 0.196477
7.3 0.294715
7.4 0.392953

As a fun exercise, I recreated the style that Nate Silver used in his book. Below is the code to generate the scatter plot.

import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import numpy as np

fig, ax = plt.subplots(figsize=(16, 9))

# Diamond-marker scatter of annual frequency versus magnitude on a log y-axis
df_freq.reset_index().plot.scatter(x='Magnitude',
                                   y='Yearly Frequency',
                                   logy=True,
                                   ax=ax,
                                   color='k',
                                   s=250,
                                   marker='D',
                                   clip_on=False)

plt.xticks(np.arange(4.5, 9.6, 0.5), size=20)
plt.yticks(size=20)

# Show y tick labels as plain decimals (0.001, 0.01, ...) instead of 10^x
ax.yaxis.set_major_formatter(ticker.FuncFormatter(
    lambda y, pos: ('{{:.{:1d}f}}'.format(int(np.maximum(-np.log10(y), 0)))).format(y)))

ax.tick_params(width=1, length=10, which='major')
ax.tick_params(width=0, length=0, which='minor')
ax.tick_params(width=0, length=0, which='major', axis='x')

# Baseline at one earthquake per year
plt.hlines(1, 0, 10, color='k')

plt.xlim(4.5, 9.5)
plt.ylim(0.0001, 1000)

plt.grid(linestyle='--', linewidth=2.5, alpha=0.3)

plt.ylabel("Annual Frequency:\nEarthquakes of at least this magnitude", size=20)
plt.xlabel("Magnitude", size=20)

# Hide all spines except the left one, matching the book's minimal style
for side in ['top', 'bottom', 'right']:
    ax.spines[side].set_visible(False)

plt.suptitle("TŌHOKU, JAPAN EARTHQUAKE FREQUENCIES\n"
             f"{'JANUARY 1, 2001-MARCH 10,2011':<45}",
             size=20,
             x=0.2,
             y=1);

The chart doesn't exactly match Nate's because the dataset covers a shorter time period (2000-2011), but it's a close enough match to fit a linear model.

from sklearn.linear_model import LinearRegression

# Fit log10(annual frequency) against magnitude, weighting each point by its
# frequency so the well-observed small quakes dominate the fit
lm = LinearRegression()
X = np.array(df_freq.reset_index()['Magnitude']).reshape(-1, 1)
y = np.log10(df_freq['Yearly Frequency'])
s = df_freq['Yearly Frequency']
lm.fit(X, y, sample_weight=s)

# Convert predictions back from log space to annual frequencies
pred = 10 ** lm.predict(np.arange(0, 11, 1).reshape(-1, 1))

The resulting model is a good approximation of Nate's, with an estimated likelihood of a magnitude 9 earthquake happening once every 800 years, while Nate's linear model predicts such an event every 900 years.
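The once-every-800-years figure can be backed out of the fitted line directly; a quick sketch, reusing lm from the snippet above:

# Predicted annual frequency of a magnitude 9 event under the linear fit,
# and the implied return period in years
annual_rate_m9 = 10 ** lm.predict(np.array([[9.0]]))[0]
return_period = 1 / annual_rate_m9
print(f"Roughly one magnitude 9 event every {return_period:.0f} years")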

Polynomial

I can't find much information on the polynomial model itself, but the closest, least complicated model that matches Nate's polynomial model is a fourth-degree fit: b4 mag^4 + b3 mag^3 + b2 mag^2 + b1 mag + b0. This model produces the following graph.
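For reference, a fourth-degree fit on the same log frequencies can be sketched with numpy.polyfit; this is only my reconstruction of what such a model might look like, not the code from the book or from the safety analysis.

# Fit log10(annual frequency) with a degree-4 polynomial in magnitude,
# again weighting by frequency so the common small quakes dominate the fit
mags = df_freq.reset_index()['Magnitude'].to_numpy()
log_freq = np.log10(df_freq['Yearly Frequency'].to_numpy())
weights = df_freq['Yearly Frequency'].to_numpy()
coeffs = np.polyfit(mags, log_freq, deg=4, w=weights)
poly = np.poly1d(coeffs)

# Convert back from log space to annual frequencies across the plotted range
poly_pred = 10 ** poly(np.arange(4.5, 9.6, 0.1))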