2. Reliability is becoming more important
• More use of the Internet
• COVID-19 has been pushing digitalization
• Bandwidth is one key
• When congestion occurs, the user experience gets worse
• But enough bandwidth alone is not sufficient
• Even a network that is set up wrong can still be used somehow
• Reasonability, stability and resiliency are the other key
bdNOG12 maz@iij.ad.jp 2
3. Risk prediction training
1. Understanding the situation
• Discuss conceivable hazard scenarios in the given situation
2. Determining risks
• Identify the hazards that need to be addressed
3. Establishing countermeasures
• Discuss possible measures to solve the hazards
4. Setting goals
• Select the measures to implement
4. Example 1: Routing
• An ISP assigns a /24 to a customer
• The ISP sets up a static route for the /24 toward the customer link
• The customer sets up a default route to the uplink
• The customer uses only a /28 out of the /24
[Figure: ISP and customer routers; labels: 10.0.0.0/24, 10.0.0.0/28, static route, default]
5. Example 1: Risks
• If a packet arrives for an address in the /24 but outside the /28, the packet will loop
• If the customer's LAN-side interface is down, all packets destined for the /24 will loop
• Routing loop!
[Figure: the same topology (10.0.0.0/24, 10.0.0.0/28, static route, default), with a packet to 10.0.0.99]
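The loop described above can be reproduced with a small longest-prefix-match model. This is a minimal sketch, not a router implementation: the two-router model, the node names ("isp", "customer", "lan") and the helper functions are assumptions made for illustration, and the TTL of 8 is arbitrary.

```python
import ipaddress

def net(prefix):
    return ipaddress.ip_network(prefix)

def build_tables(lan_up=True):
    """Hypothetical routing tables for the two routers on the slide."""
    customer = [(net("0.0.0.0/0"), "isp")]            # default route to the uplink
    if lan_up:
        customer.append((net("10.0.0.0/28"), "lan"))  # connected /28, present only while the LAN is up
    return {
        "isp": [(net("10.0.0.0/24"), "customer")],    # static route toward the customer
        "customer": customer,
    }

def lookup(table, dst):
    """Longest-prefix match: next hop of the most specific matching route."""
    addr = ipaddress.ip_address(dst)
    matches = [(n, hop) for n, hop in table if addr in n]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]

def trace(tables, dst, start="isp", ttl=8):
    """Follow next hops until the packet is delivered or the TTL runs out."""
    path, node = [start], start
    for _ in range(ttl):
        hop = lookup(tables[node], dst)
        if hop is None:
            return path, "no route"
        if hop == "lan":
            return path, "delivered"
        path.append(hop)
        node = hop
    return path, "ttl expired"

for dst, lan_up in [("10.0.0.5", True), ("10.0.0.99", True), ("10.0.0.5", False)]:
    path, result = trace(build_tables(lan_up), dst)
    print(dst, "lan_up =", lan_up, ":", " -> ".join(path), "|", result)
```

In this model, 10.0.0.5 is delivered while the LAN is up, whereas 10.0.0.99 (and every /24 address once the LAN is down) bounces between the two routers until the TTL expires.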
6. Example 1: Measures
• Implementing dynamic routing between the ISP and the customer
• Configuring a static route on the customer's router that directs the same /24 to null
[Figure: the same topology (10.0.0.0/24, 10.0.0.0/28, static route, default)]
7. Example 1: Adopted measure
• Configuring a static route on the customer's router that directs the same /24 to null
[Figure: the same topology, with a static null route for 10.0.0.0/24 added on the customer's router]
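The effect of the null route can be checked with a small longest-prefix-match model. This is a sketch under assumptions: the node names ("isp", "customer", "lan", "null") and helper functions are hypothetical, and real routers express the null route in their own configuration syntax.

```python
import ipaddress

def net(prefix):
    return ipaddress.ip_network(prefix)

def build_tables(lan_up=True):
    """Hypothetical tables: the customer router now also has a /24 null route."""
    customer = [
        (net("0.0.0.0/0"), "isp"),        # default route to the uplink
        (net("10.0.0.0/24"), "null"),     # static null route for the whole assignment
    ]
    if lan_up:
        customer.append((net("10.0.0.0/28"), "lan"))  # more specific, so it wins for LAN hosts
    return {"isp": [(net("10.0.0.0/24"), "customer")], "customer": customer}

def lookup(table, dst):
    """Longest-prefix match: next hop of the most specific matching route."""
    addr = ipaddress.ip_address(dst)
    matches = [(n, hop) for n, hop in table if addr in n]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]

def trace(tables, dst, start="isp", ttl=8):
    """Follow next hops until delivery, a null-route drop, or TTL expiry."""
    path, node = [start], start
    for _ in range(ttl):
        hop = lookup(tables[node], dst)
        if hop is None:
            return path, "no route"
        if hop == "lan":
            return path, "delivered"
        if hop == "null":
            return path, "dropped"
        path.append(hop)
        node = hop
    return path, "ttl expired"

for dst, lan_up in [("10.0.0.5", True), ("10.0.0.99", True), ("10.0.0.5", False)]:
    path, result = trace(build_tables(lan_up), dst)
    print(dst, "lan_up =", lan_up, ":", " -> ".join(path), "|", result)
```

Because the /24 null route is more specific than the default route but less specific than the /28, traffic for the LAN is still delivered, while everything else in the /24 is dropped at the customer's router instead of being sent back to the ISP.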
8. Example 2: Port assignments
• Removing a cable from port X
• Just to be safe, make sure the LED is off before pulling it out
• But can you spot the right port for sure?
9. [Figure: six port/LED layout examples, numbered 1 to 6, with the captions below]
• Straightforward
• Starting from port 0
• The left LED is for LC status
• More efficient but confusable
• A little clearer
• Port 21 is the SFP now
10. And more...
• We may see a different implementation in the future
• Assumptions are the source of accidents!
• Different products have different port/LED assignments
• These differences cause confusion
11. The more you know, the more you can see
• A variety of experience helps us to better consider the hazards
• and to identify risks
• Technical education and proper training are necessary to improve operational skills
• bdNOG workshops and tutorials are helpful
• There is always a need for appropriate educational materials
12. Mistakes!
• Mistakes can be a very good teaching tool
• There is a lot to learn from mistakes in case studies
• Some cases are special, but many failures are common, and comparing them to your own situation yields lessons
• But as a business, we need to stop repeating failures in our service facilities
• Repeated failures damage reliability
13. Build a database of mistakes
• It can be a great teaching tool for engineers!
• so that similar mistakes are not repeated
• You may find common and frequent mistakes
• If you can find the root cause of the failure, you can come up with a more effective solution
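One possible shape for such a database is sketched here with an in-memory SQLite table. The schema, the column names, and the three sample rows are all invented for illustration (the rows loosely echo the two examples in this talk); they are not real incident data.

```python
import sqlite3

# Hypothetical schema: one row per mistake, with its root cause and countermeasure.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE mistakes (
           id INTEGER PRIMARY KEY,
           summary TEXT NOT NULL,
           root_cause TEXT NOT NULL,
           countermeasure TEXT NOT NULL
       )"""
)

# Made-up sample records, for illustration only.
sample_rows = [
    ("Routing loop for unused addresses in a customer /24",
     "missing null route",
     "add a static null route for the /24"),
    ("Wrong cable pulled from a switch",
     "port/LED numbering assumption",
     "check the LED and the port label before pulling"),
    ("Wrong port handled on a different switch model",
     "port/LED numbering assumption",
     "check the product's port layout first"),
]
conn.executemany(
    "INSERT INTO mistakes (summary, root_cause, countermeasure) VALUES (?, ?, ?)",
    sample_rows,
)

# Group by root cause to surface common and frequent mistakes.
common = conn.execute(
    """SELECT root_cause, COUNT(*) AS n
       FROM mistakes
       GROUP BY root_cause
       ORDER BY n DESC, root_cause"""
).fetchall()

for root_cause, n in common:
    print(n, root_cause)
```

Grouping by root cause rather than by symptom is a design choice here: it keeps the query aligned with the goal of finding a more effective solution for a whole class of mistakes.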
14. Mistake trend analysis
• Identify the high-impact mistakes
• Minimize the bad effects
• Reduce mistakes
[Figure: chart with axes "effects of mistakes" and "frequency of mistakes"; regions labeled "should not have happened", "problems" and "matters"]
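The three steps above could be sketched as a simple frequency and impact summary. Everything here is an assumption for illustration: the records are made up, the 1 to 5 impact scale and both thresholds are arbitrary, and the action labels are my own wording, not the regions of the chart.

```python
from collections import defaultdict

# Made-up records: (mistake type, impact score from 1 = minor to 5 = severe).
records = [
    ("wrong port unplugged", 2),
    ("wrong port unplugged", 3),
    ("wrong port unplugged", 2),
    ("routing loop", 5),
    ("typo in interface description", 1),
]

def summarize(records):
    """Count occurrences and track the worst impact seen per mistake type."""
    stats = defaultdict(lambda: {"count": 0, "max_impact": 0})
    for kind, impact in records:
        stats[kind]["count"] += 1
        stats[kind]["max_impact"] = max(stats[kind]["max_impact"], impact)
    return dict(stats)

def classify(count, max_impact, freq_threshold=2, impact_threshold=4):
    """Map one mistake type to suggested actions (hypothetical thresholds)."""
    actions = []
    if max_impact >= impact_threshold:
        actions.append("minimize impact")
    if count >= freq_threshold:
        actions.append("reduce frequency")
    return actions or ["monitor"]

summary = summarize(records)
for kind, s in sorted(summary.items(), key=lambda kv: -kv[1]["max_impact"]):
    print(kind, s, classify(s["count"], s["max_impact"]))
```

Sorting by worst impact first mirrors the order of the bullets: identify the high-impact mistakes, then work on limiting their effects and reducing how often they occur.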
15. Accident investigation committee
• In some industries, accident investigation committees conduct detailed investigations and compile reports in order to prevent serious accidents from recurring
• Maybe bdNOG can do this as a community activity
• For the healthy development of the Internet in Bangladesh
• Regular reports of accident cases during bdNOG meetings
16. Summary
• To have a reliable network, we need to continuously improve our operations
• The use of failure cases allows for more effective risk analysis and countermeasures
• As the bdNOG community, I believe the following are worth considering
• Collection of failure and mistake cases
• Trials of accident analysis