INSIDETECHNOLOGY
Friday 24 September 2010
Facebook Re-Boots To Fix System Error

If you are a Facebook junkie, yesterday was a stressful day: the social media network suffered a worldwide connection problem in which users could neither connect nor communicate with friends, as pages would hang for what felt like an eternity in today's age of high-speed internet access.

Facebook blamed Thursday's 2.5-hour downtime on a change it made to its system, resulting in the worst outage the social-networking company had seen in four years.

"The key flaw that caused this outage to be so severe was an unfortunate handling of an error condition," Facebook's Robert Johnson wrote in a blog post. "This is the worst outage we've had in over four years, and we wanted to first of all apologize."

Thursday's outage was the second in as many days for Facebook, which was hit with sporadic downtime on Wednesday because of an "issue with a third-party provider."

Facebook has an automated system that checks for invalid configuration values throughout the site. If it finds an error, it replaces it with an updated value from its persistent store.

"This works well for a transient problem with the cache, but it doesn't work when the persistent store is invalid," Johnson wrote.

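The check-and-replace behavior Johnson describes can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `ConfigClient` class, the `is_valid` rule and the `timeout` key are hypothetical and are not taken from Facebook's code. The sketch only shows why one refresh is enough when the cache alone is bad, while an invalid persistent store sends every read back to the store.

```python
# Hypothetical sketch: names and the validity rule are assumptions,
# not Facebook's actual implementation.
class ConfigClient:
    def __init__(self, cache, store, is_valid):
        self.cache = cache          # fast copy that may hold a bad value
        self.store = store          # persistent store, treated as the source of truth
        self.is_valid = is_valid    # validity check applied to cached values
        self.store_queries = 0      # how many times we fell back to the store

    def get(self, key):
        value = self.cache.get(key)
        if value is None or not self.is_valid(value):
            # Cached value is missing or invalid: replace it from the store.
            self.store_queries += 1
            value = self.store[key]
            self.cache[key] = value
        return value


def is_valid(value):
    # Assumed rule for this example: a setting must be a positive integer.
    return isinstance(value, int) and value > 0


# Transient cache problem: a single store query repairs the cache.
healthy = ConfigClient(cache={"timeout": -1}, store={"timeout": 30}, is_valid=is_valid)
healthy_reads = [healthy.get("timeout") for _ in range(5)]
healthy_queries = healthy.store_queries

# Invalid persistent store: the "fix" is itself invalid, so every read
# falls through to the store again.
broken = ConfigClient(cache={"timeout": -1}, store={"timeout": -1}, is_valid=is_valid)
broken_reads = [broken.get("timeout") for _ in range(5)]
broken_queries = broken.store_queries
```

In this toy run the healthy client queries the store once for five reads, while the client facing an invalid store queries it on all five. Multiplied across a very large number of clients, that is the kind of load the article goes on to describe.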
Unfortunately, Facebook made a change to its persistent store on Thursday that ended up being invalid. As a result, the automated system checking for errors would replace those errors with values from the persistent store, which was also not working.

"Because the fix involves making a query to a cluster of databases, that cluster was quickly overwhelmed by hundreds of thousands of queries a second," Johnson said.

"To make matters worse, every time a client got an error attempting to query one of the databases it interpreted it as an invalid value, and deleted the corresponding cache key," Johnson continued. "This meant that even after the original problem had been fixed, the stream of queries continued."

The result was a "feedback loop" that didn't allow for database recovery time, he said.

How did Facebook fix it? Re-booting, essentially.

"We had to stop all traffic to this database cluster, which meant turning off the site," Johnson said.

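The feedback loop, and why cutting off traffic breaks it, can be sketched as a toy simulation. The model is an assumption throughout: the `simulate` function, the per-tick `capacity`, the rule that an overloaded cluster fails every query in that tick, and the gradual re-admission of clients via `admit_per_tick` are all invented for illustration. The article says only that traffic to the cluster was stopped.

```python
def simulate(num_clients, capacity, ticks, admit_per_tick=None):
    """Return how many clients hold a valid cached value after each tick.

    Toy model (all assumptions): the persistent store has already been
    fixed, every client starts with an empty cache, and the database
    cluster fails every query in a tick whose query count exceeds `capacity`.
    """
    cached = [False] * num_clients
    # With no admission limit, every client hits the cluster from the start.
    admitted = num_clients if admit_per_tick is None else 0
    history = []
    for _ in range(ticks):
        if admit_per_tick is not None:
            admitted = min(num_clients, admitted + admit_per_tick)
        # Every admitted client without a cached value queries the cluster.
        askers = [i for i in range(admitted) if not cached[i]]
        overloaded = len(askers) > capacity
        for i in askers:
            # On an error the client keeps no cache entry, so it asks again next tick.
            cached[i] = not overloaded
        history.append(sum(cached))
    return history


# All clients retry at once: the cluster never gets room to recover.
stuck = simulate(num_clients=1000, capacity=100, ticks=5)

# Traffic is stopped, then clients are let back in within capacity.
recovered = simulate(num_clients=1000, capacity=100, ticks=10, admit_per_tick=100)
```

In the first run no client ever obtains a value, even though the store is healthy again. In the second, the same cluster serves all 1,000 clients within ten ticks because demand never exceeds what it can handle.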
For now, the system that corrects configuration values has been shut down, and Facebook is "exploring new designs for this configuration system following design patterns of other systems at Facebook that deal more gracefully with feedback loops and transient spikes," he said.

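The article does not say which designs Facebook is considering. As one generic illustration of handling store errors more gracefully, the sketch below treats a failed query as "store unavailable" rather than "value invalid": it keeps the last known cached value and waits exponentially longer before each retry. The `BackoffConfigClient` class, the `StoreUnavailable` exception and the delay parameters are hypothetical.

```python
class StoreUnavailable(Exception):
    """Raised by the hypothetical store when a query fails."""


class BackoffConfigClient:
    def __init__(self, fetch, base_delay=1.0, max_delay=60.0):
        self.fetch = fetch            # callable(key) -> value; may raise StoreUnavailable
        self.base_delay = base_delay  # first retry delay, in seconds (assumed)
        self.max_delay = max_delay    # upper bound on the retry delay (assumed)
        self.cache = {}
        self.failures = 0
        self.retry_at = 0.0           # earliest time the next store query is allowed

    def refresh(self, key, now):
        """Try to refresh `key`; return the best value currently known."""
        if now >= self.retry_at:
            try:
                self.cache[key] = self.fetch(key)
                self.failures = 0
            except StoreUnavailable:
                # Keep the cached value and double the wait before the next attempt.
                self.failures += 1
                delay = min(self.max_delay, self.base_delay * 2 ** (self.failures - 1))
                self.retry_at = now + delay
        return self.cache.get(key)


calls = []


def failing_fetch(key):
    # Stand-in for an overloaded database cluster: every query fails.
    calls.append(key)
    raise StoreUnavailable(key)


client = BackoffConfigClient(failing_fetch)
client.cache["timeout"] = 30          # last known value
values = [client.refresh("timeout", now=float(t)) for t in range(8)]
```

Over eight one-second ticks this client queries the failing store four times instead of eight and keeps serving the last known value throughout. Production systems commonly add random jitter to the delay so that clients do not retry in lockstep.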
All users should now have access, Facebook said.
Costa Rica's Daily English News Source
Apdo. 2133-1000, San José, Costa Rica
Tel: (506) 2231 3205 / (506) 8399 9642
Fax: (506) 2232 6337
Insidecostarica is an independent news media portal featuring news of Costa Rica, Central America, Latin America and other wonderful and weird stuff. External links are provided for reference purposes. Insidecostarica.com is not responsible for the content of the external sites.
If you need more information or want to provide recommendations, write to editor@insidecostarica.com