How to Share Personal Data While Keeping Secrets Safe
A new technique could help companies like Facebook make money from your data without putting it at risk.
Giant stockpiles of personal data, whether Web browsing logs, credit-card purchases, or the information shared through social networks, are becoming increasingly valuable assets for businesses. Such data can be analyzed to determine trends that guide business strategy, or sold to other businesses for a tidy profit. But as your personal data is analyzed and handed around, the risk increases that it could be traced back to you, presenting an unwelcome invasion of privacy.
A new mathematical technique developed at Cornell University could offer a way for large sets of personal data to be shared and analyzed while guaranteeing that no individual's privacy will be compromised.
"We want to make it possible for Facebook or the U.S. Census Bureau to analyze sensitive data without leaking information about individuals," says Michael Hay, an assistant professor at Colgate University, who created the technique while a research fellow at Cornell, with colleagues Johannes Gehrke, Edward Lui, and Rafael Pass. "We also have this other goal of utility; we want the analyst to learn something."
Companies often do attempt to mitigate the risk that the personal data they hold could be used to identify individuals, but these measures aren't always effective. Both Netflix and AOL discovered this when they released supposedly "anonymized" data so that anyone could analyze it. Researchers showed that both data sets could be de-anonymized by cross-referencing them with data available elsewhere.
"In practice, people are using fairly ad-hoc techniques" to protect the privacy of users included in these data sets, says Hay. These techniques include stripping out names and Social Security numbers, or other data points. "People have crossed their fingers that they are providing true protection," says Hay, who adds that data mavens at some government agencies fear that lawsuits could be filed over improperly protected data. "I know in talking with other people at statistical agencies where they said we're worried about being sued for privacy violations."
In recent years, many researchers have worked on ways to mathematically guarantee privacy. However, the most promising approach, known as differential privacy, has proven challenging to implement, and it typically requires adding noise to a data set, which makes that data set less useful.
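To make the trade-off between noise and usefulness concrete, here is a minimal sketch of the Laplace mechanism, one standard way of adding noise for differential privacy. It perturbs the answer to a counting query rather than the stored records. The function names `laplace_noise` and `noisy_count`, and the parameter name `epsilon`, are illustrative assumptions and do not come from the researchers quoted here.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponential draws, each with
    # mean `scale`, follows a Laplace distribution centered on zero.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(records, predicate, epsilon, rng=None):
    # Hypothetical helper: count the matching records, then add noise.
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for record in records if predicate(record))
    # Adding or removing one record changes a count by at most 1, so the
    # noise scale is 1 / epsilon. A smaller epsilon means more noise.
    return true_count + laplace_noise(1.0 / epsilon, rng)
```

A smaller `epsilon` gives stronger protection but a less accurate count, which is the loss of usefulness described above.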
The Cornell group proposes an alternative approach called crowd-blending privacy. It involves limiting how a data set can be analyzed to ensure that any individual record is indistinguishable from a sizeable crowd of other records, and removing a record from the analysis if this cannot be guaranteed. Noise does not need to be added to a data set, and when a data set is sufficiently large, the group showed that crowd-blending comes close to matching the statistical strength of differential privacy.
"The hope is that because crowd-blending is a less strict privacy standard it will be possible to write algorithms that will satisfy it," says Hay, "and it could open up new uses for data."
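The idea of blending each record into a crowd, and dropping records that cannot blend, can be illustrated with a deliberately simplified sketch. This is not the Cornell group's algorithm or its formal definition. It assumes the analyst only ever sees counts per coarse bucket, and that a bucket is released only when at least `k` records fall into it. The names `crowd_blended_counts`, `bucket_of`, and `k` are hypothetical.

```python
from collections import Counter


def crowd_blended_counts(records, bucket_of, k):
    # Map every record to a coarse bucket; records sharing a bucket are
    # indistinguishable in the output, which reports only bucket sizes.
    counts = Counter(bucket_of(record) for record in records)
    # Release a bucket only if its crowd holds at least k records.
    released = {bucket: n for bucket, n in counts.items() if n >= k}
    # Records in smaller buckets are left out of the analysis entirely.
    suppressed = sum(n for n in counts.values() if n < k)
    return released, suppressed


# Made-up (age, label) records, bucketed by decade of age.
people = [(23, "a"), (27, "b"), (25, "c"), (41, "d"),
          (38, "e"), (34, "f"), (67, "g")]
released, suppressed = crowd_blended_counts(
    people, lambda person: person[0] // 10 * 10, k=2)
print(released, suppressed)  # {20: 3, 30: 2} 2
```

No noise is added here. The cost is that the two people who sit alone in their buckets are dropped, which matters less as the data set grows and fewer buckets fall below `k`.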
The new technique "provides an interesting and potentially very useful alternative privacy definition," says Elaine Shi, an assistant professor at the University of Maryland, College Park, who is also researching ways to protect privacy in data sets. "In comparison with differential privacy, crowd-blending privacy can sometimes allow one to achieve much better utility, by introducing less or no noise."
Shi adds that research into guaranteeing privacy should eventually make it possible to take responsibility for protecting users' data out of the hands of software developers and their managers. "The underlying system architecture itself [would] enforce privacy, even when code supplied by the application developers may be untrusted," she says. Shi's research group is working on a cloud-computing system along those lines. It hosts sensitive personal data and allows access, but also carefully monitors the software that makes use of it.
Benjamin Fung, an associate professor at Concordia University, says crowd-blending is a useful idea, but believes that differential privacy may still prove feasible. His group worked with a Montreal transportation company to implement a version of differential privacy for a data set of geolocation traces. Fung suggests that research in this area needs to move on to implementation, so crowd-blending and other approaches can be directly compared and eventually put into practice.
Hay agrees that it's time for the discussion to move on to implementation. But he also points out that privacy protections won't prevent other practices that some people may find distasteful. "You can satisfy constraints like this and still learn predictive correlations," he says, which might result, for example, in auto insurance premiums being set based on information about a person seemingly unrelated to their driving. "As privacy-guaranteeing techniques are adopted, it could be that other concerns emerge."
Source: http://www.technologyreview.com