When Our Information Becomes Unsafe

Andrew Fischer

 

The internet is a fantastic tool, and almost no one knows how it works.  Watching a movie on Netflix or buying something off Amazon has become intuitive and easy for us in the information age, but it’s a kind of magic where gestures on a keyboard result in B-list movies appearing on a computer screen and clicks from a mouse result in a new phone case and some toothpaste on your porch courtesy of a two day shipping policy.  Behind this magic is the hard work of the computer scientist, a strange creature mixed from one part Oompa Loompa and one part Wizard of Oz.  These people, who work behind the scenes, are the magicians behind the veil allowing the citizens of the internet to complain about how slow Netflix is and to remain blissfully unaware of the terrifying hazards that litter the virtual world.  Obvious threats like hackers, viruses, worms, and spyware are fought off by the foresight and effort of the computer scientist, but one of the most challenging threats the computer scientist will face in the coming years is the advent of the next stage of computing: the quantum computer.

In an effort to dispel the magic surrounding the internet, here is a simple caricature of how the internet works from the perspective of your computer sending an email.  Imagine being in elementary school, sitting in the back of the class.  You want to write a note and send it to your best friend, Billy Gates, but he is at the front of the class, paying attention to the teacher, who is ironically lecturing about how the internet works.  You don’t care about how the internet works, and you know that if future-you really cares about it, future-you will learn about it from some paper in some journal a couple years from now.  So you write your note, sign your name at the bottom, fold it up, and write Billy’s name on the exterior.  There is a note passing protocol that is well understood among you and your classmates: Who is this note for? Pass the note towards that person.  You pass the note to your neighbor, Betty, and after reading the name on the outside, Betty understands what you want her to do with it.  The note travels from person to person toward the front of the class, and once it reaches Billy, he opens your note.  Upon reading the words, “Who uses Bing anyway? Also, do you like Betty? Like, do you like like her?” he scowls and writes you a return note.  This time, your name is on the exterior, and following the Elementary School Note pAssing Protocol (ESNAP), it eventually makes its way back to you.  

This is essentially the internet.  It’s a bunch of computers connected to each other, passing messages between each other.  Each of them follows a strict protocol on how to handle messages.  These messages don’t have to be email either. Anything (pictures, video, bank account information, videogame data) is all treated the same way: write it down, pass it to the person next to you until it reaches its intended recipient.

But what happens if the note is passed to Bubba, the class bully? Or what about that “quiet kid” (no one actually knows his name) who Bubba bullies the most ? Or what about Terry? Terry has loved you forever, and thinks you have a thing for Billy.  Terry is looking for any opportunity to sabotage the non-existent romance you share with Billy. Or what even about Betty? She is pretty cool, but what if she read the note by accident? It is about her, after all. That would be pretty embarrassing.  Can we really trust the people in our class to pass the note without reading it, stealing it, or copying it?

The only computers on the internet you can trust are your own and the computer you are sending data to, but there are hundreds of servers between the two. How do you give Netflix your username and password, or Amazon your credit card information, without everyone else on the internet having access to it?  The answer is something called cryptography.  There are many types of cryptography, but for the purposes of this article, we will focus on a type called Asymmetric Cryptography, or Public/Private Key cryptography, specifically a variant called RSA (named for its creators Ron Rivest, Adi Shamir, and Leonard Adleman).

Public/Private Key cryptography relies on two keys, a public encryption key and a private decryption key.  The public key is given freely for anyone to use, while the private key is kept secret by the person who made the pair.  Anything encrypted with the public key can only be decrypted by the private key, allowing for secure one way communication.  In our classroom analogy, if someone wants to send Billy a secure message, Billy must first make a public/private key pair, and then pass out his public key to anyone who might want to communicate with him.  When you decide you have a suitable reason to write a note to Billy, you use his public key to encrypt your note, and once the note reaches his desk, Billy can decrypt it using his private key.  If at any point someone tries to read it, the note will be incomprehensible gibberish.  

So, problem solved.  Bubba can’t make fun of you, that quiet guy can’t get into your business, and Betty won’t accidentally read your note.  Now, we know Betty isn’t a threat, and Bubba only takes advantage of the low hanging fruit, but what about that “quiet kid”?  His dad makes Public/Private Key pairs for a living… can he somehow break this cryptography system we have built? What if he forges a key like Billy’s to read the note?  

Fortunately, the security of RSA is practically unbreakable.  RSA relies on a principle of mathematics, called factoring, to make it essentially impossible for “quiet guy” to forge his own private key.

For those who have been away from math for a while, we need to review some concepts before we can go into why RSA is secure. First let’s start with what factoring is.  Factoring is kind of the reverse of multiplication. When you multiply two numbers you get a result, called the product.  When you factor a number, you try to reverse that action by instead finding any two numbers that would result in the product when multiplied. The numbers 3 and 7 multiplied together make 21, so 3 and 7 are factors of 21.  However, 21 can also be made by multiplying 1 times 21.  Therefore, the complete list of factors for 21 are 1, 3, 7, and 21.  Some numbers have greater or fewer amounts of factors. For example, 60 has twelve factors (1, 2, 3, 4, 5, 6, 10, 12, 15, 20, 30, and 60) and 17 has just two factors (1 and 17).  

There is something special about 17 though; the only factors of 17 are 1 and 17.  Numbers like 2, 3, 5, 7, 11, 13, and 17 (there are infinitely more, so we will stop there) whose only factors are 1 and the number itself are called prime numbers.  Numbers like 21, who only have two factors, both of which are prime numbers (recall the factors of 21 are 1, 3, 7, and 21), are called semiprime numbers.  For a computer, the multiplication of two numbers into one product is easy, but factoring one number into two of its factors is very hard.  This is what makes RSA so secure.  

When making the Public/Private Key pair, the computer multiplies two really big prime numbers together, making a super big semiprime number.  This semiprime number is shared between the public and private keys, and is the thread that connects them.  The big semiprime number becomes a part of each key, but the factors used to make that large semiprime are thrown away once the keys are made.  You can make a direct copy of the keys, but if you only have the public key, there is no practical way to forge its private key pair. The only way to forge a private key is to factor this large semiprime number into its two pieces: the two prime numbers that were originally used to make it.

Computers are pretty smart, so why is this hard for them? Let’s consider two numbers of relatively similar size, 60 and 62.  Recall that 60 has lots of factors, 10 to be exact (we ignore 1 and 60 because they are obvious and not useful for forging a private key).  That means if we choose a random number between 1 and 60, we have about a 1/6th chance of that number being a factor.  Now consider 62; it is semiprime, and its factors (excluding 1 and 62) are 2 and 31.  We have a 1/30th chance that a random number we choose between 1 and 62 is a factor of 62.  Now, the prime numbers we use to make the Public/Private Key pair are hundreds of digits long, and the resulting semiprime numbers can be just over a thousand of digits long.  The chances of coming upon one of two factors while looking through 101233 numbers is essentially 0.  

Now, we’ve been using the word “practically impossible” and “essentially 0” because computers are able to factor these huge numbers.  The largest RSA semiprime that has been factored so far was 232 digits long, but it was done with state-of-the-art algorithms and took two years running on hundreds of computers.  Before that in 2009, a man named Benjamin Moody factored a 154 digit RSA semiprime in 73 days using his desktop computer.

Despite these impressive feats of computing, such efforts are easily countered.  All a company like Netflix has to do to keep such hard-working hackers from forging Netflix’s private key is make a new Public/Private Key pair once a month, and all those days of hard work are invalidated instantly. It’s not possible to factor the semiprime numbers fast enough. Every bank, online store, and social media password is safe because of Public/Private Key cryptography. What makes computer scientists afraid is the future of computing, and the impressive quantum computers that will make this elegant security system obsolete.  

Quantum computers are faster than conventional computers in a very specific way, and in order to discuss what makes them faster, we will need to understand some key aspects of how conventional computers work. The most important differences between the two is how they store data and how they solve problems, or in other words, run programs.

Conventional computers store data in long sequences of “bits” that store zeros (0) or ones (1).  A bit can be thought of as a digit of a number, but instead of the digit holding a number from 0 to 9, it can only hold a 1 or a 0 (binary digit).  These sequences can be read by the computer as a sort of code called binary, where sequences of ones and zeros can represent numbers, letters, pictures, or anything else the computer needs to store.  The way the computer reads these binary sequences can be compared to how an amature painter might paint by numbers.  If a painter sees a 3 on a part of his coloring book, he will know that 3 means to color this portion of the picture red, and if he sees a 7 he will know that means to color that portion blue.  In the same way, the binary sequence “0100 0001” to a computer can represent the number “65” or the letter “A”, or anything else it might need to store.  The computer knows what those binary codes represent from the context of the program it is currently running.  Continuing the paint by numbers analogy, different programs might be considered different coloring books. In his first coloring book, the amature painter knows 3 means red and 7 means blue, but in a different book by a different publisher, 3 might mean blue and 7 might mean green.  

A program is a set of instructions the computer follows to complete a task, typically to solve a problem of some sort.  Think about a program as a recipe for baking a chocolate cake; the problem being solved is that you don’t have a chocolate cake, and that you need one for this week’s potluck. A recipe for chocolate cake states how much flour, milk, eggs, and chocolate you need in order to make the cake, and then lists a sequence of instructions describing how to mix the ingredients, what temperature the oven should be, and how long the cake should bake before it’s ready to be taken out.  If you try to do one of the steps out of order and bake the ingredients before you mix them together, you might make a house fire instead of a cake.  And God forbid you try to do two steps at the same time.  Try to mix the ingredients while they are baking, and now you have a very messy, very on fire kitchen.  It’s important to do each step in the recipe in sequential order so the cake to comes out looking like a cake, and so that the fire department doesn’t need to be called.  The term for describing doing things in order is called doing things in sequence (good for cake making), and doing things all at the same time is called doing things in parallel (good for making messes).

The important information to take away from these analogies is that data in conventional computers is stored in binary (where a bit is either a 0 or a 1, and sequences of bits are used to store data), and that programs operate in sequence (do the first step, then the second step, then the third step, never out of order, and never more than one step at a time).

Before we go too deep into quantum computing, know that the rules of the quantum world are quite different from the rules of the world we live in.  Things in the quantum world are strange, and trying to relate aspects of the quantum world to similar aspects of our world often falls apart.  Where our world is built on undebatable truths (physically speaking), the quantum world is built on nebulous probabilities.  In order to understand all aspects of quantum computing, you need to understand quantum physics.  For our purposes, we need to understand a feature of the quantum world called superposition.  

Superposition is the idea that something can be two things at once.  A scientist named Shrödinger came up with a thought experiment involving a cat that explains this idea quite well.  Say there is a box perfectly sized to fit a cat.  Once you put a cat in this box, and put the lid on the top, some sinister magic takes hold of the cat, and there is a chance that the cat will be dead because of it.  The death will be silent, and there is no way for you to know if the cat is dead or alive until you open the box and look inside.  In the world of quantum physics, the cat is in a state of superposition; until you open the box, and collapse the state of superposition, the cat is both alive and dead at the same time.  This is key to understanding how the quantum computing version of a bit, called a qbit, works.  In our world, a cat can only be alive or dead, and a bit can only be a 0 or a 1, but in the quantum world, the cat can be alive, dead, or alive and dead, and the qbit can be 0, 1, or both 0 and 1 at the same time.  

Why is this useful? Other features of quantum physics can be used to influence the phenomenon of superposition, and the information stored in qbits can be operated on in parallel. Because some of the qbits in a sequence of qbits are in superposition, while others are specifically 0 or 1, they contain more information than they could if they were conventional bits.  This extra information allows for special mathematical operations (think along the lines of addition and multiplication) to be done simultaneously, rather than having to do them in sequential order.  Instead of having to follow each step in the chocolate cake recipe in sequential order like a conventional computer, a quantum computer can save time by mixing the ingredients, preheating the oven, and baking the cake all at the same time! And instead of a house fire, you get the perfect chocolate cake.

This is what makes quantum computing so interesting to physicists and computer scientists, and why it’s a threat to the way we secure our internet communications.  In 1994, a man named Peter Shor developed an algorithm, called Shor’s algorithm, that leverages the features of quantum computing to make factoring easy for computers.  This is not just a theoretical algorithm.  In 2012, the number 21 was factored using Shor’s algorithm on a quantum computer.  And yes, as that implies, quantum computers do exist.  Currently, the largest number factored by a quantum computer is 56153.  Although this is impressive, it currently does not threaten the security fabric of the internet because of how very large the semiprimes used in Public/Private Key cryptography are (recall they can be around a thousand digits long). But progress marches on, and within the next few decades, quantum computing will grow from it’s infancy into a useable technology.  Though other types of cryptography are not threatened by quantum computing, Public/Private Key cryptography is an integral part of how the internet works. It’s how you start almost any conversation you have with any website on the internet, and without it, something as simple as typing in your username and password will be compromised.  

Magic weaving computer scientists across the world can see the threat quantum computing poses to the security of the internet. Large tech companies, like Google, and international governments, like the EU, USA, Russia, and China, are investing great sums of money and an impressive number of man hours into the field of post-quantum cryptography (cryptography designed specifically to foil quantum computers), hoping to find ways to keep information secure from the impressive power and speed of quantum computers.  If they should fail to find suitable cryptographic solutions to replace Public/Private Key cryptography before the advent of quantum computing, the confidence you have typing your username and password into Netflix could be a thing of the past.

Should these hard working computer scientists fail to find a replacement in time, your notes to Billy, which seem so secure, will be easily read by anyone with access to a quantum computer.  This is one of the great fears of the computer science community, and now that the magic behind the internet has been dispelled, the inevitable death of Public/Private Key cryptography should be a sobering thought.

 

 

Provenance: Submission.

 

Andrew Fischer graduated in 2014 with a Bachelors of Science in Computer Science with focuses in Software Engineering and Human Centered Computing and a minor in Applied Mathematics from the School of Engineering at the University of Colorado at Boulder.  He has worked on a variety of projects, including building Javascript compilers and interpreters, modeling ant colony activity and interactions, designing and coding visual servoing systems for Rethink Robotics Baxter robots, and automating databank construction for material sciences data. He has worked as a software engineer in the clean energy and engineering/aerospace industries for American Energy Assets and MSC Software respectively, and hopes to transition into the technical writing profession in the future, this being the first published piece in his portfolio.  

Campus-classroom” by Adityamanutd is licensed under CC BY-SA 3.0.

No Comments

Leave a Reply