Personal computers, especially those built for playing games, have become powerful enough to render the surface of reality. "Rendering" is the process of mathematically converting an abstract world model into images projected onto the computer screen. It requires supercomputer capabilities, which means the graphics processing unit (GPU) on your graphics card really is a supercomputer in every sense of the word (not exactly general purpose, but optimized to perform rendering operations).
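To give a flavor of what "mathematically converting a world model into images" means, here is a minimal sketch in Python of the core operation a GPU repeats millions of times per frame: projecting a 3D point from a world model onto a 2D pixel on the screen. The scene, focal length, and screen size below are made-up numbers purely for illustration.

```python
# Minimal pinhole-camera projection: the core arithmetic of rendering.
# Focal length and screen size are illustrative values, not any real GPU's.

def project(point, focal_length=800.0, screen_w=1280, screen_h=720):
    """Project a 3D point (x, y, z) in camera space onto screen pixels."""
    x, y, z = point
    if z <= 0:
        return None  # behind the camera; a real renderer would clip this
    # Perspective divide: farther points land closer to the screen center.
    u = focal_length * x / z + screen_w / 2
    v = focal_length * y / z + screen_h / 2
    return (u, v)

# A tiny "world model": the corners of a triangle five meters away.
triangle = [(-1.0, 0.0, 5.0), (1.0, 0.0, 5.0), (0.0, 1.5, 5.0)]
for p in triangle:
    print(p, "->", project(p))
```

A real renderer does this for millions of triangles, then shades, textures, and lights them, but the direction of the computation is the same: from a known world model out to pixels.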
Many people assume that because the personal computer can render the surface of reality, it should also be able to do what our brains do when playing a game or just taking a walk in the park: recognize objects and react to them. This is not true. Image processing is a far more intractable problem than rendering. Computers still cannot "see" the way we do. Even the best image processing computers can manage today does not compare to what a bird does: a bird perceives the surface of reality and navigates through it based solely on that perception. Computers still cannot process a series of frames in real time and react to objects the way a bird can.
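One way to see why going from pixels back to objects is so much harder is that rendering throws information away. Continuing the toy pinhole numbers above, two very different points in the world can land on exactly the same pixel, so the image alone does not tell you what is out there.

```python
# Depth is lost in projection: with a focal length of 800 pixels (the same
# toy value as above), a point 0.5 m off-axis at 2 m and a point 2.5 m
# off-axis at 10 m project to the same pixel offset from the screen center.
focal = 800.0
print(focal * 0.5 / 2.0)    # 200.0 pixels from center
print(focal * 2.5 / 10.0)   # 200.0 pixels from center: indistinguishable
```

Rendering is a well-defined forward calculation; recognizing objects means running that calculation backwards from incomplete data, which is a much messier problem.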
There are webcams currently on the market that are supposed to be able to track human faces. The idea is that you can move around the room while streaming live on the internet, and the camera will keep your face in frame. Even this simple task is not easy for a computer, as evidenced by the face-tracking webcam I just bought: it fails to track my face even if, while seated, I simply lean to one side and then the other.
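For the curious, here is roughly the kind of per-frame processing such a camera (or its driver software) has to do, sketched with OpenCV's stock frontal-face detector rather than whatever algorithm the webcam actually uses. It also hints at why leaning breaks it: a detector trained on upright, frontal faces starts missing detections as soon as the face tilts or turns.

```python
# A rough sketch of per-frame face tracking using OpenCV's bundled
# Haar-cascade frontal-face detector (not the webcam's actual algorithm).
import cv2

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture(0)  # default webcam
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    # The detector was trained on upright, frontal faces; tilt your head
    # and the list of detections often comes back empty.
    for (x, y, w, h) in faces:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("tracking", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```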
Driving a car on a busy street is a very, very complex operation. You have to be aware of what’s going on around you, and be able to react in a split second. Computers can now drive SUVs on a mountain road (with a lot of help from GPS satellites), but they are still not qualified to obtain a driver’s license.