Satellite images or abstract maps help ViKiNG make its way in the world
No humans were harmed by the ViKiNG robot.
The way most robots navigate is very different from the way most humans navigate. Robots are happiest when they have total environmental understanding, with some sort of full geometric reconstruction of everything around them plus exact knowledge of their own position and orientation. Lidars, preexisting maps, powerful computers, and even a motion-capture system if you can afford it—the demands of autonomous robots never end.
Obviously, this stuff doesn’t scale all that well. With that in mind, Dhruv Shah and professor Sergey Levine at the University of California, Berkeley, are working on a different approach. Their take on robotic navigation does away with the high-end, power-hungry components. All their technique needs is a monocular camera, some neural networks, a basic GPS system, and some simple hints in the form of a very basic human-readable overhead map. Such hints may not sound all that impactful, but they enable a very simple robot to efficiently and intelligently travel through unfamiliar environments to reach far-off destinations.
ViKiNG: Vision-Based Kilometer-Scale Navigation with Geographic Hints (Summary Video), via www.youtube.com
If that little robot looks familiar, that’s because we met it a couple of years ago through Greg Kahn, a student of Levine’s. Back then, the robot was named BADGR, and its special skill was learning to navigate through novel environments based on simple images and lived experience—or whatever the robot equivalent of lived experience is. BADGR has now evolved into ViKiNG, which stands for “Vision-Based Kilometer-Scale Navigation with Geographic Hints,” a slightly less forgivable acronym. While BADGR was perfectly happy to wander around small areas, its successor is intended to traverse long distances in search of a goal, which is an important step toward practical applications.
Navigation, very broadly, consists of understanding where you are, where you want to go, and how you want to get there. For a robot, the destination is the long-term goal: Some far-off GPS coordinate can be reached by achieving a series of short-term goals, like staying on a particular path for the next couple of meters. Achieve enough short-term goals and you reach your long-term goal. But there’s a sort of medium-term goal in the mix too, which is especially tricky, because it involves making more complex and abstract decisions about what the “best” path might be. Or, in other words, which combination of short-term goals best serves the mission to reach the long-term goal.
This is where the hints come in for ViKiNG. Using either a satellite map or a road map, the robot can make more-informed choices about what short-term goals to aim for, vastly increasing the likelihood that it’ll achieve its objectives. Even with a road map, ViKiNG is not restricted to roads; it just may favor roads because that’s the information it has. Satellite images, which include roads but also other terrain, give the robot more information to work with. The maps are hints, not instructions, which means that ViKiNG can adapt to obstacles it wasn’t expecting. Of course, maps can’t tell the robot exactly where to go at smaller scales (whether those short-term goals are traversable or not), but ViKiNG can handle that by itself with just its monocular camera.
The performance of ViKiNG is impressive; as you can see in the figure above, the blue line shows that ViKiNG takes a sensible, efficient route to its goal. Remember, it doesn’t have a comprehensive environmental map. It gets the job done with a very basic GPS; a picture and the general GPS coordinates of its goal; a monocular camera; and the map. This figure shows a robot traversing a short route, but ViKiNG can navigate autonomously without any problems, over distances as long as several kilometers, until the researchers get tired of following it.
“I think this is exciting because the entire method is very simple,” says UC Berkeley’s Levine. “In contrast to autonomous-driving systems that use enormous software stacks with many interacting components, this system uses two neural networks (one to process first-person images, and one to process the map images) and a planning algorithm that uses them to decide where to drive. This is significant because the complexity of today’s robotic navigation systems is one of the big obstacles preventing their large-scale deployment. If simple learning-based systems can match or surpass the performance of complex, hand-engineered methods, this may point the way to much more tractable and scalable application of machine navigation in the future. Among the imagined applications are delivery robots and higher-level autonomous driving.”
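To make that division of labor concrete, here is a loose sketch of the kind of planning loop Levine describes: one network judges, from the camera, whether a candidate short-term goal is reachable and roughly how far away it is; the other scores candidates against the overhead map; and a simple search combines the two. This is not the ViKiNG code; the class and method names (ImageModel, MapModel, choose_next_subgoal) and the candidate-subgoal interface are invented for illustration.

```python
# Hypothetical sketch of a two-model planning loop; not the authors' code.

class ImageModel:
    """Stand-in for the first-person-image network: given the current camera
    view and a candidate subgoal image, predict whether the subgoal is
    reachable and roughly how many steps away it is (assumed interface)."""
    def predict_reachability(self, current_view, subgoal_view):
        raise NotImplementedError

class MapModel:
    """Stand-in for the overhead-map network: given the map, the robot's
    approximate GPS fix, a candidate subgoal, and the final goal, score how
    promising the subgoal looks as a step toward the goal (lower is better)."""
    def heuristic(self, overhead_map, robot_gps, subgoal_gps, goal_gps):
        raise NotImplementedError

def choose_next_subgoal(candidates, current_view, overhead_map,
                        robot_gps, goal_gps, image_model, map_model,
                        heuristic_weight=1.0):
    """Pick the candidate subgoal with the best combined score: low predicted
    travel cost from the image model plus a low map-based estimate of the
    remaining cost to the goal."""
    best, best_cost = None, float("inf")
    for subgoal in candidates:
        reachable, steps = image_model.predict_reachability(
            current_view, subgoal.view)
        if not reachable:
            continue  # the camera model vetoes subgoals it cannot reach
        cost = steps + heuristic_weight * map_model.heuristic(
            overhead_map, robot_gps, subgoal.gps, goal_gps)
        if cost < best_cost:
            best, best_cost = subgoal, cost
    return best  # hand this short-term goal to the low-level controller
```

The heuristic_weight knob captures the "hint, not instruction" idea: turn it down and the robot trusts its camera more; turn it up and it leans harder on the map.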
For more details, we sent Levine a couple of questions via email.
IEEE Spectrum: What constitutes a hint, and what would the training process be for new kinds of hints?
Sergey Levine: In our prototype, the robot gets either a satellite image or a road map (both from Google Maps right now). But the image could really be anything that is spatially organized. For example, we also thought about giving it park hiking-trail maps, but we just didn’t have quite the right type of data for it (most of our current data is not in parks). The current system does require these images to be spatial (i.e., 2D layouts), so it wouldn’t work, for example, with textual instructions in its current form. However, we are currently exploring extensions that could support the use of strings of text in the future. It is not conceptually a big leap, since our current system already uses contrastive learning methods similar to those that have been used in recent work on combining language and images (e.g., CLIP).
To add new types of hints (e.g., park trail maps, amusement-park maps, whatever), the robot would need data that contains trajectories of the robot driving through such environments, and an approximate GPS registration to the image. These details could be pretty rough, as it is not trying to do an exact reconstruction. The need for this last step could be eliminated in future work, but that’s still something we are working on.
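For readers curious what “contrastive learning methods similar to CLIP” means in practice, here is a minimal, hypothetical sketch of such an objective for paired (first-person observation, overhead-map crop) embeddings. It illustrates the general idea only; the actual loss and data pipeline in ViKiNG differ.

```python
# Minimal sketch of a CLIP-style contrastive (InfoNCE) objective.

import numpy as np

def info_nce_loss(obs_embeddings, map_embeddings, temperature=0.07):
    """obs_embeddings, map_embeddings: arrays of shape (batch, dim), where
    row i of each array comes from the same place in the world. The loss
    pulls matching pairs together and pushes mismatched ones apart."""
    # Normalize so the dot product is a cosine similarity.
    obs = obs_embeddings / np.linalg.norm(obs_embeddings, axis=1, keepdims=True)
    maps = map_embeddings / np.linalg.norm(map_embeddings, axis=1, keepdims=True)
    logits = obs @ maps.T / temperature        # (batch, batch) similarity matrix
    labels = np.arange(len(obs))               # matching pairs sit on the diagonal
    # Cross-entropy over rows: each observation should pick out its own map crop.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[labels, labels].mean()
```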
Can you talk about the impact that different hints have on the performance of ViKiNG? When dealing with outdated imagery, ViKiNG can handle unexpected obstacles, but can it (for example) identify new shortcuts as well?
Levine: ViKiNG will respond differently depending on whether it is given a satellite image or a road map. See, for example, Figure 9 in the paper: The road map is enough for it to follow the road, but if it gets a satellite image, it can tell that it’s possible to cut across a grassy field instead. The system does not currently account for safety explicitly, so in that sense it will always opt for the most direct path that is collision free—in fact, properly accounting for risk (e.g., not driving into the middle of a road) is something we are planning to investigate in future work.
In terms of identifying new shortcuts: We haven’t tested this, but the map is used primarily as a heuristic, so if it sees a path to the goal, in principle it could take it, even if the map suggests it’s not possible. However, we have not observed this behavior (and we haven’t tested for it); we only tried adding new obstacles (well, we didn’t so much add them as find one day that someone had parked a giant truck in the path we were going to test...).
Did ViKiNG demonstrate any behaviors that surprised you?
Levine: I was surprised by a few things. First, I was actually quite surprised by just how far ViKiNG drives without intervention—this is not a complete self-driving system; it’s really just two neural-net models stapled together with a clever (but simple) planning algorithm in the middle, so having it drive for several kilometers without crashes or interventions (about the range that Dhruv can run after it with a video camera), in environments ranging from forest trails to industrial parks, was quite surprising to me. Another thing that I actually found quite interesting is its ability to deviate from the map in the presence of obstacles, and to “backtrack” when the map leads it astray (or when it simply makes a mistake).
GPS is necessary for ViKiNG to be successful, correct? What are the other constraints on the system?
Levine: Yes, it requires GPS or some other form of approximate localization. Technically, all it needs is something to give it an approximate “you are here” dot on the map. We could even imagine training a separate first-person-to-map model to predict this (that would be an interesting future work direction!). Besides this, all it requires is the data set to train on (images and actions), a GPU, a way to steer the vehicle, and enough battery life. The requirements are actually pretty minimal. The camera images are monocular (no depth) and are processed by a standard ConvNet, so in principle it could be retrained with any other sensor modality that can be sent into a neural net (which is practically anything).
How would this system scale to larger or more complex environments?
Levine: Scaling to larger and more complex environments is mostly a matter of training data, though the environments are already quite large and complex. Probably the biggest thing that is technically preventing this from being used today, as is, for sidewalk delivery is actually safety: Right now, there is nothing that tells ViKiNG to, for example, avoid driving in the middle of the street, or attempting to cross a busy freeway. One of the things we are still developing is a way to issue more fine-grained “rewards” to the robot to encourage it to follow “social conventions” like staying off the street. If we wanted to scale it up to larger vehicles, like self-driving cars, then there are also many safety issues that will come up (like in any system of that sort). This will also require significant additional technical development to address. All this is to say that safety still remains a major obstacle for autonomous robots, same as for any other kind of system. But, hopefully, learning-based approaches like ViKiNG will bring us closer to eventually bridging this gap.
As it turns out, UC Berkeley is also involved in DARPA’s RACER program alongside the JPL team. We have an article on RACER here, but the gist is that it’s a long-distance off-road competition for wheeled robots that need to navigate through uncharted environments with help from a low-resolution topographic map. Sounds like something ViKiNG might be cut out for—maybe with some bigger wheels on it, though.
Evan Ackerman is a senior editor at IEEE Spectrum. Since 2007, he has written over 6,000 articles on robotics and technology. He has a degree in Martian geology and is excellent at playing bagpipes.
There’s plenty of bandwidth available if we use reconfigurable intelligent surfaces
Ground level in a typical urban canyon, shielded by tall buildings, will be inaccessible to some 6G frequencies. Deft placement of reconfigurable intelligent surfaces [yellow] will enable the signals to pervade these areas.
For all the tumultuous revolution in wireless technology over the past several decades, there have been a couple of constants. One is the overcrowding of radio bands, and the other is the move to escape that congestion by exploiting higher and higher frequencies. And today, as engineers roll out 5G and plan for 6G wireless, they find themselves at a crossroads: After years of designing superefficient transmitters and receivers, and of compensating for the signal losses at the end points of a radio channel, they’re beginning to realize that they are approaching the practical limits of transmitter and receiver efficiency. From now on, to get high performance as we go to higher frequencies, we will need to engineer the wireless channel itself. But how can we possibly engineer and control a wireless environment, which is determined by a host of factors, many of them random and therefore unpredictable?
Perhaps the most promising solution, right now, is to use reconfigurable intelligent surfaces. These are planar structures typically ranging in size from about 100 square centimeters to about 5 square meters or more, depending on the frequency and other factors. These surfaces use advanced substances called metamaterials to reflect and refract electromagnetic waves. Thin two-dimensional metamaterials, known as metasurfaces, can be designed to sense the local electromagnetic environment and tune the wave’s key properties, such as its amplitude, phase, and polarization, as the wave is reflected or refracted by the surface. So as the waves fall on such a surface, it can alter the incident waves’ direction so as to strengthen the channel. In fact, these metasurfaces can be programmed to make these changes dynamically, reconfiguring the signal in real time in response to changes in the wireless channel. Think of reconfigurable intelligent surfaces as the next evolution of the repeater concept.
Reconfigurable intelligent surfaces could play a big role in the coming integration of wireless and satellite networks.
That’s important, because as we move to higher frequencies, the propagation characteristics become more “hostile” to the signal. The wireless channel varies constantly depending on surrounding objects. At 5G and 6G frequencies, the wavelength is vanishingly small compared to the size of buildings, vehicles, hills, trees, and rain. Lower-frequency waves diffract around or through such obstacles, but higher-frequency signals are absorbed, reflected, or scattered. Basically, at these frequencies, the line-of-sight signal is about all you can count on.
Such problems help explain why the topic of reconfigurable intelligent surfaces (RIS) is one of the hottest in wireless research. The hype is justified. A landslide of R&D activity and results has gathered momentum over the last several years, set in motion by the development of the first digitally controlled metamaterials almost 10 years ago.
This article was jointly produced by IEEE Spectrum and Proceedings of the IEEE with similar versions published in both publications.
RIS prototypes are showing great promise at scores of laboratories around the world. And yet one of the first major projects, the European-funded Visorsurf, began just five years ago and ran until 2020. The first public demonstrations of the technology occurred in late 2018, by NTT Docomo in Japan and Metawave, of Carlsbad, Calif.
Today, hundreds of researchers in Europe, Asia, and the United States are working on applying RIS to produce programmable and smart wireless environments. Vendors such as Huawei, Ericsson, NEC, Nokia, Samsung, and ZTE are working alone or in collaboration with universities. And major network operators, such as NTT Docomo, Orange, China Mobile, China Telecom, and BT are all carrying out substantial RIS trials or have plans to do so. This work has repeatedly demonstrated the ability of RIS to greatly strengthen signals in the most problematic bands of 5G and 6G.
To understand how RIS improves a signal, consider the electromagnetic environment. Traditional cellular networks consist of scattered base stations that are deployed on masts or towers, and on top of buildings and utility poles in urban areas. Objects in the path of a signal can block it, a problem that becomes especially bad at 5G’s higher frequencies, such as the millimeter-wave bands between 24.25 and 52.6 gigahertz. And it will only get worse if communication companies go ahead with plans to exploit subterahertz bands, between 90 and 300 GHz, in 6G networks. Here’s why. With 4G and similar lower-frequency bands, reflections from surfaces can actually strengthen the received signal, as reflected signals combine. However, as we move higher in frequencies, such multipath effects become much weaker or disappear entirely. The reason is that surfaces that appear smooth to a longer-wavelength signal are relatively rough to a shorter-wavelength signal. So rather than reflecting off such a surface, the signal simply scatters.
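A rough calculation makes the point. By the commonly used Rayleigh criterion, a surface behaves like a smooth reflector only if its height variations stay below about one-eighth of a wavelength, divided by the cosine of the incidence angle. The sketch below, which assumes a 45-degree incidence angle for illustration, shows how quickly that tolerance shrinks as the frequency climbs.

```python
# Back-of-the-envelope illustration of why the same surface looks "smooth"
# at 4G frequencies but "rough" at millimeter-wave and sub-THz ones, using
# the standard Rayleigh roughness criterion: a surface reflects cleanly only
# if its height variations stay below roughly lambda / (8 * cos(theta)).

import math

C = 3e8  # speed of light, m/s

def smoothness_threshold_mm(freq_hz, incidence_deg=45.0):
    wavelength = C / freq_hz
    return 1000 * wavelength / (8 * math.cos(math.radians(incidence_deg)))

for label, f in [("4G, 2 GHz", 2e9), ("5G mmWave, 28 GHz", 28e9),
                 ("6G sub-THz, 140 GHz", 140e9)]:
    print(f"{label}: surface bumps must stay under "
          f"{smoothness_threshold_mm(f):.1f} mm to reflect cleanly")
# Roughly 27 mm at 2 GHz, 1.9 mm at 28 GHz, and 0.4 mm at 140 GHz: a brick
# wall that mirrors a 4G signal merely scatters a sub-terahertz one.
```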
One solution is to use more powerful base stations or to install more of them throughout an area. But that strategy can double costs, or worse. Repeaters or relays can also improve coverage, but here, too, the costs can be prohibitive. RIS, on the other hand, promises greatly improved coverage at just marginally higher cost.
The key feature of RIS that makes it attractive in comparison with these alternatives is its nearly passive nature. The absence of amplifiers to boost the signal means that an RIS node can be powered with just a battery and a small solar panel.
RIS functions like a very sophisticated mirror, whose orientation and curvature can be adjusted in order to focus and redirect a signal in a specific direction. But rather than physically moving or reshaping the mirror, you electronically alter its surface so that it changes key properties of the incoming electromagnetic wave, such as the phase.
That’s what the metamaterials do. This emerging class of materials exhibits properties beyond (from the Greek meta) those of natural materials, such as anomalous reflection or refraction. The materials are fabricated using ordinary metals and electrical insulators, or dielectrics. As an electromagnetic wave impinges on a metamaterial, a predetermined gradient in the material alters the phase and other characteristics of the wave, making it possible to bend the wave front and redirect the beam as desired.
An RIS node is made up of hundreds or thousands of metamaterial elements called unit cells. Each cell consists of metallic and dielectric layers along with one or more switches or other tunable components. A typical structure includes an upper metallic patch with switches, a biasing layer, and a metallic ground layer separated by dielectric substrates. By controlling the biasing—the voltage between the metallic patch and the ground layer—you can switch each unit cell on or off and thus control how each cell alters the phase and other characteristics of an incident wave.
To control the direction of the larger wave reflecting off the entire RIS, you synchronize all the unit cells to create patterns of constructive and destructive interference in the larger reflected waves [see illustration below]. This interference pattern reforms the incident beam and sends it in a particular direction determined by the pattern. This basic operating principle, by the way, is the same as that of a phased-array radar.
A reconfigurable intelligent surface comprises an array of unit cells. In each unit cell, a metamaterial alters the phase of an incoming radio wave, so that the resulting waves interfere with one another [above, top]. Precisely controlling the patterns of this constructive and destructive interference allows the reflected wave to be redirected [bottom], improving signal coverage.
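To see the phased-array analogy in code, here is a hedged sketch, for a single row of unit cells, of how one might compute the per-cell phase shifts that redirect an incoming plane wave toward a chosen direction, and then collapse them to the two states that a 1-bit, PIN-diode-switched cell can actually provide. The geometry and parameters are illustrative and are not taken from any particular RIS product.

```python
# Illustrative beam-steering phase profile for a 1-D row of RIS unit cells.

import numpy as np

def ris_phase_profile(n_cells, cell_pitch_m, freq_hz,
                      theta_in_deg, theta_out_deg, one_bit=True):
    wavelength = 3e8 / freq_hz
    k = 2 * np.pi / wavelength
    x = np.arange(n_cells) * cell_pitch_m          # cell positions along one row
    # Linear phase gradient that redirects a plane wave (generalized Snell's law).
    phases = -k * x * (np.sin(np.radians(theta_out_deg))
                       - np.sin(np.radians(theta_in_deg)))
    phases = np.mod(phases, 2 * np.pi)
    if one_bit:
        # A 1-bit RIS can only add 0 or pi, so round each cell to the nearer state.
        phases = np.where(np.abs(phases - np.pi) < np.pi / 2, np.pi, 0.0)
    return phases

# Example: 64 cells at half-wavelength pitch, 28 GHz, wave arriving broadside
# and redirected toward 20 degrees off broadside.
profile = ris_phase_profile(64, 0.5 * 3e8 / 28e9, 28e9, 0.0, 20.0)
print(np.round(profile[:10] / np.pi, 0))  # first ten cells, in units of pi
```

The resulting on/off pattern is exactly the kind of coding sequence the controller described below loads into the surface.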
An RIS has other useful features. Even without an amplifier, an RIS manages to provide substantial gain—about 30 to 40 decibels relative to isotropic (dBi)—depending on the size of the surface and the frequency. That’s because the gain of an antenna is proportional to the antenna’s aperture area. An RIS has the equivalent of many antenna elements covering a large aperture area, so it has higher gain than a conventional antenna does.
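That relationship is easy to check with the textbook formula for an ideal aperture, G = 4πA/λ². The numbers below are optimistic upper bounds (a real RIS gives up some gain to phase quantization, material losses, and oblique incidence), but they land in the same range as the figure quoted above.

```python
# Ideal aperture gain G = 4 * pi * A / lambda^2, expressed in dBi.

import math

def aperture_gain_dbi(area_m2, freq_hz):
    wavelength = 3e8 / freq_hz
    gain_linear = 4 * math.pi * area_m2 / wavelength**2
    return 10 * math.log10(gain_linear)

print(f"1 m^2 surface at 3.5 GHz: {aperture_gain_dbi(1.0, 3.5e9):.1f} dBi")
print(f"0.1 m^2 surface at 28 GHz: {aperture_gain_dbi(0.1, 28e9):.1f} dBi")
# Roughly 32 dBi and 40 dBi before real-world losses, consistent with the
# 30-to-40 dBi range quoted above.
```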
All the many unit cells in an RIS are controlled by a logic chip, such as a field-programmable gate array with a microcontroller, which also stores the many coding sequences needed to dynamically tune the RIS. The controller gives the appropriate instructions to the individual unit cells, setting their state. The most common coding scheme is simple binary coding, in which the controller toggles the switches of each unit cell on and off. The unit-cell switches are usually semiconductor devices, such as PIN diodes or field-effect transistors.
The important factors here are power consumption, speed, and flexibility, with the control circuit usually being one of the most power-hungry parts of an RIS. Reasonably efficient RIS implementations today consume a total of a few watts to a dozen watts while switching states during reconfiguration, and much less when idle.
To deploy RIS nodes in a real-world network, researchers must first answer three questions: How many RIS nodes are needed? Where should they be placed? And how big should the surfaces be? As you might expect, there are complicated calculations and trade-offs.
Engineers can identify the best RIS positions by planning for them when the base station is designed. Or it can be done afterward by identifying, in the coverage map, the areas of poor signal strength. As for the size of the surfaces, that will depend on the frequencies (lower frequencies require larger surfaces) as well as the number of surfaces being deployed.
To optimize the network’s performance, researchers rely on simulations and measurements. At Huawei Sweden, where I work, we’ve had a lot of discussions about the best placement of RIS units in urban environments. We’re using a proprietary platform, called the Coffee Grinder Simulator, to simulate an RIS installation prior to its construction and deployment. We’re partnering with CNRS and CentraleSupélec, both in France, among others.
In a recent project, we used simulations to quantify the performance improvement gained when multiple RIS were deployed in a typical urban 5G network. As far as we know, this was the first large-scale, system-level attempt to gauge RIS performance in that setting. We optimized the RIS-augmented wireless coverage through the use of efficient deployment algorithms that we developed. Given the locations of the base stations and the users, the algorithms were designed to help us select the optimal three-dimensional locations and sizes of the RIS nodes from among thousands of possible positions on walls, roofs, corners, and so on. The output of the software is an RIS deployment map that maximizes the number of users able to receive a target signal.
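The deployment algorithms themselves are proprietary, but the shape of the problem resembles classic coverage maximization, and a greedy strategy is a natural baseline. The sketch below is a generic illustration of that idea, not Huawei's method; coverage() stands in for the full ray-tracing or link-budget evaluation a real planner would run.

```python
# Generic greedy coverage-maximization sketch for choosing RIS sites.

def greedy_ris_placement(candidates, users, coverage, max_nodes):
    """candidates: list of candidate RIS sites (location plus size).
    users: iterable of user locations currently below the target signal level.
    coverage(site): set of users that site would lift above the target.
    Returns up to max_nodes sites chosen greedily."""
    uncovered = set(users)
    chosen = []
    for _ in range(max_nodes):
        best_site = max(candidates,
                        key=lambda s: len(coverage(s) & uncovered),
                        default=None)
        if best_site is None or not (coverage(best_site) & uncovered):
            break  # no remaining site helps any still-uncovered user
        chosen.append(best_site)
        uncovered -= coverage(best_site)
    return chosen
```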
An experimental reconfigurable intelligent surface with 2,304 unit cells was tested at Tsinghua University, in Beijing, last year.
Of course, the users of special interest are those at the edges of the cell-coverage area, who have the worst signal reception. Our results showed big improvements in coverage and data rates at the cell edges—and also for users with decent signal reception, especially in the millimeter band.
We also investigated how potential RIS hardware trade-offs affect performance. Simply put, every RIS design requires compromises—such as digitizing the responses of each unit cell into binary phases and amplitudes—in order to construct a less complex and cheaper RIS. But it’s important to know whether a design compromise will create additional beams to undesired directions or cause interference to other users. That’s why we studied the impact of network interference due to multiple base stations, reradiated waves by the RIS, and other factors.
Not surprisingly, our simulations confirmed that both larger RIS surfaces and larger numbers of them improved overall performance. But which is preferable? When we factored in the costs of the RIS nodes and the base stations, we found that in general a smaller number of larger RIS nodes, deployed further from a base station and its users to provide coverage to a larger area, was a particularly cost-effective solution.
The size and dimensions of the RIS depend on the operating frequency [see illustration below]. We found that a small number of rectangular RIS nodes, each around 4 meters wide for C-band frequencies (3.5 GHz) and around half a meter wide for the millimeter-wave band (28 GHz), was a good compromise, and could boost performance significantly in both bands. This was a pleasant surprise: RIS improved signals not only in the millimeter-wave (5G high) band, where coverage problems can be especially acute, but also in the C band (5G mid).
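A quick back-of-the-envelope check shows why those two very different physical sizes are consistent: expressed in wavelengths, they amount to roughly the same electrical width.

```python
# Express the recommended RIS widths in wavelengths at each band.

C = 3e8  # speed of light, m/s
for freq_hz, width_m in [(3.5e9, 4.0), (28e9, 0.5)]:
    wavelength = C / freq_hz
    print(f"{freq_hz / 1e9:.1f} GHz: {width_m} m is about "
          f"{width_m / wavelength:.0f} wavelengths across")
# Both come out near 47 wavelengths, i.e. a comparable aperture in
# electrical terms, which is why a much smaller panel suffices at 28 GHz.
```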
To extend wireless coverage indoors, researchers in Asia are investigating a really intriguing possibility: covering room windows with transparent RIS nodes. Experiments at NTT Docomo and at Southeast and Nanjing universities, both in China, used smart films or smart glass. The films are fabricated from transparent conductive oxides (such as indium tin oxide), graphene, or silver nanowires and do not noticeably reduce light transmission. When the films are placed on windows, signals coming from outside can be refracted and boosted as they pass into a building, enhancing the coverage inside.
Planning and installing the RIS nodes is only part of the challenge. For an RIS node to work optimally, it needs to have a configuration, moment by moment, that is appropriate for the state of the communication channel in the instant the node is being used. The best configuration requires an accurate and instantaneous estimate of the channel. Technicians can come up with such an estimate by measuring the “channel impulse response” between the base station, the RIS, and the users. This response is measured using pilots, which are reference signals known beforehand by both the transmitter and the receiver. It’s a standard technique in wireless communications. Based on this estimation of the channel, it’s possible to calculate the phase shifts for each unit cell in the RIS.
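The final step is simple once the estimates are in hand. For the idealized case of a single-antenna base station and user, with the direct path ignored, each cell just cancels the phase of its own cascaded channel so that all the reflected paths add up in phase at the receiver; the sketch below illustrates that rule with toy numbers.

```python
# Choosing RIS phase shifts from pilot-based channel estimates, in the
# simplified single-antenna, no-direct-path case. h[n] is the estimated
# base-station-to-cell-n channel, g[n] the cell-n-to-user channel.

import numpy as np

def optimal_phase_shifts(h, g):
    """Co-phase the cascaded channel: each cell cancels the phase it sees so
    that all N reflected paths add constructively at the user."""
    return np.mod(-(np.angle(h) + np.angle(g)), 2 * np.pi)

# Toy example with 4 unit cells and random complex channel estimates.
rng = np.random.default_rng(0)
h = rng.standard_normal(4) + 1j * rng.standard_normal(4)
g = rng.standard_normal(4) + 1j * rng.standard_normal(4)
phi = optimal_phase_shifts(h, g)
best = np.sum(np.abs(h) * np.abs(g))                 # best achievable magnitude
achieved = np.abs(np.sum(h * np.exp(1j * phi) * g))  # what these phases deliver
print(np.isclose(best, achieved))                    # True: paths add coherently
```

Note that the rule needs a phase value for every unit cell, which is exactly why the pilot overhead discussed next becomes so punishing.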
The current approaches perform these calculations at the base station. However, that requires a huge number of pilots, because every unit cell needs its own phase configuration. There are various ideas for reducing this overhead, but so far none of them are really promising.
The total calculated configuration for all of the unit cells is fed to each RIS node through a wireless control link. So each RIS node needs a wireless receiver to periodically collect the instructions. This of course consumes power, and it also means that the RIS nodes are fully dependent on the base station, with unavoidable—and unaffordable—overhead and the need for continuous control. As a result, the whole system requires a flawless and complex orchestration of base stations and multiple RIS nodes via the wireless-control channels.
We need a better way. Recall that the “I” in RIS stands for intelligent. The word suggests real-time, dynamic control of the surface from within the node itself—the ability to learn, understand, and react to changes. We don’t have that now. Today’s RIS nodes cannot perceive, reason, or respond; they only execute remote orders from the base station. That’s why my colleagues and I at Huawei have started working on a project we call Autonomous RIS (AutoRIS). The goal is to enable the RIS nodes to autonomously control and configure the phase shifts of their unit cells. That will largely eliminate the base-station-based control and the massive signaling that either limit the data-rate gains from using RIS, or require synchronization and additional power consumption at the nodes. The success of AutoRIS might very well help determine whether RIS will ever be deployed commercially on a large scale.
Of course, it’s a rather daunting challenge to integrate into an RIS node the necessary receiving and processing capabilities while keeping the node lightweight and low power. In fact, it will require a huge research effort. For RIS to be commercially competitive, it will have to preserve its low-power nature.
With that in mind, we are now exploring the integration of an ultralow-power AI chip in an RIS, as well as the use of extremely efficient machine-learning models to provide the intelligence. These smart models will be able to produce the output RIS configuration based on the received data about the channel, while at the same time classifying users according to their contracted services and their network operator. Integrating AI into the RIS will also enable other functions, such as dynamically predicting upcoming RIS configurations and grouping users by location or other behavioral characteristics that affect the RIS operation.
Intelligent, autonomous RIS won’t be necessary for all situations. For some areas, a static RIS, with occasional reconfiguration—perhaps a couple of times per day or less—will be entirely adequate. In fact, there will undoubtedly be a range of deployments from static to fully intelligent and autonomous. Success will depend on not just efficiency and high performance but also ease of integration into an existing network.
6G promises to unleash staggering amounts of bandwidth—but only if we can surmount a potentially ruinous range problem.
The real test case for RIS will be 6G. The coming generation of wireless is expected to embrace autonomous networks and smart environments with real-time, flexible, software-defined, and adaptive control. Compared with 5G, 6G is expected to provide much higher data rates, greater coverage, lower latency, more intelligence, and sensing services of much higher accuracy. At the same time, a key driver for 6G is sustainability—we’ll need more energy-efficient solutions to achieve the “net zero” emission targets that many network operators are striving for. RIS fits all of those imperatives.
Start with massive MIMO, which stands for multiple-input multiple-output. This foundational 5G technique uses multiple antennas packed into an array at both the transmitting and receiving ends of wireless channels, to send and receive many signals at once and thus dramatically boost network capacity. However, the desire for higher data rates in 6G will demand even more massive MIMO, which will require many more radio-frequency chains to work and will be power-hungry and costly to operate. An energy-efficient and less costly alternative will be to place multiple low-power RIS nodes between massive MIMO base stations and users as we have described in this article.
The millimeter-wave and subterahertz 6G bands promise to unleash staggering amounts of bandwidth, but only if we can surmount a potentially ruinous range problem without resorting to costly solutions, such as ultradense deployments of base stations or active repeaters. My opinion is that only RIS will be able to make these frequency bands commercially viable at a reasonable cost.
The communications industry is already touting sensing—high-accuracy localization services as well as object detection and posture recognition—as an important possible feature for 6G. Sensing would also enhance performance. For example, highly accurate localization of users will help steer wireless beams efficiently. Sensing could also be offered as a new network service to vertical industries such as smart factories and autonomous driving, where detection of people or cars could be used for mapping an environment; the same capability could be used for surveillance in a home-security system. The large aperture of RIS nodes and their resulting high resolution mean that such applications will be not only possible but probably even cost effective.
And the sky is not the limit. RIS could enable the integration of satellites into 6G networks. Typically, a satellite uses a lot of power and has large antennas to compensate for the long-distance propagation losses and for the modest capabilities of mobile devices on Earth. RIS could play a big role in minimizing those limitations and perhaps even allowing direct communication from satellite to 6G users. Such a scheme could lead to more efficient satellite-integrated 6G networks.
As it transitions into new services and vast new frequency regimes, wireless communications will soon enter a period of great promise and sobering challenges. Many technologies will be needed to usher in this next exciting phase. None will be more essential than reconfigurable intelligent surfaces.
The author wishes to acknowledge the help of Ulrik Imberg in the writing of this article.