# Multi-Terminal Information Theory Problems in Sensor Networks

## Presentation on theme: "Multi-Terminal Information Theory Problems in Sensor Networks"— Presentation transcript:

Multi-Terminal Information Theory Problems in Sensor Networks
Gregory J Pottie Professor, Electrical Engineering Department Associate Dean, Research and Physical Resources Deputy Director, NSF Center for Embedded Networked Sensing (CENS) UCLA Henry Samueli School of Engineering and Applied Science

Context and general issues Basic tools of information theory
Outline Context and general issues Basic tools of information theory Multi-terminal information theory Research domains Data fusion Cooperative communication Sensor network scalability Network synchronization Distributed large-scale systems

Sensor Network Operation
Cooperative communication Data fusion Routing Basic goal: detection/identification of point or distributed sources subject to distortion constraints, and timely notification of end user

Basic Information Theoretic Concepts
Typical Sets (of sufficiently long sequences of i.i.d. variables): Has probability nearly 1 The elements are equally probable The number of elements is nearly 2nH W Xn Yn source channel encoder channel p(y|x) decoder Aim of communications system: Minimize errors due to noise in channel Maximize data rate Minimize bandwidth and power (the resources) Shannon Capacity establishes the fundamental limits

Jointly Typical Sequences
Xn Yn X1n X2n Output set in general larger due to additive noise; Output images of inputs may overlap due to noise

Basic Information Theoretic Concepts
W Xn Yn source channel encoder channel p(y|x) decoder Capacity C is the max mutual information I(X;Y) wrt p(x); that is, choose the set X leading to largest mutual information. Capacity C is the largest rate at which information can be transmitted without error Jointly typical set: from among the typical input and output sequences, choose the ones for which 1/n log p(xn,yn) close to H(X,Y) Size of jointly typical set is about 2nI(X,Y), thus there are about this number of distinguishable signals (codewords) in Xn These codewords necessarily contain redundancy--size of set is smaller than the alphabet would imply; sequences provide better performance than isolated symbols if properly chosen.

Gaussian Channel Capacity
Discrete inputs to channel, and channel adds noise with Gaussian distribution (zero mean, variance N) Input sequence (codeword) power set to P Capacity is maximum I(X;Y) over p(x) such that EX2 satisfies power constraint C = 1/2 log(1+P/N) bits per transmission. The more usual form is to consider a channel of bandwidth W and noise power spectral density No. Then C = W log(1+P/NoW) bits per second.

Rate Distortion: Lossy Source Coding
Rate distortion function R(D) can be interpreted as The minimum rate at which a source can be represented subject to a distortion D=d(X,Y) The minimum distortion that can be achieved given a maximum rate constraint R Interesting dual results to Capacity Spend coding effort on distortion-typical set; rest are don’t cares Applies to compression of real-valued sequences Achievable region R D

Universal Source Coding
Divide sequence into distortion-typical (interesting) and distortion-atypical (uninteresting) sets Index for distortion typical set of small length--consumes our coding effort; atypical set is large, but coding scheme not critical Require systematic means of classifying sequences as typical (promotion mechanism and distance measure) Gold washing algorithm: typical set, plus candidates Distortion typical set Atypical set

Source/Channel Coding Separation
For single link, separately performing source and channel coding achieves optimal rates Separate optimization greatly reduces theoretical complexity Classes of codes have been identified that get very close to respective Shannon limits Joint source/channel coding can reduce latency or overall complexity, but infrequently used since application-specific

Multi-Terminal Information Theory
The preceding discussion assumed a single transmitter and receiver Multi-terminal information theory considers maximization of mutual information for the following possibilities: Multiple senders and one receiver (the multiple access channel) One sender and multiple receivers (the broadcast channel) One sender and one receiver, but intervening transducers that can assist (the relay channel) Composite combinations of these basic types Bayes estimation also aims to maximize mutual information, except the senders do not cooperate and usually there is a fidelity constraint: One sender and multiple receivers (the data fusion problem) Multiple senders and receivers (the source separation problem) Delay and resource usage may also be included

Gaussian Multiple Access Channel
m transmitters with power P sharing the same noisy channel C(P/N)=1/2 log(1+P/N) bits per channel use for isolated sender then the achievable rate region is The last inequality dominates when rates are the same Capacity increases with more users (there is more power) Result is dual to Slepian-Wolf encoding of correlated sources

One sender of power P and two receivers, one with noise N1 and one with noise N2, N1 < N2 The two codebooks are coordinated to exploit commonality of information transmitted, otherwise capacity does not exceed simple multiplexing

Relay Channel One sender, one relay, and one receiver; relay transmits X1 based only on its observations Y1 Y1:X1 X Y Combines a broadcast channel and a multiple access channel Networks are comprised of multiple relay channels that may further induce delay

General Multi-Terminal Networks
m nodes, with node j with associated transmission variable X(j), and receive variable Y(j) Node 1 transmits to node m; what is the maximum achievable rate? (X1,Y1) (Xm,Ym) Bounds derived from information flow across multiple cut sets generally not achievable

Costs of Source-Channel Separation
Source-channel coding separation theorem fails because capacity of multiple access channels increases with correlation, while source encoding eliminates correlation Greatly complicates search for optimal codes; raises question of whether joint coding would be worth it Gastpar has considered asymptotic cost of separate rate-distortion and channel coding Compare: Network rate-distortion coding, followed by cooperative transmission Joint rate-distortion and channel coding Potentially exponentially better performance for joint source and channel coding, in limit the number of nodes n observing a Gaussian source with comparable SNR goes to infinity. Bound, not a prescription for how to do this!

Now let it move… Node m Node 1 Time 2 Time 1
Nodes move within bounded region according to some random distribution; what is capacity subject to energy constraint on messages? Node m Node 1 Time 2 Time 1 Answer depends on delay constraint; eventually they will collide implying near-zero path loss and thus unbounded capacity Other questions: Probability the nodes have connecting path of required rate Probability of message arriving in required delay

Some Recent Research for Sensor Networks
Data fusion in sensor networks N-helper problem Cooperative communications in sensor networks Scalability of sensor networks Sensing for distributed sources Network synchronization and rate distortion Systems design

General Assumptions Objective of network is to solve some (multi-) hypothesis problem, subject to a set of fidelity criteria, and convey the result to some end-user, subject to resource constraints Consequence: fidelity criteria and resource constraints allow meaningful optimization questions to be posed Communications is more costly than signal processing Consequence: long distance communication is to be avoided, if possible Justification: Shannon capacity and Maxwell’s equations are fundamental; SP power cost follows Moore’s Law Signals decay with distance of propagation Consequence: local distributed algorithms become feasible Justification: true for all natural propagation media

Rate Distortion and Data Fusion
Can identify resource use (energy/number of bits transmitted) with rate, decision reliability (false alarm rate, missed detection prob) with distortion Operate at different points on rate distortion curve depending on values of cost function Location of fusion center, numerical resolution, number of sensors, length of records, routing, distribution of processing all affect R(D)

A Simple Algorithm 1: high SNR; initiates 3 2: activated, and requests
Nodes activated to send requests for information from other nodes based on SNR If above threshold T, decision is reliable, and suppress activity by neighbors Otherwise, increase likelihood of requesting help based on proximity to T In likelihood, higher SNR nodes form the cluster Bits of resolution related to SNR (e.g., for use in maximal ratio combining) 1: high SNR; initiates 2: activated, and requests further information 3: SNR too low to respond 3 2 1 3

Optimal Fusion and Information Theory
Bayes estimator maximizes the likelihood FX(x|z) where x is the state of nature and z is the set of observables. Define Zr={z(1),z(2),…,z(r)}=set of observations to time r, then recursive form of the estimator is A variety of classical estimators then maximize the likelihoods based on particular assumptions regarding the priors Fusion: typically weighted combinations of likelihoods to produce decision; as sensors may be very different, question of optimal weighting scheme

Likelihood Opinion Pool
Sensor 1 F(Z1r|x) Sensor 2 F(Z2r|x) P . F(x|Zr) F(x) Prior information Sensor n F(Znr|x) The hard part: determination of the various likelihoods

Likelihood Opinion Pool
Combine using the recursive rule: Taking logarithms on each side, followed by expectations one obtains: Which can be interpreted as posterior information=prior information+observation information; thus can deal in summations of mutual information obtained from different sensor types (e.g., video plus audio).

Designing for Detection
In digital communications, choose modulation for ease of estimation of decision variables and subsequent selection of most likely signal (hypothesis); we design signals for separability In sensor networks, have no control over nature, but we can control: Density and locations of sensors Sensor types These can be manipulated in same way, given a fusion strategy, to ease signal separability or achieve Nyquist sampling of source features. This can also be done adaptively as we learn more about the sources and the propagation environment (in general, reduce model uncertainty): Add sensors, and/or change types (e.g., new deployment) Move sensors Articulate directional elements

Networked Info-Mechanical Systems

The n-helper Gaussian Scenario
Y1 X Y2 Y3 Gateway/Fusion center Yn Multiple sensors observe event and generate correlated Gaussian data. One data node (X) is the main data source (e.g. closest to phenomenon), and the n additional nodes (Y1 - Yn) are the ‘helpers’. The Problem: What codes and data rates so that gateway/data-fusion center can reproduce the data from the main node using the remaining nodes as sources of partial side information, subject to some distortion criterion.

Main Result We do not care about reproduction of the Y variables; rather they act as helpers to reproduce X This problem was previously solved for the 2-node case Key to extension: treat Yk|Yk-1,..X as single new helper Pk. Our solution: for an admissable rate (Rx,R1,…,Rn), and for some Di’s>0, the n-helper system data rates can be fused to yield an effective data rate (wrt source X) satisfying the following rate distortion bound: where s2 is the variance and r is the correlation (straightforward but tedious to calculate as n increases).

Comments Other source distributions analytically difficult, but many are likely to be convex optimizations Generalization would consider instances of relay/broadcast channels in conveying information to fusion center with minimum energy Many sensor network detection problems are inherently local: even though expression may be complicated, the number of helpers will usually be small due to decay of signals as power of distance Numerical results for Gaussian sources indicate a small number of helpers lead to significant improvement; rapidly diminishing returns after four or so for typical propagation conditions. Suggests that source/channel coding separation might in fact be good enough for many practical situations (especially above the local interaction)

Problem Definition of Cooperative Communication
Many low-power and low-cost wireless sensors cooperate with each other to achieve more reliable and higher rate communications The dominant constraint is the peak power, the bandwidth is not the main concern Multiplexing (FDMA, TDMA, CDMA, OFDM) is the standard approach. Each sensor has an unique channel We focus on schemes where multiple sensors occupy the same channel

Example: Space-Time Coding
N transmit antennas and N receive antennas Channel transition matrix displays independent Rayleigh (complex Gaussian) fading in each component With properly designed codes, capacity is N times that of single Rayleigh channel Note this implicitly assumes synchronization among Tx and Rx array elements--requires special effort in sensor networks A coordinated transmission, not a multiple access situation.

Context Cooperative reception problem very similar to multi-node fusion problem; same initiation procedure required to create the cluster, however we can choose channel code. Cooperative transmission and reception similar to multi-target multi- node fusion, but more can be done: beacons, space-time coding Use to overcome gaps in network, communicate with devices outside of sensor network (e.g. UAV)

Channel Capacity Channel state information:
known at transmitter side, and at both sides If channel state information is known at the transmit side, RF synchronization can be achieved Channels: AWGN and fading channels with unequal path loss General formula

Channel Capacity(cont’d)

Comments Capacity is much higher if phase synchronization within transmitter and receiver clusters can be achieved Have investigated practical methods for satellite/ground sensors synchronization Beacons (e.g. GPS) can greatly simplify the synchronization problem for ground/ground cooperative communications Recent network capacity results do not take into account possibilities for cooperation by nodes as transmitter/receiver clusters

Received signal power decays with distance, and transmission power is limited Frequency re-use is possible; sophisticated antenna/MIMO systems improve the constant Nodes generate traffic, and can relay traffic from other nodes If did not generate traffic, then higher node density implies greater network capability (improved re-use) All nodes alike We will also relax this later

Transport Capacity of Wireless Networks
n Nodes within some fixed region A, with max radio range R, bandwidth W, generating data. Source-destination pairs random; per node transport capacity is then

Transport Capacity of Wireless Networks II
Note this is achieved by using simple relay strategy: one link at a time without cooperation in transmission or reception (Gupta-Kumar); but bad news continues even with optimal cooperation (Gastpar-Vetterli) The inverse square root of n behavior can be roughly explained by average number of links increasing in a path of a given length, each of which must deal with more traffic to be carried, with the same bandwidth.

The only solution when everyone generates traffic is to add more resources as n increases Traditional approach: communication hierarchy where we add new resources at each layer Each level is limited in numbers Traffic is aggregated and carried on set of trunks of increasing bandwidth and thus capacity Higher levels are longer distance, also limiting latency

Scaling in Sensor Networks
Elements not only generate traffic but can process data Do not necessarily want or need to send raw information to distant users with same probability as near neighbors Key to scalability is to change the source-destination pair distribution to local communication (in limit, most nodes in fact send nothing) Key to proof is to separately consider densities of sources, sensors and communication relays, and pose problem as extraction of information to within particular fidelity (rate distortion)

Scalability for Point Sources in Sensor Networks
Cooperative rate distortion coding results in most communication being local; more nodes do not necessarily result in more traffic under distortion criterion More relays reduce frequency re-use distance and thus interference; capacity can increase without bound Thus more nodes increase likelihood of extracting information at desired fidelity

Comments Number of bits a sensor reports is a complicated function of density Low density: report nothing if SNR too low High density: may need to report only decision Moderate density: many nodes may need to locally cooperate with mix of raw data and decision likelihoods Far away powerful sources will activate many nodes with similar SNR, but a small subset of nodes will be sufficient to make decisions Design objective will be to minimize resources required to suppress node activity

Scalability for Distributed Sources
To estimate parameters of a field (e.g., to get isotherm map) information increases until achieve desired spatial sampling After this extra nodes contribute no additional information, but can increase communication resource Image processing analogy: specify pixel size Parameters to describe local field can be compact compared to raw data, for given level of distortion

Practical Implementation
Dense network; in neighborhood have mix of nodes with different ranges, operating in separate bands Locally route towards the longer range links; they act as traffic attractors, causing number of hops at any given layer to be small Cooperative communication among nodes would serve mainly to assure reliability of paths towards next level of hierarchy Result is a (largely) standard overlay hierarchical network Any cross-layer optimization (e.g., joint source-channel coding) is confined to the local neighborhood, since this is where most of the resources are consumed in any scalable solution.

Network Synchronization
Synchronism is needed for wide set of purposes in sensor networks: Coordination of power down/up for energy savings Time stamping of data Coherent combining in communication or sensing (cooperative comm., fusion, position location) Traditional approaches assume receivers/processors always on, and provide same precision everywhere by locking oscillators Sensor networks are different Do not need same level of synchronism at all times and everywhere Do need to save energy

Synchronization and Rate Distortion
Clocks are not explicitly locked; rather record differences of time scales to allow explicit conversion. References are passed either on a schedule or on demand for post facto synchronization Frequency and precision of updates (the rate) depends on local accuracy requirement Dtj (the time distortion) Would like to bound rate subject to accuracy requirements and acceptable delays in achieving synchronism Very similar issues for position localization

Implications of Signal Locality
Severe decay of signals with distance (second to fourth power) Mutual information to source dominated by small set of nodes Cooperative communication clusters for ground to ground transmission will likely be small Implications: Local processing is good enough for many situations; do not need to convey raw data over long distances very frequently Consequently, lowest layers of processing/network formation, etc. are the most important, since most frequently invoked (“typical”) Practical example: Specialized local transmission schemes (e.g., for forming ad hoc clusters), but long range might use conventional methods such as TCP/IP

Hierarchy in Sensor Networks
For dealing with the network as a whole, number of variations of topology are immense Distributed algorithms exploiting locality of events Use of ensembles for deriving bounds In between, considers layers of hierarchy, each of which may be amenable to a conventional optimization technique