
Front Comput Neurosci. 2009 Sep 24;3:11. doi: 10.3389/neuro.10.011.2009. eCollection 2009.

# Hebbian crosstalk prevents nonlinear unsupervised learning.

### Author information

^{1}Department of Neurobiology, State University of New York Stony Brook, Stony Brook, NY 11794, USA. kcox@notes.sunysb.edu

### Abstract

#### KEYWORDS:

Hebbian learning; ICA; LTP; LTP crosstalk; cortex; synaptic plasticity

- PMID:
- 19826612
- [PubMed]
- PMCID:
- PMC2759358

Figure 1

**Schematic ICA network**. Mixture neurons X receive weighted signals from independent sources S, and output neurons Y receive input from the mixture neurons. The goal is for each output neuron to mimic the activity of one of the sources, by learning a weight matrix W that is the inverse of M. In the diagrams this is indicated by the source shown as a dotted circle being mimicked by one of the output neurons (dotted circle), with the dotted line connections representing a weight vector which lies parallel to a row of M^{−1}, i.e. an independent component or "IC". The effect of synaptic update error is represented by curved colored arrows, red being the postsynaptic case (crosstalk between synapses on the same postsynaptic neuron; left diagram), and blue the presynaptic case (crosstalk between synapses made by the same presynaptic neuron; right diagram). In the former case, part of the update appropriate to the connection from the left X cell to the middle Y cell leaks to the connection from the right X cell to the middle Y cell. In the latter case, part of the update computed at the connection from the left X cell onto the right Y cell leaks onto the connection from the left X cell onto the middle Y cell. In both cases, for clarity, only one of the *n*^{2} possible leakage paths that comprise the error matrix E (see text) is shown. Note that learning of W is driven by the activities of the X cells (the vector x) and by the nonlinearly transformed activities of the Y cells (the vector y), as well as by an "antiredundancy" process.
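The learning scheme this caption describes can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not the paper's exact simulation: it uses the Bell–Sejnowski infomax rule (whose (W^T)^{−1} term plays the "antiredundancy" role) with postsynaptic crosstalk modeled by an assumed error matrix E that leaks a fraction *b* of each update equally onto the other synapses of the same output neuron; the mixing matrix, seed, and epoch count are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2
M = np.array([[1.0, 0.2],
              [-0.2, 1.0]])           # illustrative near-orthogonal mixing matrix
W = np.eye(n) + 0.1 * rng.normal(size=(n, n))  # unmixing weights to be learned
gamma = 0.01                          # learning rate, as in the simulations
b = 0.005                             # small (subthreshold) crosstalk level

# Assumed form of the error matrix E: a fraction b of every update leaks
# equally onto the other synapses of the same postsynaptic neuron.
E = (1 - b) * np.eye(n) + (b / (n - 1)) * (np.ones((n, n)) - np.eye(n))

for _ in range(30000):
    s = rng.laplace(size=n) / np.sqrt(2)   # unit-variance super-Gaussian sources
    x = M @ s                              # mixture neuron activities (vector x)
    y = W @ x                              # output neuron activities
    g = -np.tanh(y / 2)                    # logistic-infomax term, 1 - 2*sigmoid(y)
    dW = gamma * (np.linalg.inv(W.T) + np.outer(g, x))  # Bell-Sejnowski update
    W += dW @ E          # postsynaptic crosstalk mixes updates within each row
```

At the subthreshold *b* used here, the rows of W should still end up close to rows of a scaled, permuted M^{−1}, i.e. the stable regime described in the figures that follow.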

Figure 2

**Plots (A) and (C) show the initial convergence and subsequent behaviour, for the first and second rows of the weight matrix W, of a BS network with two input and two output neurons**. Error of *b* = 0.005 (*E* = 0.0099) was applied at 200,000 epochs and *b* = 0.02 (*E* = 0.0384) at 2,000,000 epochs. At 6,000,000 epochs error of 0.1 (*E* = 0.166) was applied. The learning rate was 0.01.

**(A)** First row of W compared against both rows of M^{−1}, with the *y*-axis the cos(angle) between the vectors. In this case row 1 of W converged onto the second IC, i.e. the second row of M^{−1} (green line), while remaining at an angle to the other row (blue line). The weight vector stays very close to the IC even after error of 0.005 is applied, but after error of 0.02 is applied at 2,000,000 epochs the weight vector oscillates.

**(B)** A blow-up of the box in **(A)** showing the very fast initial convergence (vertical line at 0 time) to the IC (green line), the very small degradation produced at *b* = 0.005 (more clearly seen in the behavior of the blue line), and the cycling of the weight vector to each of the ICs that appeared at *b* = 0.02. It also shows more clearly that after the first spike the assignments of the weight vector to the two possible ICs interchange.

**(C)** Shows the second row of W converging on the first row of M^{−1}, the first IC, and then showing similar behaviour. The frequency of oscillation increases as the error is further increased (0.1 at 6,000,000 epochs).

**(D)** Plots the weights of the first row of W during the same simulation. At *b* = 0.005 the weights move away from their "correct" values, and at *b* = 0.02 almost sinusoidal oscillations appear.
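The convergence measure plotted in panels (A–C) — the cos(angle) between each row of W and each row of M^{−1} — is straightforward to compute. A minimal sketch (the function name and array layout are ours, not the paper's):

```python
import numpy as np

def ic_cosines(W, M):
    """cos(angle) between every row of W and every row of M^{-1}.

    Entry [i, j] near +1 or -1 means row i of W lies (anti)parallel to
    the j-th independent component, i.e. row j of M^{-1}.
    """
    ics = np.linalg.inv(M)                                  # rows are the ICs
    Wn = W / np.linalg.norm(W, axis=1, keepdims=True)       # normalize rows of W
    icn = ics / np.linalg.norm(ics, axis=1, keepdims=True)  # normalize the ICs
    return Wn @ icn.T
```

A returned row with one entry near ±1 and the rest near 0 corresponds to the "converged onto an IC" state; in the suprathreshold oscillatory regime these entries cycle, as in panels (A) and (C).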

Figure 3

**Trajectories of weights comprising the ICs**. The weights comprising each IC (rows of the weight matrix) were plotted against each other over time (in **(A)**, the red plot is the first row of W and the blue plot is the second row of W). The simulation was run for 1 M epochs with no error applied, and each row of W can be seen to evolve to an IC (red and blue "blobs" indicated by large arrows in panel **(A)**). From 2 M to 4 M epochs error *b* = 0.005, i.e. below the threshold error level, was applied, and each row of W readjusts itself to a new stable point (red and blue "blobs" indicated by the smaller arrows). From 4 M to 6 M epochs error of 0.02 was applied, and each row of W now departs from a stable point and moves off onto a limit cycle-like trajectory (inner blue and red ellipses). Error is increased at 6 M epochs to 0.05 and the trajectories are pushed out into longer ellipses. At 7 M epochs error was increased again to 0.1 and the ellipses stretch out even more. Notice that the transition from the middle ellipse to the outer one (error from 0.02 to 0.1) can be seen in the blue line (row 2 of W) in the bottom left of the plot.

**(B)** A blow-up of the inset in **(A)**, clearly showing the stable fixed point of row 2 of W (i.e. an IC) at 0 error (right hand blue "blob"). The blob moves a small amount to the left and upwards when error of 0.005 is applied, indicating that a new stable fixed point has been reached. Further increases in error launch the weights into orbit. γ = 0.005.

Figure 4

**(A)** Increased error increases the frequency of the oscillations (cycles/10^{6} epochs), but the onset of oscillations is sudden at *b* = 0.01037 (*E* = 0.0203; L = 0.01; seed = 8), indicating that this threshold error level heralds a new dynamical behaviour of the network. In **(B)** and **(C)** (enlargement of the box in **(B)**) the behaviour of the network is shown at a very low learning rate and for a different M (γ = 0.0005; seed = 10). The blue curves show cos(angle) with respect to the first row of M^{−1}, the green curves with respect to the second row. Only the results for one of the output neurons are shown (the other neuron responded in mirror-image fashion). Plot **(B)** shows that the weight vector converged rapidly and precisely, in the absence of error, to the first row (blue curve; the initial convergence is better seen in **(C)**); error (*b* = 0.0088, *E* = 0.0173) was introduced after five million epochs; this led to a slow decline in performance over the next five million epochs to an almost stable level, which was followed by a further very slow decline over the next 30 million epochs (blue trace in **(C)**), which then initiated a further rapid decline in performance to 0 (the downspike in **(B)**), which was very rapidly followed by a dramatic recovery to the level previously reached by the green assignment; meanwhile the green curve shows that the weight vector initially came to lie at an angle about cos^{−1} 0.95 away from the second row of M^{−1}. The introduction of error caused it to move further away from this row (to an almost stable value about cos^{−1} 0.90), but then to suddenly collapse to 0 at almost the same time as the blue spike. Both curves collapse down to almost 0 cosine, at times separated by about 10,000 epochs (not shown); at this time the weights themselves approach 0 (see Figure ). The green curve very rapidly but transiently recovers to the level [cos(θ) ≈ 1] initially reached by the blue curve, but then sinks back down to a level just below that reached by the blue curve during the 5 M–30 M epoch period. Thus the assignments (blue to the first row initially, then green) rapidly change places during the spike, by the weight vector going almost exactly orthogonal to *both* rows, a feat achieved because the weights shrink briefly almost to 0 (see Figure ). During the long period preceding the return swap, one of the weights hovers near 0. After the first swapping (at 35 M epochs) the assignments remain almost stable for 120 M epochs, and then suddenly swap back again (at 140 M epochs). This time the swap does not drive the shown weights to 0 or orthogonal to both rows (Figure ). However, simultaneous with this swap of the assignments of the first weight vector, the second weight vector undergoes its first spike to briefly attain quasi-orthogonality to both nonparallel rows, by weight vanishing (not shown). Conversely, during the spike shown here, the weight vector of the second neuron swapped its assignment in a nonspiking manner (not shown). Thus the introduction of a just suprathreshold amount of error causes the onset of rapid swapping, although during almost all the time the performance (i.e. learning of a permutation of M^{−1}) is very close to that stably achieved at a just subthreshold error rate (*b* = 0.00875; see Figure ).

Figure 5

**(A)** The convergence of one of the weight vectors of W onto one of the rows of M^{−1} (seed 8), with *n* = 5. The initial weights of W are random. The angle between row 1 of the weight matrix and row 1 of the unmixing matrix is shown. The plot goes to 1 (i.e. parallel vectors), indicating that an IC has been reached. Without error this weight vector is stable. At 200,000 epochs error of 0.05 (*E* = 0.09) is introduced and the weight vector then wanders in an apparently random manner.

**(B)** The weight vector compared to all the other potential ICs; clearly no IC is being reached.

**(C,D)** These plots, on the other hand, show different behaviour for row 2 of the weight matrix (which initially converged to row 4 of M^{−1}). In this case the behaviour is oscillatory after error (0.05 at 200,000 epochs) is introduced, although another IC (in this case row 3 of M^{−1}; pale blue line) is sometimes reached, after 6.5 M and again at 8.5 M epochs, as can be seen in **(D)**, where the weight vector is plotted against all rows of M^{−1}. The learning rate was 0.01.

Figure 6

**Effect of variable whitening on the error threshold for the onset of instability (*n* = 5)**. Left figure shows the relationship between degree of perturbation of an orthogonal (whitened) matrix Q (seed 2, *n* = 2) and the onset of oscillation. Data using five different perturbation matrices (series 1–5), applied to a decorrelating matrix Z (see ), are plotted. Each series is of one perturbation matrix, scaled by varying amounts (shown on the abscissa as "perturbation"), which is then added to Z (calculated from a sample of mixture vectors), and plotted against the threshold error (obtained from running different simulations using each variably perturbed Z), shown on the ordinate. At 0 perturbation (i.e. for an orthogonal effective mixing matrix) the network became unstable at a non-trivial error rate. As the effective mixing matrix was made less and less orthogonal by perturbing each of the elements of the decorrelating matrix Z (see , and ), the sensitivity to error increased. The right hand graph is a plot for one random M (*n* = 5, seed 8) where the mixed data has been whitened by a decorrelating matrix, (C^{½})^{−1}. In this case the covariance matrix C of the mix vectors was estimated by using different batch numbers, with a smaller batch number giving a cruder estimate of C and a less orthogonal effective mixing matrix. The learning rate was 0.01 in both graphs.
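The right-hand graph's whitening procedure — estimating the covariance C of the mixture vectors from a finite batch and decorrelating with (C^{½})^{−1} — can be sketched as follows. This is a generic reconstruction (the batch sizes, seed, and function names are illustrative, not the paper's), but it exhibits the effect the caption describes: a smaller batch gives a cruder estimate of C and hence a less orthogonal effective mixing matrix.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 5
M = rng.normal(size=(n, n))  # random mixing matrix (illustrative)

def whitener(batch):
    """Estimate C from `batch` mixture vectors and return (C^{1/2})^{-1}."""
    X = M @ rng.laplace(size=(n, batch)) / np.sqrt(2)  # unit-variance sources
    C = X @ X.T / batch                                # sample covariance
    vals, vecs = np.linalg.eigh(C)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T       # symmetric inverse sqrt

def nonorthogonality(A):
    """RMS off-diagonal of A A^T: exactly 0 for an orthogonal A."""
    G = A @ A.T
    off = G - np.diag(np.diag(G))
    return np.sqrt((off ** 2).mean())

crude = nonorthogonality(whitener(100) @ M)     # small batch: cruder C estimate
fine = nonorthogonality(whitener(100000) @ M)   # large batch: near-exact C
```

Here `crude` comes out substantially larger than `fine`: the small-batch effective mixing matrix Z M is further from orthogonal, which by this figure (and Figure 7) lowers the threshold error for instability.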

Figure 7

**Relationship of increasing orthogonality of M with threshold error at which oscillations appear**. Left figure shows a plot of the ratio of eigenvalues of MM^{T} (λ_{2}/λ_{1}) against the threshold error *b*_{t} for a given M, for various randomly-generated Ms selected to give a range of threshold errors. On the right hand side is a plot of *b*_{t} (threshold error) against the cos(angle) between normalized columns of M (for the same set of random Ms); *n* = 2. Note that for two exactly orthogonal Ms, different *b*_{t} values were obtained. The lines in both graphs are least squares fits.
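Both orthogonality measures used in this figure are easy to compute; a minimal sketch (the function names are ours):

```python
import numpy as np

def eig_ratio(M):
    """lambda_2 / lambda_1 of M M^T; equals 1 when the rows of M are
    orthogonal and of equal length, and shrinks toward 0 otherwise."""
    vals = np.sort(np.linalg.eigvalsh(M @ M.T))
    return float(vals[0] / vals[-1])

def col_cos(M):
    """cos(angle) between the two normalized columns of a 2 x 2 M;
    0 for exactly orthogonal columns."""
    a, b = M[:, 0], M[:, 1]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

For an exactly orthogonal M with equal column norms the ratio is 1 and the cosine is 0; the figure's point is that as either measure departs from those values, the threshold error *b*_{t} falls.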

Figure 8

**Effect of crosstalk on learning using a single-unit rule with *N* = 2 and tanh nonlinearity**. An orthogonal mixing matrix was constructed from seed 64 by whitening. The cosine of the angle between the IC found at 0 crosstalk ("error") and that found at equilibrium in the presence of various degrees of crosstalk is plotted. This angle suddenly swings by almost 90° at a threshold error of 0.064 (*E* = 0.113). The error bars show the standard deviation estimated over 100,000 epochs.

Figure A1

**On the left is a plot of the weights of one of the rows of W with error of 0.0088 (i.e. just above the apparent threshold error) applied at 4 M epochs at γ = 0.0005 (seed 10)**. These are the weights comprising the "other" weight vector from the one whose behavior was shown in Figures B,C. Thus the large swing in the weight vector shown in Figures B,C produced relatively small adjustments in the weights shown here (at 30 M epochs), while the very large weight changes shown here (at 140 M epochs) correspond to small shifts in the direction of the weight vector shown in Figures B,C. (Conversely, these large weight steps at 140 M epochs produce a spike-like swing in the corresponding weight vector angle.) Note that the weights make rapid steps between their quasistable values. Also, the smaller (blue) weight spends a very long time close to 0 preceding the large weight swing (during which swing the weight vector goes briefly and almost simultaneously orthogonal to both rows of M^{−1}). Close inspection revealed that the blue weight crosses and recrosses 0 several times during the long "incubation" period near 0. Note the wobbly appearance of the green weight. The thickness of the lines in the left and right plots reflects rapid small fluctuations in the weights that are due to the finite learning rate. On the right is the plot of the cos(angle) between the weight vector whose components are shown in the left plot, and the two rows of M^{−1}. Here *b* = 0.00875 (i.e. very close to the error threshold; see Figure ) was introduced at 5 M epochs; other parameters the same as in the left plot. Note that the weight vector relaxes from the correct IC to a new stable position corresponding to a cos angle just below 1 (blue plot), and then stays there for 65 M epochs. The relaxation is more clearly seen in the green plot, which shows the cos angle with the row of M^{−1} that was not selected.

Figure A2

**Plots of individual weights using the same parameters as in Figure except γ = 0.005 (which increases the size of the slow and fast fluctuations, which is why the lines are thicker than in Figure ) and *b* = 0.0087 (which appears to be extremely close to the true error threshold for this M; the first oscillation occurs at 27 M epochs, which would correspond to 270 M epochs at the learning rate used in Figure ), introduced at 1 M epochs**. Each weight (i.e. green and blue lines) comprising the weight vector adopts four possible values, and when the weights step between their possible values they do so synchronously and in a particular sequence (though at unpredictable times). The four values of each weight occur as opposite pairs. Thus the green weight occurs as one of four large values, two positive and two equal but negative. The two possible positive weights are separated by a small amount, as are the two possible negative weights. The blue weight can also occupy four different, but smaller, values. Thus there are two small, equal but reversed-sign weights, and two even smaller equal but reversed-sign weights. These very small weights lie very close to 0. Since the weights jump almost synchronously between their four possible values, the "orbit" is very close to a parallelogram, which rounds into an ellipse as error increases. One can interpret the four corners of the parallelogram as the four possible ICs that the weights can adopt: the two ICs that they actually do adopt initially and the two reversed-sign ICs that they could have adopted (if the initial weights had reversed sign). However, two of the corners are closer to correct solutions than are the others (corresponding to the assignment reached when the blue weights are very close to 0). It seems likely that exactly at the error threshold the difference between the two close values of the green weights, and the difference between the very small values of the blue weights, would vanish. This would mean that the blue weights would be extremely close to 0 during the long period preceding an assignment swap, so the direction of the weight vector would be very sensitive to the details of the arriving patterns. Consistent with this interpretation, the weights fluctuate slowly during the long periods preceding swaps; these fluctuations, combined with the vanishing size of one of the weights, presumably make the system sensitive to rare but special sequences of input patterns. Similar behavior was seen using seed 8.

Figure A3

**This shows the behavior of the weight vector whose component weights are shown in Figure (cos angle with respect to the two rows of M^{−1})**. Error *b* = 0.0087 introduced at 1 M epochs. Note the weight vector steps almost instantaneously between its two possible assignments. However, when the weight vector is at the blue assignment, it is closer to a true IC than when it is at the green assignment (which is the assignment it initially adopts). When the weight vector shifts back to its original assignment (at 43 M epochs), it shifts orthogonal to both ICs at almost the same moment (sharp downspikes to 0 cosine). Notice the extreme irregularity of the "oscillations".

Figure A4

**The plot on the right is similar to those of Figure except that the data was generated from a different simulation, with all parameters the same except that the initial weight vectors were different**. Notice how one of the weight vectors (rows of W) initially evolves to the mirror image, in terms of sign, of the weight vector in Figure A (rightmost red blob). The right hand plot shows weight 1 from row 1 of W against weight 2 of row 2 (blue), and weight 2 of row 1 against weight 1 of row 2 (red).
