We're pleased to have you. Jason has a number of interests, including numerical linear algebra, scientific computing, numerical methods, and working with floating point. He's been a developer and implementor for LAPACK, a linear algebra library. He's worked on revisions of the IEEE floating-point standard, and he's done a lot of development on free software. He's here as a research scientist at Georgia Tech working on multithreaded algorithms, and without further ado I'm going to let him start his talk.
Thank you.
OK. Today's talk you could kind of subtitle "How to beat a dead horse until it's dust" — a lot of people get really sick of this topic, but there's still a bit of life left in it. Basically I'm solving Ax = b, and the novel bit is adding extra precision in just a couple of places, turning it from the kind of blob you see in a numerical linear algebra textbook into something you can explain pretty easily to anyone, even a computer. And this is kind of my one background slide, because most people don't need a whole lot of motivation for trying to solve Ax = b. But with a different audience, I'm not sure whether everyone works in scientific computing or falls into these categories of linear algebra problems — and there's a lot more than this in scientific computing.
But if you pick up the articles in the journals, almost everything falls into either solving a linear system Ax = b, solving an eigenvalue problem, or a least-squares problem. And I'm just going to talk about the first one of those: solving a linear system.
The applications for this go on forever — I could list them all day long if I wanted to, but I won't. Solving partial differential equations: you have all these beautiful nonlinear things going on, and at the end of the day you discretize, and quite often you turn them into a system of linear equations. Or big optimization problems, trying to model whole economies and things — there are iterative methods where at each step you're solving a system of linear equations to decide what direction you're going to go in. And at this point, all the big commercial supercomputers are actually built specifically for solving Ax = b.
It's kind of a trap they fall into, but it's a trap they fall into for a reason: if you've heard of the LINPACK benchmark, it's just solving Ax = b. And they wouldn't ever fall into this trap if there weren't very high demand for it. Almost everyone works on solving a single Ax = b faster — they want it faster and faster. You hear the term "capability computing" tossed around a lot: by throwing more processors and more speed at things, we can now solve problems we couldn't solve a couple of years ago.
So there's another capability kind of lurking around: all the work done to solve things faster can be applied to things people are already solving, to solve them better — make the result easier to interpret, make the algorithms more dependable. And, in a kind of cyclical way — and I'm oversimplifying all of this — when you solve these things better, it opens the door to methods that solve things faster. I'll go into that towards the end. So, the general outline: I keep saying better, better, better.
I'll talk about what I actually mean by that. Anybody here familiar with numerical linear algebra — your eyes may glaze over and you'll get bored, but don't pick too many fine points; this is going to be from a very high level. After that I'll talk about the actual algorithm I'm going to use to refine these things a little bit: you start off with a solution and you make it better, through a quick iterative process. Then I'll go into some other applications of "better" — things you don't usually think of, since better doesn't necessarily equate to faster, but in some cases it can. And better doesn't necessarily equate to more scalable for parallel processing, but again, there are specific cases where it can.
And a lot of this work — this is a slice of my thesis, done in conjunction with my office mates, my advisor Jim Demmel, Dr. Li at LBL, and a long sequence of people who were undergrads at various points. I think Meghan is now a graduate student at Stanford; David Lu went off into industry. This is one of those multi-year projects, and this is a large part of it — it's taken many years to get some of this done.
So part of what I mean by better is talking about the errors that occur when you solve Ax = b. You take as input only the pair A and b: your matrix A and the right-hand side b. They're known at some point, of course, but they're given in a computer, so they're all in floating point. Then there's the true solution x. It's something you don't have — it's what you want — and it's a true thing, it's kind of implicit; it's not actually stored anywhere. And when you compute, you actually get some other answer, the computed x — I'm going to iterate on it later. That's again in floating point, so these two corners of this diagram are both actually things you have in a computer. Now, this computed x actually corresponds to the true solution of Ax = b for a different problem.
All told, there are two different errors that you get. The error most people think of is the error in the solution, which we call the forward error: the distance between the true x and what you've computed. The other one — and this was a big breakthrough, once upon a time, realizing that this error is important also — is called the backward error. This is the distance from the problem you wanted to solve to the problem you actually solved, and in some cases this is what you care about more. Now take a typical algorithm for solving a square Ax = b, and assume A is nonsingular, so there actually is a solution.
Well, you get these error plots coming out of it. The backward error using LU factorization is not too bad. This green bar up here is kind of a measure of the working precision — and just a note, these two plots are on different axes, both vertically and horizontally, because they're unrelated quantities. But this is a measure of the precision you stored everything at: the precision you're storing A and b at, the precision of the x you return. You see the errors you get are at about that level. Well, that's OK — this is about as good as you're ever going to do, you'd think. And then this line is maybe the square root of epsilon in single precision — this is kind of saying everything has about half its digits right.
The forward error, though — well, it gets kind of difficult to explain. The horizontal measure here is a measure of difficulty; I'll get to that a little bit later. As the problem gets more and more difficult, the forward error shoots up. And it's not easy to get at, because you don't actually have the true x. Through a funny twist of fate, the backward error you can actually compute: it's just a function of the residual, a quantity you actually have your hands on. So you can compute it no matter what you have. The forward error you can't actually compute — here, in a test setup, I know the true x closely enough. So the backward error, what you can compute, is easy to describe, while the forward error, what you can't actually compute, is difficult to describe. And if it's difficult to describe, it's difficult to actually interpret — for a person, or, if it's in a program, for another program to interpret.
So the real goal with all of this is to change both of these errors into step functions — things that are really easy to describe. Basically, you get a ridiculously good answer up until the problem is too difficult; then you can't say anything. And again, for the forward error, you get a ridiculously accurate answer up until the problem is just too difficult. We're going to talk about how we can do this, and not only get these types of results, but also tell, in the forward-error sense, which region you're in — without doing too much work. Graphically, this is saying you get some bad answers up here, and you get good answers down here, after this iteration process I'll describe.
And if there are any questions, feel free to pipe up.
So, there are a lot of possible methods people have thrown at this problem over the years. Doing this for a general audience, it's kind of hard to tell who's going to be in the crowd, so I want to at least mention all of these.
People talk about interval arithmetic being a great way to solve some of these things. Interval arithmetic can be really useful for telling you when you have a problem. It's a method of computing where every number is actually a bracket — a little interval with an upper bound and a lower bound — and as you go through the computation, you accumulate all these brackets. You don't use traditional algorithms with these; you have to modify your algorithms a little to get something practical out of it. And it's very good at telling you when you have a problem, but it doesn't help you at all in getting a better answer — it just tells you when your answer is bad. And even finding the best valid answer within those intervals, for solving Ax = b, is an NP-hard problem: computationally incredibly difficult. So there are various approximation methods, and again, it's kind of a pain.
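To make the bracket idea concrete, here is a toy sketch of interval arithmetic — my own illustration, not anything from the talk. Real implementations also control the rounding direction of each endpoint, which this sketch omits.

```python
# Toy interval arithmetic: every number is a [lo, hi] bracket, and each
# operation widens the bracket to enclose every possible true result.
from dataclasses import dataclass

@dataclass
class Interval:
    lo: float
    hi: float

    def __add__(self, other):
        # sum of two brackets: add the lower bounds and the upper bounds
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        # product bracket: extremes over all endpoint combinations
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(min(products), max(products))

a = Interval(1.0, 1.1)
b = Interval(2.0, 2.2)
print((a + b).lo, (a + b).hi)   # lo = 3.0, hi ≈ 3.3
print((a * b).lo, (a * b).hi)   # lo = 2.0, hi ≈ 2.42
```

Running a whole linear solve this way tends to produce intervals that balloon, which is exactly the "tells you when your answer is bad" behavior described above.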
Then there's always the option of just throwing your resources at computing the problem exactly. The problem with that is everything grows exponentially with the size of your problem; it's not practical beyond even ten by ten. The other idea is that you keep looping through this: you solve it, you kind of evaluate it, and you solve the whole thing again at higher precision. Well, it works, but it increases the cost of the big part of your computation. All of this is dominated by the O(n³) part, where you factor A — where you do something to A to make it easy to solve with later.
So what I'm going to talk about is another variation, called iterative refinement. It's actually just O(n²) work after the O(n³) work, so it asymptotically disappears. I'll show a few performance results — it doesn't completely disappear until you get out to really big problems, but the overhead is actually OK. You only require a little bit of extra precision — you need extra precision in really only two locations — and it's relatively efficient.
The downside: with those top three methods you end up with essentially a computer-generated proof of some property of your answer. It's kind of odd to say, but with interval arithmetic, once you've gone through the algorithm you end up with literally a proof that your solution is within the intervals. With this, it's not really a proof. It's what I call dependable, but it's not something I'd call a proven solution or a validated solution.
What I mean by dependable: you try to reduce the error to the precision's limit as often as you possibly can. There are always economic trade-offs in this — you can throw more and more computation at it to get a little bit better, and at some point you cut it off. So you either get the error down to something that's really easy to explain, down at the bottom, or you clearly indicate when the results may not be so good. They might still be good — these are all bounds — but you're not sure. And this makes it sound really pessimistic; my background's in error analysis, which makes everything sound pessimistic, because all the bounds we work with are pessimistic. But you'll see this actually works out pretty well.
There are a few things I'm not going to deeply explain — this is kind of the slide for any numerical linear algebra folks in the audience, to cut off questions before they're asked. I'm not going to go into great detail on what I mean by difficulty. The short form is that it's called the condition number; each one of these different error measures has its own condition number associated with it, and roughly speaking, the condition number is a measure of sensitivity. If you perturb the problem a little bit, how much does the perturbation affect the result? You kind of saw that before with that nice slope in the forward error against the difficulty measure: it goes up as the problem gets harder — the more sensitive the problem, the more any little perturbation pushes it up further.
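As a concrete illustration of that sensitivity (a made-up 2-by-2 example, not from the talk's test set): perturbing the right-hand side of a nearly singular system moves the solution by roughly the condition number times the perturbation.

```python
# Condition number as sensitivity: a tiny perturbation of b moves the
# solution of an ill-conditioned system by a large amount.
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])     # nearly singular, so ill-conditioned
b = np.array([2.0, 2.0001])
x = np.linalg.solve(A, b)         # solution is essentially [1, 1]

db = np.array([0.0, 1e-5])        # tiny perturbation of the right-hand side
x_pert = np.linalg.solve(A, b + db)

print(np.linalg.cond(A))                # large (roughly 4e4)
print(np.linalg.norm(x_pert - x))       # ~0.14: the 1e-5 change was amplified
```

The amplification factor here, about 1e4, is on the order of the condition number, which is the rough rule of thumb being described.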
Another issue I'm not going to talk about is what's called numerical scaling, or equilibration. The results I talk about later kind of assume that all the numbers you're dealing with start off approximately the same size. If you write out problems by hand, that's probably what you'll come up with; it's almost never true for computer-generated problems. It's kind of funny: take a chemical plant — the person modeling the vats will think in terms of meters, with all the units based on meters, while the person modeling the little nozzle will be thinking in millimeters. So you have these two completely different scales in your matrix. You have to deal with that a little bit, but that type of thing is actually really easy to handle. I'm not going to handle it here, because it introduces a whole bunch of algebra throughout that just obscures the important points. And a lot of times these two issues get conflated. There are trivially ill-scaled cases that are easy to handle; a lot of times people say their problem is horribly conditioned and can't be solved, when actually it was just a little bit ill-scaled — scale it the right way and you can solve it. Again, that's something a computer can do relatively straightforwardly; I won't go into the details.
I'm also not going to talk about exactly which error measurements I'm using — exactly how I'm measuring those different errors. I had the two bars, the forward and the backward error; there are multiple ways of measuring the length of each of those bars relative to what you want. I'm taking one of the more stringent definitions for both, called componentwise. I'm measuring the perturbations in each one of the components of the true solution: in x, an error in an individual component turns into part of the error measure, and in the backward error, a perturbation in any part of the input data translates directly into the error measure. So it's kind of the largest perturbation across all of those. All of these are really aspects of the same question — what norm do you measure things by — and that's worth a whole different talk, most likely in an actual numerical linear algebra or numerical analysis class.
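For the backward error, the componentwise measure just described can be computed from quantities you actually have in hand. Here is a sketch assuming the usual residual-based formula (an Oettli–Prager style bound); the talk may use a slightly different variant.

```python
# Componentwise backward error of a computed solution xhat of A x = b:
# the smallest relative perturbation, taken component by component over
# A and b, that makes xhat an exact solution.
import numpy as np

def componentwise_backward_error(A, b, xhat):
    r = b - A @ xhat                          # residual: what you can compute
    denom = np.abs(A) @ np.abs(xhat) + np.abs(b)
    mask = denom != 0                         # guard exact zeros
    return np.max(np.abs(r[mask]) / denom[mask]) if mask.any() else 0.0

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
xhat = np.linalg.solve(A, b)
print(componentwise_backward_error(A, b, xhat))  # near machine epsilon
```

Note this needs only A, b, and the computed answer — which is why the backward error is the "easy to compute" one, in contrast to the forward error.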
OK, so that's a little bit of what I meant by better — and if anyone has any questions, feel free to pipe up. Now I'm going to go into the method I have for refining to a more accurate solution. And when I say "the method I have" — this is actually a rather old method. It's really just Newton's method. You can say Newton did it; you can find evidence in Gauss's papers that Gauss did it; you can say that about almost anything. You start off assuming A is nonsingular, and you go through some direct factorization process, like LU decomposition, to get your initial solution. This would be the solution that had the error plots from before — the kind of fuzzy backward error, and the kind of diagonal line for the forward error. And then you just repeat. Here it's written as going off forever, but in actual code you set a hard limit — LAPACK typically uses five, or ten for the extra-precise stuff. At each step we compute the residual — and this is what I mean by residual: you just literally plug things in and see how far away you are, in a sense.
The backward error is a direct function of this quantity — well, it's a function of all these quantities, but it's measured in terms of the residual. So if all you care about is the backward error, you check here, and you might quit the loop early if you know you're done. If you care about the forward error, you keep going through the iteration: you solve for a step size. That is, you solve a linear system again — not for another x, but for an incremental change based on this residual. And then again you can check the forward error here. I'll go into some details about that, because you can't measure it directly; you have to infer what it is. I'll mention that a bit later.
Then you do the update, and you just keep iterating this process: you find a residual, that translates into a step size, and that updates your solution. Overall, this algorithm is really well known in the linear algebra community. It hasn't been all that popular, because people didn't see much of a point to it: for one step there's a big point to using it; for more steps than that, not so much. With some of this new stuff, though, there actually is a point — I can make these errors into those nice step functions. And if this were exact arithmetic, you could plug in anything for the initial x and this would all work in one step. So the whole interest here is what happens in floating point.
So, the precisions I'm talking about: you'll get a precision here and a precision here. I use kind of old-fashioned terms: the working precision is the precision of your inputs, and it's the precision I've assumed you used in factoring. You can think of it in C as being float — single precision — or if you want double, in which case the extra precision would be double-double; but let's just say everything here is single precision.
Now, when you compute the residual, one traditional thing that's very useful is using double precision during this computation. So you compute all your intermediate results at twice the precision of the input, and you round back down to working precision, because you're just going to plug it into the solve next. This is a well-known thing — some confusion about when to use double precision here is actually one of the reasons the algorithm is not so popular — but on its own it just doesn't make too much of a difference. It's this, combined with something new — actually keeping the x in double precision as well — that has made all the difference recently. So really the new step here is that you don't keep your solution at the working precision you return; you keep it at the wider precision internally.
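The loop just described — factor once in the fast working precision, then iterate with an extra-precise residual and an extra-precisely carried x — can be sketched in a few lines. This is a hedged illustration, not the LAPACK code: float32 plays the working precision, float64 plays the doubled precision, a precomputed float32 inverse stands in for keeping the single-precision LU factors, and the stopping tolerance is a hypothetical choice.

```python
# Sketch of extra-precise iterative refinement for A x = b.
import numpy as np

def refine(A, b, max_steps=10):
    # O(n^3) work happens once, in the fast working precision (float32);
    # the explicit inverse is a stand-in for saved LU factors.
    Ainv32 = np.linalg.inv(A.astype(np.float32))
    x = (Ainv32 @ b.astype(np.float32)).astype(np.float64)
    eps_w = float(np.finfo(np.float32).eps)
    for _ in range(max_steps):                # hard iteration limit, as in LAPACK
        r = b - A @ x                         # residual computed in double
        dx = (Ainv32 @ r.astype(np.float32)).astype(np.float64)
        x += dx                               # x itself is carried in double: the new part
        if np.linalg.norm(dx) <= eps_w * np.linalg.norm(x):
            break                             # the step can no longer change x much
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50))
x_true = rng.standard_normal(50)
b = A @ x_true
x = refine(A, b)
print(np.max(np.abs(x - x_true)))  # far below single precision's ~1e-7
```

Each pass through the loop is only O(n²) work (matrix-vector products), which is why the refinement cost asymptotically disappears next to the factorization.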
If you use no extra precision — or really, if you only use extra precision in the residual — the main thing you can say is that you reduce your backward error in one step. That big fuzz turns into a little fuzz, which is not that exciting, and the forward errors still look pretty gross. But with this extra little bit of double precision tossed in, all of a sudden you can push the error much further down. And you can see also the relative speeds of things — going back to how better can get you faster. On a lot of machines now, single precision is twice as fast. OK, that's not so big a deal. On some machines, single precision is like ten to a hundred times as fast as double. Now we're talking about a big deal, when you do your O(n³) work in single precision. So this part — the really, really expensive part, the factoring — you do with the really, really fast arithmetic. That helps you in the long run.
So this is going to be my one slide of numerical-analysis-type stuff, and eyes are going to glaze over, most likely. This is really brief and really informal, and these two equations are not equations — they're not even really correct; they're meant to give a feeling for what's going on. I start off with each of the computation lines. There are error terms pushed in by each operation: doing this one at working precision translates into some operator multiplied by the working precision — say, two to the negative twenty-four — and these other two, the ones I'm talking about, are done at extra precision. I'll let the extra-precision part hang for a moment.
The important thing here: when you combine these into one residual in terms of the previous one, you get some operator in front of the old residual that looks roughly like |A| times |A⁻¹| times the working precision. Now, as long as that |A| times |A⁻¹| part doesn't cause components of r to grow more than the working precision causes them to shrink, this whole term effectively keeps going down to zero, and you're left with a limit set by the error in updating x and the error in computing r. The same exact thing happens for the error in x — it's just in the other space, with an operator that looks kind of like |A⁻¹| times |A|. And again, these are not correct as written. If you want the real thing, see LAPACK Working Note 165. The algebra folks used to this know there's a whole sprinkling of absolute-value bars and inequalities through all of it, but those really obscure what I'm talking about. The main point is that there are these operators in front, and if they're not too bad, they drive the residual's error term down to zero, and you're left with these other terms as your limiting precisions.
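Very loosely — and with the same caveat that these are a flavor, not the precise componentwise inequalities (LAPACK Working Note 165 has the real ones, absolute-value bars and all) — the two recurrences being described look like:

```latex
% Informal flavor only; not the precise bounds.
r_{i+1} \approx \big(|A|\,|A^{-1}|\,\varepsilon_w\big)\, r_i
        + (\text{limiting terms from computing } r \text{ and updating } x),
\qquad
e_{i+1} \approx \big(|A^{-1}|\,|A|\,\varepsilon_w\big)\, e_i
        + (\text{the same kinds of limiting terms}),
```

so as long as the bracketed operators act as contractions, the first terms die out geometrically, and the limiting terms — set by the precisions used to carry x and to compute r — are what remain.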
So if you carry both of these at double precision, then in a lot of ways you're limited at double precision, not single precision. And that's kind of the new thing. Previously, people looked at just carrying this one term — the residual — in double precision. It works for some things, but it doesn't get you all the benefit. With a little bit more, and really not much extra computational cost — carrying these two quantities in double precision — you get a lot of extra benefit.
Let's look really quickly at the test cases behind that one graph I've shown, and some more to come — this is kind of my cover-my-bases slide. Essentially, those plots cover one million random samples: about two hundred fifty thousand generated matrices, spread across a range of factorization difficulties. Each one of those has four systems — four x, b pairs — with it. Two of them are generated by generating a random x and multiplying to get a b out; for the others I generate a random b, which has the effect of producing a fairly ill-conditioned, very difficult x. That samples two different regions where things could be bad. And I solve for the true x — like I said, the true x is not necessarily representable in less than exponential space, but for single precision, solving these at greater than quad precision is good enough for this work. By working precision I mean IEEE single in this case, which is about seven or eight decimal digits, and the extra, doubled precision is IEEE double, which is about sixteen digits of precision.
And you may ask: OK, well, nobody really cares that much about accuracy in single, right? It's not that I'm trying to be accurate in single; it's that single makes a really good way to sample the hard cases. First off, it's single, it's fast — you can sample a lot of things. Second, the difficult cases: if you think geometrically, you have your space of A, b pairs defining problems. The space of singular problems is actually a lower-dimensional space inside of that — it's measure zero, so picking randomly you're never going to hit one. The space of difficult problems is a little bit of fuzz around that lower-dimensional manifold, and as the precision goes up, it gets harder to hit that little bit of fuzz. Also, as the dimension goes up, your space is getting bigger, and just by basic geometry it's going to be harder and harder to sample that little manifold. So that's why this is thirty by thirty in single precision. All the results I'm showing look the same if I ramp up the precision or the sizes. There's a little fuzz: doing it in complex arithmetic, you have to multiply by another factor of about two to account for other computational errors. And the same plots I'm going to show generally apply to sparse matrices — we just don't have millions of them, only a couple thousand, so they're not smooth curves, there's a bunch of scattered points, but it all looks the same. So this is a really general technique.
Now, going back to the backward error: this is what happens with just the initial solution, then with carrying just the residual in double precision while doing basically everything else in single — only the factorization is single — and then with both the double-precision residual and x carried in double precision. And I guess these are on different horizontal axes; I'll explain why in a few minutes. The backward error I'm reporting for both of these is for the wider x, not the x you return; I'll explain that in a few minutes too. As you go from one to the other: they both start off in the same place — any little jogs are just errors in the plotting script — and you squish it down so that, like I said, it's all at working precision. This is why people haven't bothered with extra precision in refinement: do it all at working precision and you get perfectly fine results for the backward error in single precision. No big deal. But if you do everything in double, all of a sudden you actually get double-precision results out of your single-precision data.
Now, that sounds like a waste — a lot of people get very pejorative about these things and say, well, single-precision inputs only deserve single-precision results. If you're writing a general-purpose library, that attitude is a really difficult thing to swallow, and it's not something I'd say. In general there are uses for this. Like I said, if you have a really fast narrow precision, then for a lot of the problems you pass in, if you want a double-precision answer, you might work in single precision as much as you can, until the problem gets too difficult — you get a big speed boost, and you still get the answers you want. So this is a nice thing: it goes from something that's less than a step function to, all of a sudden, getting something with a backward error at double precision out of input that's all single precision. It's a useful tool.
For the forward error, well, things get a little bit messy. As I said — going back a slide — the backward error is mostly just a flat bar anyway, with a bit of fuzz above it as the difficulty goes up. With the forward error, you take that flat bar and you essentially multiply it by the difficulty; that gives you an estimate for the forward error given the backward error. So you can see it goes up linearly, and it gets a nasty spread as the problem gets more difficult. And, a little bit different from the backward error: for the forward error here I'm only going to talk about the single-precision x, not the double-precision one I'm carrying — though I am carrying it through the algorithm in double. And this is not after one step but after iterating — how many times, we'll get to very soon; there's a next slide.
With just the extra-precise residual, even after the iteration you haven't changed your forward error much at all. It's still that bar multiplied by the difficulty; it still has the same nasty shape — again, why most people haven't really bothered with refinement. But if you carry both the residual and the solution in double precision, all of a sudden you get a nice, pretty step function in your error. That's a lot easier to explain to people, and also a lot easier to program around. Basically you can say: your answer is correct except for the last digit, unless the problem is really difficult — and you can detect when you're in that region. So it's a really easy thing when you're writing a program. It's no more "well, the error might be fuzzy in here" — and what's even worse, if you take the actual error bounds, you always get the error bounds up here, rather than even this line. So now you can actually target this not only at people but also at programs — programs that didn't do real well in their linear algebra courses. So yeah, this is the part I'm really happy about in all of this: it makes an easy problem easy. That's kind of a rare thing to see.
(Something is not listening to me — let me make sure I'm in the right spot. OK.) Going back to that backward error slide a couple of pages ago: remember I said I'm trying to converge down to just the working precision in the solution. If you accept, say, everything but the last digit, the number of steps needed is on the right — and pay attention, the horizontal axes are unfortunately different again. If you're trying to get actually down to or below the working precision, that's on the left. These are kind of cumulative distribution functions of how many steps it takes.
The residual-only-in-double lines aren't in these; ignore that case. You have the choice between working purely at the working precision — purely in the single precision of the input — or working with the factorization in single but with a double-precision residual and a double-precision carried x. This is the other aspect of faster versus better: if you want to actually push the error all the way down, you converge a lot faster if you work entirely in double. But if you're willing to lose just the last digit, well, they take the same number of iterations. It doesn't matter — now it's just a matter of flops, and there's a mantra people are going to keep hearing, if you aren't hearing it already: flops are basically free. It doesn't really matter.
Now, if you want to converge all the way down to the square of the working precision — to double precision, with a little bit of fuzz here — the all-working-precision options basically never get there. And note: in five steps you're almost at one hundred percent of cases for working precision on both of them — these things are almost the same — and in five steps you still get over eighty percent of cases all the way down to double precision. And it still doesn't take more than about fifteen steps to basically hit your limit. Practically speaking, you normally don't bother with anything out here; you just declare those too difficult. You find those by monitoring your progress at each step — how much the residual is shrinking. If it stops shrinking by enough, well, you've hit some level of difficulty: that operator I had before wasn't shrinking things enough, so you're in some realm where, for some reason, the computation became difficult. That's kind of another reason not to say too much about difficulty — it differs depending on where you are in the space.
And again I mean you know the last digit
one difference from going all the way and
getting the last everything but
the last it is relatively minute for
this algorithm the forwarder
here's a kind of fun part.
Converging down to epsilon-W, and, sorry, that should be a ten; it was one of the typos I was running around fixing. Converging the forward error down to basically everything but the last half bit, so it's not correctly rounded but it's really close, takes very little time at all. At five steps you're still getting about eighty percent of them done, and that's the same eighty percent as before. At ten you're still getting almost everything done, until the problem starts getting really difficult. And these step counts don't change much with the size of the problem, so ten steps is still enough for a thousand-by-thousand problem.
And you can see that working entirely in single precision doesn't buy you anything here if you want everything down to the last digit. Again, five steps get you over eighty percent, and you get a few answers down there; that was that little bottom cluster of the really easy problems.
Here is the other little thing. The residual is what you monitor for the backward error; you can monitor progress by taking the ratio of successive residuals. But for what you'd measure for the forward error, there's nothing there: you don't have the true solution x. If you go back to the algebra, though, you see that dx, the step size, is actually directly related to the error. And since they're directly related, it decreases by an operator that's really similar, not numerically similar, but algebraically almost identical, to the one for the error. So they both decrease at about the same rate.
So what you do is monitor the rate at which dx is shrinking. Once it gets small enough, you're done: you can't change your x any more, and that also means you got the right answer. And when it stops making progress, you also know you're in a region where the problem is too difficult, so again you can declare it too difficult. If you peek open the black box and play through some of the algebra you can get a bit more diagnosis than that, but for a black-box routine it's not too bad.
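As a rough sketch of that monitoring logic (my own illustration, not the LAPACK code; the function name, tolerances, and the 0.5 "stopped shrinking" threshold are all made up for the example), the loop might look like this: factor once, refine, carry the solution in double, and watch how fast dx shrinks to decide between converged, too difficult, and out of steps.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def refine_and_monitor(A, b, max_steps=10, tol=None):
    """Illustrative refinement loop with dx monitoring (not the LAPACK
    routine).  Factor in single precision; compute residuals and carry
    the solution in double; watch how fast dx shrinks."""
    tol = tol if tol is not None else 4 * np.finfo(np.float64).eps
    lu, piv = lu_factor(A.astype(np.float32))          # one O(n^3) factorization
    x = lu_solve((lu, piv), b.astype(np.float32)).astype(np.float64)
    prev = np.inf
    for _ in range(max_steps):
        r = b - A @ x                                  # residual in double
        dx = lu_solve((lu, piv), r.astype(np.float32)).astype(np.float64)
        x += dx                                        # solution carried in double
        rel = np.linalg.norm(dx) / np.linalg.norm(x)
        if rel <= tol:
            return x, "converged"                      # x can't change any more
        if rel > 0.5 * prev:                           # dx stopped shrinking
            return x, "too difficult"
        prev = rel
    return x, "out of steps"
```

On an easy, well-conditioned system this declares convergence in a few steps; on a hard one the dx ratio stalls and it bails out with "too difficult," which is exactly the diagnosis described above.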
Now, the performance costs... yes? Yes. It's the normal bound, where everything is so pessimistic it doesn't help; that's the problem. The rate you're guaranteed to converge at, those operators are roughly the condition number times a small polynomial in n times the element growth, and really that's about the rate you should be converging at, but it tends to be too big an upper bound. So you can say that if it takes you five steps to converge that far, that's already a relatively high condition number; you can kind of intuit that. But again, it's the normal condition number, and it measures the worst case, not necessarily the cases in front of you. Otherwise it's a really good thing. It would be nice to have more theory on it, but I don't know if we can in this context.
So, actual performance numbers; these are maybe a couple of years old. Here I'm doing the full thing, carrying everything in extra precision except the factorization, in real and complex, single and double, just to give you a feeling for how the performance goes. The blue dots are factorization time, normalized to one, so all of this is relative to how long it takes to factor the matrix, and that initial solve is also in there. This is on my machine, which is a relatively balanced architecture compared to many things, and has the weird little property that double-precision arithmetic is actually faster than single, which is fun; it very much seems built purely as a double-precision numerical linear algebra box. It's actually a joy to use for these types of problems.
Here you can see that double precision is faster than single, so this is kind of just a measure of how many iterations things are taking. Up until a certain point, these two green lines on each plot, they're all in slightly different spots, everything fits in cache: your factored matrix and your unfactored matrix, which you use for the residual, are all in cache down here. So this ratio of a little over two, or maybe almost four, say three and a half, that's the ratio of speeds of things that run entirely in cache, so it's almost purely a flop-count issue. And I'm comparing against factorization and solve routines that use the vendor's highly tuned libraries, while the refinement stuff uses our experimental code that we haven't bothered tuning.
So it's not that bad. For really tiny problems you get maybe a factor of two to four slowdown, but they are tiny problems, and this is pushing the forward error down as much as you can, being really, really greedy. Everything here to the left, everything that fits in cache, runs so quickly you normally wouldn't care about the speed, and you're only getting a small hit. Everything out here, outside of cache, converges really quickly to almost no overhead at all, because you're limited by the memory system. These extra dots, the purple triangles, are iterative refinement on top of the factorization; so this is the slowdown if you just do the refinement.
This also has some other stuff in it: if you're really, really paranoid and you want to compute all these condition numbers and difficulty measures I'm talking about, how much more do you pay to compute those? It's a significant amount. So if at all possible, trust what I'm saying and just monitor the dx; there's analysis about that, but some people want to know how much it costs to be super paranoid. Anyway, that was a relatively balanced architecture.
This next one is one of the older Xeons, I think two generations back in Intel's line, running at three gigahertz, and it's horrifically unbalanced: it has a ridiculous number of flops per memory load, and the cache is relatively tiny compared to how many flops it can do. And you can see that single is here; there's almost no overhead, it's going to be imperceptible when you're inside cache on a three-gigahertz machine. Even for the double-precision stuff, well, this is kind of funny: it looks like there's a big overhead, but that's purely the result of our not optimizing things. All these results are sequential except for vectorization, and the vendor routines like the factorization are using the full SSE, whatever was on this machine, while ours aren't using any of it. So even if you run really horribly slow code, OK, you still get a factor-of-four slowdown at worst when you're in cache, and in cache on a three-gigahertz machine nobody cares. You might care if you literally need to run millions of these a second, but not so much.
So really the point of these: there's not much practical overhead, and if our unoptimized academic code is running well enough that you don't notice, that's pretty good. You get good answers for almost free.
So I'm going to talk a little bit, not that long, about other applications of "better." There's the traditional application of better accuracy I'll talk about a little, and then there are times when better gives you faster, and times when better gives you more scalable.
So, obvious applications of better. One thing I should mention: the routines I've been talking about are unoptimized routines available in LAPACK; they've been there for the past two releases. All this stuff you can go out and use if you're into programming with LAPACK. They're in a special category of interface we introduced, experimental interfaces that are subject to change, because there are enough user-facing choices in these routines that you want to get some feedback from users. So if you have applications where you want the right answer without thinking about it, please use these and give us some feedback so we can make the interface better.
The other real application for these things is high-level environments. You're using those because you don't want to think about every aspect of everything all the time: in Matlab you want to type x = A\b, and you don't really want to think about the error conditions that much. If A is somewhat ill conditioned, Matlab and Octave both start yelling at you, telling you it's nearly singular to working precision, but they don't tell you much else, and a lot of times you look at those results and they're OK anyway. So it would be relatively simple to introduce these techniques into those environments; the only reason it's not in Octave is I kind of don't have time, and that's a whole other issue. Sorry, one thing I should say: these routines are just for the general-purpose case; we have a whole family of variations for other things, symmetric problems, symmetric indefinite, a bunch of those up there. So we really could apply the same thing in Octave and Matlab and just go.
A similar technique applies to what's called overdetermined least squares. So, a least-squares problem where you're looking for, let's see, what's right is the unique solution of least norm. You have something overdescribed, and you're trying to find the solution of least norm. If you rephrase this as an augmented system, you can just use regular iterative refinement on it; you don't use an LU factorization, you use something else. And there are other high-level environments where you just type "linear model: response given variable," and you'd really like that to just work. You don't want to think about both the numerical errors and the statistical model; you'd rather it all just work.
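The augmented-system formulation mentioned a moment ago can be written down in a few lines. This is a hypothetical, dense-matrix illustration of the idea (a real solver would factor the augmented matrix far more cleverly, and the function name is my own):

```python
import numpy as np

def lstsq_via_augmented(A, b):
    """Solve min ||A x - b||_2 through the square augmented system
        [ I    A ] [r]   [b]
        [ A^T  0 ] [x] = [0]
    so that ordinary iterative refinement for square systems applies.
    Purely illustrative; not a production least-squares solver."""
    m, n = A.shape
    K = np.block([[np.eye(m), A],
                  [A.T, np.zeros((n, n))]])
    sol = np.linalg.solve(K, np.concatenate([b, np.zeros(n)]))
    r, x = sol[:m], sol[m:]      # r = b - A x is the least-squares residual
    return x, r
```

The first block row forces r = b - Ax, and the second forces A^T r = 0, which is exactly the normal-equations condition, so x is the least-squares solution and the whole thing is a square system you can refine.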
So really, for these routines, "better" here is just better accuracy; these are things most high-level environments could be using. Now, the not-so-obvious application of better: speed. What happens when single precision, or some lower precision, is much, much faster than double? That's kind of true on a bunch of the GPUs, and it was really true on the first generation of the Cell processor.
So imagine you're targeting the backward error, and this is why I had that graph drop all the way down to double, and that most of your problems are well conditioned. This is something that would happen in, say, optimization frameworks, where you're solving for a direction to go in. A backward-error basis says, well, maybe you started off at a point near where you were; it doesn't really matter, it's OK as long as it's not too far from where you want it to be. And in the very beginning of an optimization iteration, the problems are relatively well conditioned, or at least not that difficult. So you do as much as you can in single, the n-cubed work in single precision, then you refine it down to double's backward error, or fall back to doing everything in double precision if you have to.
Some work that ran kind of concurrently with our accuracy work was done on the Cell by colleagues at Tennessee. On the original Cell, solving dense linear equations purely in double precision would give you maybe twelve gigaflops a second, and a couple of years ago that would have been great, but the Cell's single precision could do a lot more. Using this refinement process, doing the heavy lifting in single and then a little bit more work in double, all of a sudden for the easy problems you go from twelve gigaflops up to a hundred and fifty gigaflops. You can solve much bigger systems a lot faster. They took a completely independent path to the same destination we did, which is kind of nice; you can now unify both of them, and we're probably going to go merge all this stuff.
These are separate sets of routines in LAPACK right now, and at some point we really should merge them.
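A back-of-envelope model shows why this wins: the O(n^3) factorization runs at the fast single-precision rate, while only O(n^2) refinement work runs at the slow double rate. All the numbers below are made-up illustrative parameters, not measurements from the Cell:

```python
def mixed_precision_speedup(n, steps=5, single_speedup=10.0):
    """Toy flop model (illustrative, not measured).  Factorization costs
    about (2/3) n^3 flops; each refinement step costs a few n^2 flops
    (two triangular solves plus a residual).  Time is flops divided by
    rate, with single precision `single_speedup` times faster than double."""
    fact_flops = (2.0 / 3.0) * n**3
    refine_flops = steps * 6.0 * n**2      # rough per-step O(n^2) cost
    t_all_double = fact_flops              # double rate normalized to 1
    t_mixed = fact_flops / single_speedup + refine_flops
    return t_all_double / t_mixed
```

As n grows, the n^2 refinement cost vanishes relative to the n^3 factorization, so the overall speedup approaches the raw single-versus-double arithmetic ratio, which is the effect the Cell numbers showed.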
There's another option here with single precision: it fits more into whatever memory you have. The Cell has only a little bit of memory, but we weren't really talking about that; there it was purely about the arithmetic speed. Here it's about fitting more into memory, and I'm thinking mostly of sparse systems, sparse things that are so big you want to store part of them here and part of them on some other node. It doesn't always work great.
But for a lot of problems, and there's a great tech report on this by Hogg and Scott, who were at Rutherford Appleton Laboratory over in England, though it got absorbed into something, I don't know what yet, a lot of the performance is lost to indexing, but you still get a speed boost. And there are other tricks you could use: they were just using a straight, traditional sparse matrix representation, and if you did some of the fancier stuff from Sam Williams at LBL it would probably do a lot better. So these are some funny things: you can get a lot faster by getting better, which is kind of surprising; most people don't think about that.
Now another thing: scalability for the parallel setting, and this I think is just about the end of it. This is really where I came into all of this, working on an unsymmetric sparse LU factorization. The typical method of partial pivoting, the thing you'd do for a dense LU factorization, doesn't work so well for sparse matrices, because you're alternating between numerical work and reconfiguring your data structure, numerical work and reconfiguring the data structure. It's already a pain on one processor; if you try to do it in parallel, just forget it.
So what you want to do is pick your ordering and your pivoting in the beginning; it's kind of what I was talking about before. Pick your ordering at the beginning, factor, and whenever you run into a problem you introduce a little bit of backward error, not a lot, just a little, and then later on you go through and patch it up with refinement, like I said before. Again, this was tried in the eighties, in Colorado I believe, and I'm blanking on the name; it was resurrected again for SuperLU. Distributed SuperLU works pretty well now; computing the pivoting ahead of time is a whole other topic, but once you have that, you can still solve everything except the very, very difficult problems, and at the appropriate scale you can still make a lot of the difficult problems easy.
And there's another thing that's really, really recent. Even when you pivot purely for numerical reasons, the to-and-fro between data structure manipulation and actual numerical work slows everything down, and there's some fun theory that's been coming out recently: communication-optimal algorithms. For the n-cubed part of linear algebra, pretty much everything, think of eigenvalues, LU decompositions, solving, you trade a bit of extra computation and you can get algorithms that are optimal for memory: optimal in the parallel sense, in that they don't communicate that much, and optimal sequentially in how often they go outside the cache. Codes exist for a lot of these things.
Thank you.
Those are all my friends at Berkeley; Mark Hoemmen has written some of them, and they're fast even though they're not optimized, which is sweet.
The trick for these things: the LU part, and even QR, is a little bit different. They can't use the same pivoting techniques to fix themselves, to try to avoid the errors in the first place. Now, there is a new pivoting strategy that came out that works pretty well, but again, even if it doesn't work great, you can still do a lot by turning on a little refinement of the solution, and again that's n-squared work after the optimal n-cubed, so it's not that bad. And n-squared work, in a lot of ways, is streaming; for the triangular-solve part there's a nice optimal-memory algorithm too. So really you can use a little bit of refinement to get past this, and there are also some other papers related to this arguing that all of linear algebra can be done optimally if you're willing to work purely with norm-wise error.
And what I didn't talk about: with the right scaling tricks, for easy enough problems, you can change norm-wise errors into the other kinds of errors. So there's a good chance that, up to the constant factors of people implementing things, dense linear algebra is now entering an optimal phase where there's not too much left to do.
So again, to wrap up what I talked about: we can use iterative refinement to construct a relatively inexpensive but really dependable solver. It gives you an accurate answer whenever it can, and you can trade off performance for that. You can reliably detect failures, even when you're unsure, even for the forward error you can't directly measure. So it's the type of thing that's easier to explain to people and easier to write programs around. Using it, you compute better results for Ax = b, trading a little bit of computation and a little bit of memory bandwidth for better accuracy, and those are two things that keep getting better with time. I haven't talked about latency; in the sparse case that matters, in the dense case not so much, which is nice.
The important part to actually getting this dependable is keeping both pieces in extra precision: you compute the residual in extra precision, and you also carry your solution in extra precision internally. That last part was the key that was missing in previous years of work on getting better results.
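To see why the extra-precision residual matters, here's a small experiment of my own construction (not from the talk's data): the same single-precision factorization refined two ways, with the residual computed either in the working precision (single) or in extra precision (double).

```python
import numpy as np
from scipy.linalg import hilbert, lu_factor, lu_solve

def refine_with_residual_dtype(A, b, residual_dtype, steps=10):
    """Illustrative: single-precision factorization, solution carried in
    double either way, but the residual is computed in `residual_dtype`.
    Shows that the precision of the residual is what controls the limit."""
    lu, piv = lu_factor(A.astype(np.float32))
    x = lu_solve((lu, piv), b.astype(np.float32)).astype(np.float64)
    for _ in range(steps):
        Ad = A.astype(residual_dtype)
        r = b.astype(residual_dtype) - Ad @ x.astype(residual_dtype)
        x = x + lu_solve((lu, piv), r.astype(np.float32)).astype(np.float64)
    return x
```

On a moderately ill-conditioned matrix like a small Hilbert matrix, the single-precision residual stalls the forward error near single-precision levels, while the double-precision residual drives it down toward double precision with the same cheap factorization.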
Like I said, you can use the technique in general: think about solving Ax = b more quickly, or getting better results, using this staged-precision idea where you start with a sloppy solver and then fix it up. One thing I don't think any of us have tried is putting GMRES in the middle of this as the inner method. GMRES is backward stable, not like LU, but in a manner similar enough that you can probably ignore the difference. Now what would happen if you used that inside, with some more accurate, more expensive computation on the outside? Nobody's tried it; it may not win anything. I mean, there have been multilevel nested methods like that in the past, and it may or may not actually gain anything, but it was a nice option.
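The nesting idea can be mocked up in miniature. Here a few Jacobi sweeps stand in for a loosely converged inner solver (entirely my own stand-in, not GMRES and not anything from the talk), wrapped in an outer loop that recomputes an honest residual each step, Newton-style:

```python
import numpy as np

def sloppy_inner_solve(A, r, sweeps=5):
    """Deliberately cheap inner solver for A dx = r: a few Jacobi sweeps.
    A stand-in for a loosely converged iterative method; illustrative only."""
    d = np.diag(A)
    dx = r / d
    for _ in range(sweeps):
        dx = dx + (r - A @ dx) / d        # standard Jacobi update
    return dx

def nested_refine(A, b, outer_steps=30, rtol=1e-13):
    """Outer refinement loop: recompute the true residual, hand it to the
    sloppy inner solver, and accumulate the corrections."""
    x = np.zeros_like(b, dtype=np.float64)
    for _ in range(outer_steps):
        r = b - A @ x                     # honest residual each step
        if np.linalg.norm(r) <= rtol * np.linalg.norm(b):
            break
        x = x + sloppy_inner_solve(A, r)
    return x
```

The outer loop's resilience is the point made below about Newton's method: the inner solve can be quite rough, and the residual recomputation still drags the answer to full accuracy on problems where the inner iteration converges at all.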
And so really that's about all I had for today, which probably has you thankful. If there are any questions, feel free; and like the man said, I think I'm currently in 1323 Klaus, so come over, bang on the door, and tell me I'm wrong. (Audience question.)
Yeah, we haven't gone into that. That would be very similar to what I was saying about working in single and trying to get double-precision results, because you lose kind of half your conditioning when you form the semi-normal equations. So we haven't done that, but yes, it is possible, and yeah, it'd be a nice way to stabilize those methods too, I think. OK, I'm not talking about the outer iteration; what I mean is that little solve for the step, the dx, in the middle.
Yeah, those don't have the analysis; joining them to the forward-error analysis has always proven pretty difficult, and maybe somebody will pull that off with a lot more work. The thing is, Newton's method is so resilient you can make coding bugs in it and still get the right answer. So you start off with a thing that's really easy to get right, and you try to make sure it's right.
But there are other options. I don't know; it's quite possible, and it might be really useful for things where we don't have the matrix explicitly, or where you have tunable accuracy levels at some point. But again, I don't know if that's going to buy you more than just being more careful in parts of GMRES itself. And I know some people have looked at doing extra precision inside of GMRES and haven't gotten very far; that kind of turned into some of the communication-optimal algorithms, actually, because they hit a dead end with accuracy.
The LAPACK codes currently are hard-wired not to do more than ten iterations; that seems to be OK. We haven't tested much beyond a thousand by thousand; once you get past that point we haven't seen much need to test higher, but again you get almost all the cases, like eighty-five percent of the cases that you could even remotely get. Yeah, there's more analysis to do, and there's more coding work.
There's a very good reason the analysis is done this way. Yeah, that is a very good reason, but I mean, there are lots of different methods you can think of. Again, this is a very simple framework, and this framework has been analyzed; it's not just true for this, it's true for Newton's method in general if you have the operations defined correctly.
That's kind of the power here: Newton's method, quadratic convergence and all that, and you leverage a lot of it when you phrase things just as Ax = b. You don't need most of the power that's in there, but Higham and Tisseur have a nice little analysis of Newton's method in general that ought to apply. You might not be able to get quite the component-wise results without playing scaling games, but norm-wise everything should just work.
So yeah, there are lots of possible directions: you can take this and kind of nest levels of accuracy, getting the outer accuracy using a faster inner one, but I don't know that anyone has done that successfully for something like this. Anything else? Or I'll let you go free and enjoy your Friday. Thank you for coming, and thank you for putting up with me.