00:00
(Beginning of video)
00:01
Okay, let's talk about challenge one: representation. If we remember, representation is learning to exploit the cross-modal interactions between individual elements across different modalities, and there were three sub-challenges: fusion, coordination, and fission. We will focus first on fusion. Fusion is learning a joint representation that models the cross-modal interactions.
When we study cross-modal interactions, and specifically fusion, you can see it almost as a spectrum. It goes from basic fusion, where, although the inputs come from two different modalities, the representations may be a lot more homogeneous (maybe they are already vector representations for both), all the way to complex fusion, where the input is heterogeneous and a lot closer, maybe, to the raw modalities. Although the harder problem is complex fusion, and that's where we want to eventually go, there is a lot that is very important in basic fusion, so that's where we'll spend a little bit more time at the beginning.
That is also because complex fusion, when you have two different modalities, can in fact be moved back towards basic fusion, and it has been a lot of the time. What people will do is take two different raw modalities, bring them through an encoder into representations that are a lot closer, and then do the fusion, and that in fact brings you back to basic fusion. For example, for images you may have a CNN or a vision Transformer, or for language you may have a BERT Transformer or some GPT. I just wanted to include these to show that a lot of the theory we'll talk about for basic fusion is applicable to many cases, even in those complex situations. But there are also approaches that will clearly go beyond the unimodal encoder approach. And as a reminder, the encoders can be either pre-trained or learned jointly with the fusion itself.
So let's study the basic concepts and interesting ideas behind representation fusion. As you remember, the goal is learning cross-modal interactions, and I think it is helpful to start simple first before making it harder. So let's start with the univariate case: one dimension for each modality.
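As a rough illustration of the "encode first, then fuse" idea described above, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the random-projection `encode` function merely stands in for a real unimodal encoder (a CNN, a vision Transformer, BERT, and so on), the input shapes are invented, and concatenation is just one simple choice of basic fusion, not necessarily the one used in the lecture.

```python
import math
import random

random.seed(0)

def make_weights(n_in, n_out):
    """Hypothetical encoder parameters: a random n_in x n_out projection."""
    return [[random.gauss(0.0, 1.0) for _ in range(n_out)] for _ in range(n_in)]

def encode(raw_rows, weights):
    """Stand-in unimodal encoder: flatten the raw input, project it, apply tanh."""
    flat = [value for row in raw_rows for value in row]
    n_out = len(weights[0])
    return [
        math.tanh(sum(flat[i] * weights[i][j] for i in range(len(flat))))
        for j in range(n_out)
    ]

# Two heterogeneous "raw" inputs with different shapes (both invented here).
image = [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(8)]   # 8 x 8 pixels
tokens = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(5)]  # 5 tokens x 4 dims

# After encoding, both modalities are plain 16-dimensional vectors.
z_image = encode(image, make_weights(64, 16))
z_text = encode(tokens, make_weights(20, 16))

# Basic fusion on the now-homogeneous vectors: simple concatenation.
z_fused = z_image + z_text
print(len(z_image), len(z_text), len(z_fused))
```

Once both modalities are vectors of comparable form, the fusion step no longer needs to know what the raw inputs looked like, which is why the basic-fusion ideas carry over to the complex case.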
02:57
Why do I do that? Why am I doing this? Because once you do this, you can start writing things in the same way you would in a statistical model, like a linear regression model, where here you would have your constant that doesn't change.
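To make the linear-regression view concrete, here is a small sketch assuming the univariate setting mentioned earlier (one scalar per modality). The function name `additive_fusion`, the symbols `x_a` and `x_b`, and the weight values are all hypothetical; the transcript does not give the exact form, so treat this as one plausible reading rather than the lecture's own formula.

```python
def additive_fusion(x_a, x_b, w0=0.5, w1=2.0, w2=-1.0):
    """Linear-regression-style fusion of two univariate modalities.

    w0 is the constant (intercept) that does not change with the inputs;
    w1 and w2 weight each modality's scalar value. All values are made up.
    """
    return w0 + w1 * x_a + w2 * x_b

# With both inputs at zero, only the constant term remains.
print(additive_fusion(0.0, 0.0))
# Each modality contributes independently of the other in this additive form.
print(additive_fusion(1.0, 2.0))
```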
39:07
(End of video)