求解一般带约束凸优化问题的一个简单的并行算法

2011 年 7 月 8 日

优化领域小学生来报道，给大家介绍一个算法：a simple parallel algorithm with an O(1/T) convergence rate for general convex programs[1]，我最早知道这个算法应该是在@覃含章那个关于ADMM算法的回答中：

交替方向乘子法（ADMM）算法的流程和原理是怎样的？ www.zhihu.com

其实这个算法步骤上和ADMM并不相似，ADMM的两个block 和是顺序更新的，先update ，再用 update ，我们称之为”Gauss-Seidel”；而如果和的更新完全是同时进行的，我们称之为”Jacobi”。但不要误会，ADMM作为一个知名的求解大规模优化问题的分布式算法，并行能力是很强的，它真正的并行其实是在decision variable 内部的multiple blocks上，decision variable 往往是人为构造出来处理constraint的，这也是为什么它虽然只有两个block，依然功能强大应用广泛，把ADMM extend到N-block这种idea在我看来是完全没必要的，说到底还是看你会不会用。说回到要介绍的算法，它其实和我整理过的另一个算法PCPM[2]很像：

门泊东吴：Predictor Corrector Proximal Multiplier Method zhuanlan.zhihu.com

具体我会在下面慢慢分析。文章作者Hao Yu博士其实是有知乎号的，这工作完全不应该由我一个小学生来班门弄斧(捂脸)，我更多是给自己整理一个学习小笔记，方便以后快速refresh memory。学艺不精，一知半(不)解，欢迎大家批评指正。

1. 问题描述

考虑如下这个一般带约束的凸优化问题:

是 closed convex set，和 , 是 continuous and convex on 。

2. 假设

Assumption 1. There exists a (possibly non-unique) optimal solution that solves (P0).

Assumption 2. (P0) has Lagrange multipliers attaining the strong duality.

Assumption 3. Function is Lipschitz continuous with modulus .

我私自调整了原paper里assumption的顺序，因为 Assumption 1 和 Asuumption 2 会经常捆绑起来以各种形式出现在求解convex problems的primal-dual type算法的assumption里，通常第一条就是，可以是Slater + existence of optimal solution推existence of saddle point，也可以是Slater + existence of saddle point推existence of optimal solution。比如在Mark Teboulle最近的一篇paper[3]里：

解一个标准的线性约束凸优化问题，他们是这样来写他们的assumption的：

当然，Hao Yu博士写得也很考究，还特别在 Assumption 2 后面指出了，Slater可以保证 Assumption 2 但并不必要。细微之处见真章，这些都是mathematical writing (包括assumption、lemma、proposition、theorem、corollary、proof)，可能还有analytical writing的能力，我自己因为缺乏系统的数学training以及本来写作就不行，在写paper这一方面能力特别差，这可能也是很多华人的通病，所以平时要多注重积累，(与大家共勉)。

Assumption 3 是一条我个人很不喜欢的assumption，但又确实很常见；它实际上就是在要求constraint function的增长速度不能快过线性，其实还是有点强的，哪怕Lipschitz gradient都会稍微好一点。而且，在实际求解过程中，你永远可以在decision variable上套一个很大的box constraint，比如，那好多functiond都是Lipschitz continuous了，但(我觉得)这样做就没意思了，还有一条我更不喜欢的assumption，就是bounded dual，就不在这里展开说了。

3. 算法描述

(因为是整理给自己看的，我的notation是按自己的习惯来的，和原paper稍有不同，但整体是自洽的。)

,
while termination conditions are not met do

为啥说这是个分布式算法呢？如果原问题(P0)有如下的N-block separable structure的话：

对所有：是 closed convex set，和 , 是 continuous and convex on 。

可以改写成:

算法步骤(3.1)里的(*)式是和PCPM算法中的dual predictor很像的，但因为该算法中的特殊更新步骤(3.2)，使得 natually ，所以就不需要额外再往上project，这一点是很妙的。看不懂没关系，我后面还会再讲。步骤(3.3)就是很标准的primal averaging，有经验的一看就知道要证。

文章作者管这个叫virtual queues，”because their update rule resembles a queueing equation”。下面来看看这个更新步骤(3.2)究竟在做啥，能使有些什么性质。

Property 1. for all , .

Property 2. for all , .

Property 3. , for all .

Property 1 用induction(数学归纳法)很好证：对任意，。假设，分三种情况：

(1). ：

(3.2)简化为，约束不满足的情况，dual variable往上增长，加大punish的力度，make sense。

(2). ：

看图可知，(3.2)简化为。

(3). ：

看图可知，(3.2)简化为。

所以 Property 1 得证，后面的 Property 2、3 同样分这三种情况，一目了然。更重要的是，把这三种情况总结一下，我们得到了(3.2)的庐山真面目：

这个的更新还是很妙的，不同于传统的Lagrange multiplier，永远都不会打到，它会一直待在的上面，哪怕这个约束已经满足了。

4. 收敛分析

Theorem 1. Let be an optimal solution of (P0) and be a Lagrange multiplier satisfying Assumption 2. If , then for all , we have:

for all .

(老规矩，下面扒一下证明；但也不会像以前一样全部写出来了，就稍微过一下。)

Define . Define Lyapunov drift :

(我其实不是很明白为啥它要叫Lyapunov drift。)

Lemma 1. for all .

Proof: 不同于原paper里的证明，我觉得把”庐山真面目”(3.2.simp)写出来后，分两种情况写一写，一下子就能出来。

Lemma 2. Let be an optimal solution of (P0). If , then for all we have:

Proof: 很标准的一个proximal minimization point inequality加 Assumption 3 。

Lemma 3. Let be an optimal solution of (P0).

If , then for all , we have
If , then for all , we have

Proof: 看到第一个不等式， Theorem 1 的第一部分(4.1)式呼之欲出，因为我们有个著名的不等式：

至于 Theorem 1 的第二部分(4.2)，constraint violation，同学们自己去看paper吧，我就不扒了。(年纪大了真，要早点睡了。)

5. 和PCPM的一个小比较

写不动了。。。我之前也写过，Teboulle的原paper[2]虽然只讨论了2-block linearly constrained convex optimization problem，但把PCPM推广到N-block nonlinearly constrained是非常trivial的，把线性约束矩阵的norm换成非线性约束函数的Lipschitz modulus就可以了，所有证明都轻松能过。如果用PCPM算法解(P1)：

它的算法步骤是这样的：

,
while termination conditions are not met do

两个算法的assumption是完全一样的，差就差在dual variable的更新步骤上，PCPM把dual predictor 和dual corrector 都只是打到，而这里讲的这个算法是要把dual variable 拔高到的，个中玄机，诸位再参透参透。而且，原paper[2]给了global convergence，然后加了一个proximal Lagrangian inverse mapping在原点处Lipschitz continuous的假设后，给了个linear convergence rate；而如果没有这个假设，convergence rate是什么并没有讨论，不过我猜后面肯定有人证过了，加个primal averaging，没跑的(狗头)。还有，其实仔细想想，PCPM算法里的和这个算法里的是可以有个转换关系的：，再比比PCPM里对的要求和这里对的要求，嗯，好像还能发现点什么。。。

6. 写在最后

paper[1]后面其实还有个做得很好的应用，叫decentralized multipath network flow control problem，我实在不是个做control的人，就不扒了，感兴趣的同学，可自行阅读[1]。(悄悄说一句，Hao Yu博士的thesis也写得很好，很值得一读。) 最后的最后，本文是用markdown写的，用Zhihu On VSCode发的，还是觉得不是很方便，需要改进。

参考文献

[1]. H. Yu and M. J Neely, A Simple Parallel Algorithm with an O(1/t) Convergence Rate for General Convex Programs , SIAM Journal on Optimization, 27(2): 759 – 783, 2017.

[2]. G. Chen and M. Teboulle, A Proximal-based Decomposition Method for Convex Minimization Problems , Mathematical Programming, 64(1-3): 81 – 101, 1994.

[3]. S. Sabach and M. Teboulle, Faster Lagrangian-Based Methods in Convex Optimization , arXiv preprint arXiv:2010.14314, 2020.

M	T	W	T	F	S	S
« Jan
				1	2	3
4	5	6	7	8	9	10
11	12	13	14	15	16	17
18	19	20	21	22	23	24
25	26	27	28	29	30	31

演道网

求解一般带约束凸优化问题的一个简单的并行算法

About The Author

fenny

Related Posts

About The Author

fenny