Python Concurrent Programming

创始人

2024-04-20 17:43:57

I. Python's Support for Concurrent Programming

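Multithreading: threading. Because the CPU and I/O can work at the same time, the CPU does not sit idle waiting for I/O to finish; it switches to another task and keeps multiple threads moving.
Multiprocessing: multiprocessing. Uses multiple CPU cores to run tasks truly in parallel.
Asynchronous I/O: asyncio. Applies the same CPU/I/O overlap inside a single thread, so that functions execute asynchronously.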

Additional helpers:

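Lock: locks a shared resource to prevent conflicting access.
Queue: passes data between threads/processes, implementing the producer-consumer pattern.
Thread pool / process pool (Pool): simplifies submitting tasks, waiting for them to finish, and collecting their results.
subprocess: launches external programs as processes and interacts with their input and output.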

II. Choosing Between Multiprocessing, Multithreading, and Coroutines

Python offers three ways to program concurrently:

Multithreading (Thread)
Multiprocessing (Process)
Coroutines (Coroutine)

1. What are CPU-bound and I/O-bound workloads?

CPU-bound:

CPU bound means the task is limited by the CPU, which runs at full capacity.

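CPU-bound work is also called compute-intensive work: I/O completes in very little time, while the CPU has a great deal of computation and processing to do, so CPU utilization is very high.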

Examples: compression and decompression, encryption and decryption, regular-expression search.
I/O-bound:

I/O-bound means that for most of its run the CPU is waiting for I/O reads and writes (disk/memory/network), so CPU utilization stays low.

Examples: file-processing programs, web crawlers, programs that read and write databases.

2. Comparing multithreading, multiprocessing, and coroutines

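Multiprocessing: Process (multiprocessing)

One process can start N threads.
- Pros: can use multiple CPU cores to compute in parallel.
- Cons: uses more resources; far fewer processes than threads can be started.
- Best for: CPU-bound computation.
Multithreading: Thread (threading)

One thread can start N coroutines.
- Pros: lighter-weight than processes and uses fewer resources (here "resources" means memory for storing variables).
- Cons:
  - Compared with processes: threads only run concurrently, not in parallel, and cannot use multiple CPU cores (because of the GIL).
  - Compared with coroutines: the number of threads is limited, each thread takes memory, and switching between threads has overhead.
- Best for: I/O-bound work where only a modest number of tasks run at the same time.
Coroutines: Coroutine (asyncio)
- Pros: the lowest memory overhead; by far the largest number can be started.
- Cons: limited library support (aiohttp vs requests); the code is more complex to write.
- Best for: I/O-bound work with a very large number of tasks, where ready-made async libraries exist.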

III. Python's Global Interpreter Lock (GIL)

1. Two main reasons Python is slow

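Compared with C/C++/Java, Python really is slow; in some specific scenarios it is 100 to 200 times slower than C++. For this reason many companies write their infrastructure code in C/C++, for example the recommendation engines, search engines, storage engines, and other performance-critical low-level modules at Alibaba, Tencent, Kuaishou, and other large companies.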

Why Python is slow:

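It is a dynamically typed language that is interpreted while it executes.
The GIL (global interpreter lock) prevents it from using multiple CPU cores to execute in parallel.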

2. What is the GIL?

The Global Interpreter Lock (GIL) is a mechanism that a programming-language interpreter uses to synchronize threads, so that only one thread is executing at any given moment.

Even on a multi-core CPU, an interpreter with a GIL lets only one thread execute at a time.

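Because of the GIL, even on a multi-core machine only one core at a time can execute Python code in a process, which is why Python is slower than C++/Java, whose threads can run in parallel.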

3. Why is there a GIL?

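In short: the GIL was introduced early in Python's design to avoid concurrency problems, and now people would like to remove it but cannot easily do so.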

It exists to solve data-integrity and state-synchronization problems between threads.

Python manages objects with reference counting: when an object's reference count drops to 0, the object is freed.

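Start: threads A and B both reference object obj, so obj.ref_num = 2, and both want to drop their reference. Dropping a reference means reading ref_num, subtracting 1, and writing it back. If a thread switch happens between one thread's read and its write, both threads can read 2 and both write 1: two references are gone but the count says 1, so obj is never freed (a leak). When increments and decrements race in the same way, an object can even be freed while it is still referenced. The GIL prevents such interleavings inside the interpreter.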
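The race can be reproduced in pure Python. This is a simulation, not CPython internals: the real counter (ob_refcnt) is maintained in C and protected by the GIL. The Obj class and ref_num attribute are hypothetical names taken from the text, and the threading.Barrier is an assumption added only to force the unlucky interleaving on every run.

```python
import threading


class Obj:
    """Hypothetical object with a hand-rolled reference counter."""

    def __init__(self):
        self.ref_num = 2  # referenced by thread A and thread B


barrier = threading.Barrier(2)


def drop_ref(obj):
    current = obj.ref_num      # step 1: read the counter
    barrier.wait()             # force both threads to read before either writes
    obj.ref_num = current - 1  # step 2: write back the decremented value


obj = Obj()
threads = [threading.Thread(target=drop_ref, args=(obj,), name=name) for name in ('A', 'B')]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Two references were dropped, yet the counter is 1 instead of 0,
# so an interpreter relying on it would never free obj.
print(obj.ref_num)  # 1
```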

The GIL does have a benefit: it simplifies how Python manages shared resources.

IV. How to Work Around the GIL's Limits

Multithreading (threading) is still useful for I/O-bound work.

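During I/O (read, write, send, recv, etc.) a thread releases the GIL, so the CPU and I/O can work in parallel. Multithreading therefore still speeds up I/O-bound work substantially, but for CPU-bound work it only slows things down further.
To deal with the GIL, Python provides multiprocessing: its multi-process mechanism achieves parallel computation and takes advantage of multiple CPU cores.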
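A minimal sketch of the first point, assuming time.sleep as a stand-in for a blocking I/O call: like read or recv, it releases the GIL while waiting, so the waits of several threads overlap. fake_io and run_threads are hypothetical helper names.

```python
import threading
import time


def fake_io(seconds):
    # time.sleep releases the GIL, standing in for a blocking read/recv
    time.sleep(seconds)


def run_threads(n, seconds):
    """Run n simulated I/O calls in n threads and return the wall time."""
    threads = [threading.Thread(target=fake_io, args=(seconds,)) for _ in range(n)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == '__main__':
    elapsed = run_threads(5, 0.2)
    # Sequentially this would take about 1.0 s; with threads it takes about 0.2 s.
    print(f'5 x 0.2 s of simulated I/O took {elapsed:.2f} s')
```

Replacing fake_io with a pure-Python loop of similar duration would show no such overlap, which is the CPU-bound case described above.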

V. Speeding Up a Python Crawler 10x with Multithreading

1. How to create threads in Python

Prepare a function
```
def my_func(a, b):
    do_craw(a, b)
```

Create a thread

```
import threading

t = threading.Thread(target=my_func, args=(100, 200))
```

Start the thread
```
t.start()
```
Wait for it to finish
```
t.join()
```
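Putting the three steps together gives a runnable sketch. The original do_craw is not shown, so the version here is a hypothetical stand-in that sleeps briefly and records a + b in a list.

```python
import threading
import time

results = []


def do_craw(a, b):
    # Hypothetical stand-in for the real crawling work: pretend to wait on I/O.
    time.sleep(0.1)
    results.append(a + b)


def my_func(a, b):
    do_craw(a, b)


t = threading.Thread(target=my_func, args=(100, 200))
t.start()  # my_func(100, 200) now runs in a new thread
t.join()   # the main thread blocks here until that thread finishes
print(results)  # [300]
```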

2. Rewriting the crawler to crawl with multiple threads

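```
import threading
import time

import requests

# Note: '#p{page}' is a URL fragment, which is never sent to the server,
# so every request actually downloads the same front page. That is fine
# for comparing speed, but not for collecting different pages.
urls = [f'https://www.cnblogs.com/#p{page}' for page in range(51)]


def craw(url):
    res = requests.get(url)
    print('url: {}, len: {}'.format(url, len(res.text)))


def multi_thread():
    print('multi_thread begin')
    threads = []
    for url in urls:
        threads.append(threading.Thread(target=craw, args=(url, )))
    for thread in threads:
        # start the thread
        thread.start()
    for thread in threads:
        # wait for it to finish
        thread.join()
    print('multi_thread end')


# the original timed this call with %%time in Jupyter
start = time.time()
multi_thread()
print('elapsed:', time.time() - start)
```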

3. Speed comparison: single-threaded vs multithreaded crawler

```
import threading
import time

import requests

urls = [f'https://www.cnblogs.com/#p{page}' for page in range(51)]


def craw(url):
    res = requests.get(url)
    print('url: {}, len: {}'.format(url, len(res.text)))


def single_thread():
    print('single_thread begin')
    for url in urls:
        craw(url)
    print('single_thread end')


def multi_thread():
    print('multi_thread begin')
    threads = []
    for url in urls:
        threads.append(threading.Thread(target=craw, args=(url, )))
    for thread in threads:
        # start the thread
        thread.start()
    for thread in threads:
        # wait for it to finish
        thread.join()
    print('multi_thread end')


# the original timed each call with %%time in Jupyter
for func in (single_thread, multi_thread):
    start = time.time()
    func()
    print(func.__name__, 'elapsed:', time.time() - start)
```

VI. A Producer-Consumer Crawler in Python

1. The multi-component Pipeline architecture

Complex jobs are usually not done in one go; they are broken into many intermediate steps that are completed one by one.

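An architecture that splits work across many modules is called a Pipeline, and each module is called a Processor. Producer-consumer is a typical pipeline: the first processor is the producer and the last is the consumer. The producer takes input data as raw material, its results are handed to the consumer as intermediate data, and the consumer produces the output.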

2. Architecture of the producer-consumer crawler


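The producer-consumer crawler has two Processors. The first (the producer) takes the list of URLs to crawl, downloads each page, and puts the downloaded content into a queue of downloaded pages. The second (the consumer) takes that intermediate data, parses the pages, and stores the results.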

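The benefit: the producer and the consumer can be developed by two different teams and given different resources (for example, different numbers of threads).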

So how do the two groups of threads exchange data?

3. queue.Queue for passing data between threads

queue.Queue provides thread-safe data communication between threads.

Thread-safe means that multiple threads can access the data concurrently without conflicts.

Import the queue module
```
import queue
```
Create a Queue object
```
q = queue.Queue()
```

Add an item

```
# put: when the queue is full, block until a slot frees up
# (a queue can only become full if it was created with maxsize > 0)
q.put(item)
```

Get an item

```
# get: when the queue is empty, block until data arrives
item = q.get()
```

Check the state

```
# number of items in the queue
q.qsize()
# whether the queue is empty
q.empty()
# whether the queue is full
q.full()
```
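A small sketch exercising these calls. It assumes a bounded queue (maxsize=2), because a Queue created without maxsize is unbounded and put never blocks; the timeout and block=False arguments make the demo raise queue.Full / queue.Empty instead of blocking forever.

```python
import queue

q = queue.Queue(maxsize=2)  # bounded; queue.Queue() with no maxsize is unbounded

q.put('a')
q.put('b')
print(q.qsize(), q.full())  # 2 True

try:
    q.put('c', timeout=0.1)  # would block forever without the timeout
except queue.Full:
    print('queue is full')

print(q.get(), q.get())  # a b  (first in, first out)
print(q.empty())         # True

try:
    q.get(block=False)  # non-blocking get on an empty queue
except queue.Empty:
    print('queue is empty')
```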

4. Code: implementing the producer-consumer crawler

```
import queue
import random
import threading
import time

import requests
from bs4 import BeautifulSoup


def craw(url_queue, html_queue):
    while True:
        # take a URL from the queue
        url = url_queue.get()
        # download the page
        html = requests.get(url).text
        # put the page content into the next queue
        html_queue.put(html)
        # print progress
        print('thread: {} URL: {} URLs left: {}'.format(
            threading.current_thread().name, url, url_queue.qsize()))
        # sleep for a random interval
        time.sleep(random.randint(1, 2))


def parse(html_queue, data_queue):
    while True:
        # take a page from the queue
        html = html_queue.get()
        # extract information from the page
        soup = BeautifulSoup(html, 'html.parser')
        links = soup.find_all('a', class_='post-item-title')
        for link in links:
            data_queue.put((link['href'], link.get_text()))
        print('thread: {} data count: {}'.format(
            threading.current_thread().name, data_queue.qsize()))


# URL queue
url_queue = queue.Queue()
# page queue
html_queue = queue.Queue()
# data queue
data_queue = queue.Queue()

urls = [f'https://www.cnblogs.com/#p{page}' for page in range(1, 51)]
for url in urls:
    url_queue.put(url)

# start three threads to crawl
for idx in range(3):
    t = threading.Thread(target=craw, args=(url_queue, html_queue), name=f'craw_{idx}')
    t.start()

# start three threads to parse
for idx in range(3):
    t = threading.Thread(target=parse, args=(html_queue, data_queue), name=f'parse_{idx}')
    t.start()

# inspect the data queue (the workers loop forever in the background,
# so this is a snapshot of what has been parsed so far)
list(data_queue.queue)
```
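The workers above loop forever, so the program never finishes on its own. One common way to shut such a pipeline down cleanly is a sentinel value. The sketch below is an offline variation, not the original crawler: it assumes None as the sentinel and replaces requests and BeautifulSoup with hypothetical stand-ins (a short sleep and len(html)) so it runs without network access.

```python
import queue
import threading
import time

STOP = None  # sentinel telling a worker to exit


def craw(url_queue, html_queue):
    while True:
        url = url_queue.get()
        if url is STOP:
            break
        time.sleep(0.01)  # hypothetical stand-in for requests.get(url).text
        html_queue.put(f'<html>{url}</html>')


def parse(html_queue, data_queue):
    while True:
        html = html_queue.get()
        if html is STOP:
            break
        data_queue.put(len(html))  # hypothetical stand-in for BeautifulSoup parsing


url_queue, html_queue, data_queue = queue.Queue(), queue.Queue(), queue.Queue()
for page in range(1, 51):
    url_queue.put(f'https://www.cnblogs.com/#p{page}')

crawlers = [threading.Thread(target=craw, args=(url_queue, html_queue)) for _ in range(3)]
parsers = [threading.Thread(target=parse, args=(html_queue, data_queue)) for _ in range(3)]
for t in crawlers + parsers:
    t.start()

for _ in crawlers:
    url_queue.put(STOP)   # one sentinel per crawler, queued behind all URLs
for t in crawlers:
    t.join()              # every page has now been downloaded
for _ in parsers:
    html_queue.put(STOP)  # only now is it safe to stop the parsers
for t in parsers:
    t.join()

print(data_queue.qsize())  # 50
```

Because the queues are first-in, first-out, each sentinel is only reached after all real work queued before it, and the parsers are stopped only after every crawler has finished producing pages.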

VII. Thread-Safety Problems and the Lock Solution

1. What thread safety means

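A function or library is thread-safe if, when it is called from multiple threads, it handles the variables shared among those threads correctly so that the program works as intended.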

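Because a thread can be switched out at any moment, unsynchronized access produces unpredictable results; this is what thread-unsafe means.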

```
def draw(account, amount):
    """Withdraw money from a bank account."""
    if account.balance >= amount:
        account.balance -= amount
```

This code looks fine, but it breaks under multithreading, because threads can be switched at any point, outside the program's control.

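Suppose the balance is 1000 and two threads each withdraw 800 at the same time. Both threads see that 1000 is greater than 800 and enter the if branch before either one has deducted anything, so both withdrawals succeed and the balance ends up at -600.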

2. Using Lock to solve thread-safety problems

Usage 1: try-finally

```
import threading

lock = threading.Lock()

lock.acquire()
try:
    # do something
    pass
finally:
    lock.release()
```

Usage 2: with statement

```
import threading

lock = threading.Lock()

with lock:
    # do something
    pass
```
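To see the lock doing its job, here is a sketch (counter and add_many are hypothetical names) where four threads increment a shared counter. counter += 1 is a read-modify-write, just like the balance check in draw(), and the with lock block makes the whole update atomic with respect to the other threads.

```python
import threading

counter = 0
lock = threading.Lock()


def add_many(n):
    global counter
    for _ in range(n):
        with lock:        # acquire; release automatically, even on an exception
            counter += 1  # read-modify-write, not atomic without the lock


threads = [threading.Thread(target=add_many, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 400000
```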

3. Example: reproducing the problem and fixing it

Without a lock:

```
import threading
import time


class Account:
    def __init__(self, balance):
        self.balance = balance


def draw(account, amount):
    """Withdraw money."""
    if account.balance >= amount:
        # sleep blocks this thread and is sure to trigger a thread switch
        time.sleep(0.1)
        print(threading.current_thread().name, 'withdrawal succeeded')
        account.balance -= amount
        print(threading.current_thread().name, 'balance', account.balance)
    else:
        print(threading.current_thread().name, 'insufficient balance!')


account = Account(1000)
theading_a = threading.Thread(name='theading_a', target=draw, args=(account, 800))
theading_b = threading.Thread(name='theading_b', target=draw, args=(account, 800))
theading_a.start()
theading_b.start()
```

Output:

theading_a withdrawal succeeded
theading_a balance 200
theading_b withdrawal succeeded
theading_b balance -600

Without the sleep, the bug appears only sometimes. If the code happens to make a remote call or sleep at that point, the bug appears every time.

With a lock:

```
import threading
import time

lock = threading.Lock()


class Account:
    def __init__(self, balance):
        self.balance = balance


def draw(account, amount):
    """Withdraw money."""
    # the check and the deduction run under one lock,
    # so no other thread can run this block in between
    with lock:
        if account.balance >= amount:
            # sleep still triggers a thread switch, but the other
            # thread cannot enter until the lock is released
            time.sleep(0.1)
            print(threading.current_thread().name, 'withdrawal succeeded')
            account.balance -= amount
            print(threading.current_thread().name, 'balance', account.balance)
        else:
            print(threading.current_thread().name, 'insufficient balance!')


account = Account(1000)
theading_a = threading.Thread(name='theading_a', target=draw, args=(account, 800))
theading_b = threading.Thread(name='theading_b', target=draw, args=(account, 800))
theading_a.start()
theading_b.start()
```

Output:

theading_a withdrawal succeeded
theading_a balance 200
theading_b insufficient balance!

VIII. The Handy Thread Pool: ThreadPoolExecutor

1. How a thread pool works

Creating a thread requires the system to allocate resources, and terminating one requires the system to reclaim them.

When a program needs many threads, it keeps creating and terminating them, which costs a lot of time and resources.

Reusing threads removes the overhead of creating and terminating them.

How a thread pool operates:

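The pool holds threads that are created in advance and reused over and over.
A task queue: when a new task arrives, no new thread is created; the task is put into the task queue.
Idle threads in the pool take tasks from the queue one after another and execute them. After finishing a task, a thread takes the next one; if there are no tasks left, the thread returns to the pool without being destroyed and waits for the next task.
Reusable threads plus a task queue make up a thread pool.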
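The two ingredients named above, reusable threads and a task queue, are enough to build a toy pool. TinyThreadPool is a hypothetical teaching class, not how concurrent.futures.ThreadPoolExecutor is implemented internally: it returns no futures and has no error handling (an exception in a task would kill that worker thread).

```python
import queue
import threading


class TinyThreadPool:
    """Hypothetical minimal pool: a fixed set of reusable threads plus a task queue."""

    def __init__(self, num_workers):
        self.tasks = queue.Queue()
        self.workers = [threading.Thread(target=self._worker) for _ in range(num_workers)]
        for w in self.workers:
            w.start()  # threads are created once, up front

    def _worker(self):
        while True:
            task = self.tasks.get()  # an idle thread waits here for the next task
            if task is None:         # sentinel: shut this worker down
                break
            func, args = task
            func(*args)              # no try/except: a failing task kills this worker

    def submit(self, func, *args):
        self.tasks.put((func, args))  # new work goes into the queue; no new thread

    def shutdown(self):
        for _ in self.workers:
            self.tasks.put(None)  # one sentinel per worker
        for w in self.workers:
            w.join()


results = []
worker_names = set()
record_lock = threading.Lock()


def square(x):
    with record_lock:
        results.append(x * x)
        worker_names.add(threading.current_thread().name)


pool = TinyThreadPool(3)
for i in range(10):
    pool.submit(square, i)
pool.shutdown()

print(sorted(results))    # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
print(len(worker_names))  # at most 3: ten tasks ran on the same few threads
```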


2. Benefits of using a thread pool

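Better performance: the large overhead of creating and terminating threads disappears, and thread resources are reused.
Suitable scenarios: bursts of many requests, or jobs that need many threads where each individual task is short.
Defense: prevents the system from creating too many threads, which would overload it and slow down its responses.
Cleaner code: the thread-pool syntax is more concise than creating and running threads yourself.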

3. Rewriting the crawler with a thread pool

ThreadPoolExecutor syntax

```
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import as_completed
```

Method 1: map. Very simple; note that map returns results in the same order as its inputs.

```
with ThreadPoolExecutor() as pool:
    results = pool.map(craw, urls)
    for result in results:
        print(result)
```

Method 2: futures. More powerful; note that with as_completed the order is not deterministic.

```
with ThreadPoolExecutor() as pool:
    futures = [pool.submit(craw, url) for url in urls]

    # iterate in submission order
    for future in futures:
        print(future.result())

    # or iterate in completion order
    for future in as_completed(futures):
        print(future.result())
```
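An offline sketch of the ordering difference, assuming a hypothetical fake_craw that sleeps for the given number of seconds and returns it, in place of a real download.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_craw(delay):
    # hypothetical stand-in for craw(url): each "download" takes `delay` seconds
    time.sleep(delay)
    return delay


delays = [0.3, 0.1, 0.2]

with ThreadPoolExecutor() as pool:
    # map: results come back in input order, no matter which finishes first
    ordered = list(pool.map(fake_craw, delays))
print(ordered)  # [0.3, 0.1, 0.2]

with ThreadPoolExecutor() as pool:
    futures = [pool.submit(fake_craw, d) for d in delays]
    # as_completed: each future is yielded as soon as it finishes
    finished = [f.result() for f in as_completed(futures)]
print(finished)  # typically [0.1, 0.2, 0.3], but the order depends on timing
```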

Complete thread-pool crawler:

```
from concurrent.futures import ThreadPoolExecutor

import requests
from bs4 import BeautifulSoup

urls = [f'https://www.cnblogs.com/#p{page}' for page in range(1, 51)]


def craw(url):
    res = requests.get(url)
    return res.text


def parse(html):
    soup = BeautifulSoup(html, 'html.parser')
    links = soup.find_all('a', class_='post-item-title')
    return [(link["href"], link.get_text()) for link in links]


# Stage 1: download all pages concurrently (map keeps the input order)
with ThreadPoolExecutor() as pool:
    htmls = pool.map(craw, urls)
    htmls = list(zip(urls, htmls))
    for url, html in htmls:
        print(url, len(html))

# Stage 2: parse the pages concurrently, remembering which URL each future belongs to
with ThreadPoolExecutor() as pool:
    futures = {}
    for url, html in htmls:
        future = pool.submit(parse, html)
        futures[future] = url
    for future, url in futures.items():
        print(url, future.result())
```

IX. Using a Thread Pool to Speed Up a Web Service

1. Web service architecture and characteristics


Characteristics of web back-end services:

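Web services have strict response-time requirements, for example returning a response within 200 ms.
Web services make many calls that depend on I/O, such as disk files, databases, and remote APIs.
Web services often have to handle tens of thousands to millions of simultaneous requests.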

2. Speeding up with ThreadPoolExecutor

Faced with a large number of requests, we cannot create threads without limit, because threads consume resources.

Benefits of ThreadPoolExecutor:

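It makes it easy to run I/O calls to disk files, databases, and remote APIs concurrently.
The pool never creates an unlimited number of threads (which could bring the system down), so it also acts as a safeguard.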

3. Code: a Flask web service, before and after

Original version:

```
import json
import time

import flask

app = flask.Flask(__name__)


def read_file():
    time.sleep(0.1)
    return 'file result'


def read_db():
    time.sleep(0.2)
    return 'db result'


def read_api():
    time.sleep(0.3)
    return 'api result'


@app.route('/')
def index():
    # the three calls run one after another: about 0.1 + 0.2 + 0.3 = 0.6 s
    result_file = read_file()
    result_db = read_db()
    result_api = read_api()
    return json.dumps({
        'result_file': result_file,
        'result_db': result_db,
        'result_api': result_api,
    })


if __name__ == '__main__':
    app.run()
```

Improved version:

```
import json
import time
from concurrent.futures import ThreadPoolExecutor

import flask

app = flask.Flask(__name__)
pool = ThreadPoolExecutor()


def read_file():
    time.sleep(0.1)
    return 'file result'


def read_db():
    time.sleep(0.2)
    return 'db result'


def read_api():
    time.sleep(0.3)
    return 'api result'


@app.route('/')
def index():
    # submit all three calls at once; they run concurrently in the pool
    result_file = pool.submit(read_file)
    result_db = pool.submit(read_db)
    result_api = pool.submit(read_api)
    # .result() waits, so the request takes about as long as the slowest call
    return json.dumps({
        'result_file': result_file.result(),
        'result_db': result_db.result(),
        'result_api': result_api.result(),
    })


if __name__ == '__main__':
    app.run()
```
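To check the speedup without running Flask, the body of the route can be timed directly. The sketch reuses the three simulated calls from the example; handle_request is a hypothetical name for what index() does, minus the JSON response.

```python
import time
from concurrent.futures import ThreadPoolExecutor


# the same simulated I/O calls as in the Flask example
def read_file():
    time.sleep(0.1)
    return 'file result'


def read_db():
    time.sleep(0.2)
    return 'db result'


def read_api():
    time.sleep(0.3)
    return 'api result'


pool = ThreadPoolExecutor()


def handle_request():
    """Fan the three calls out to the pool, then wait for all of them."""
    futures = [pool.submit(f) for f in (read_file, read_db, read_api)]
    return [f.result() for f in futures]


start = time.perf_counter()
result = handle_request()
elapsed = time.perf_counter() - start
print(result)              # ['file result', 'db result', 'api result']
print(f'{elapsed:.2f} s')  # roughly 0.3 s (the slowest call) instead of 0.6 s
```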

X. Using multiprocessing to Speed Up Programs

1. We already have threading; why do we need multiprocessing?

Despite the GIL, multithreading can still speed programs up, because they spend time waiting on I/O.

For CPU-bound computation, the automatic switching between threads becomes a burden, and multithreading can even slow things down.

The multiprocessing module is Python's answer to the limitation of the GIL: it runs multiple processes in parallel on multiple CPUs.

2. multiprocessing at a glance (compared with threading)

| Item | Multithreading | Multiprocessing |
| --- | --- | --- |
| Import | `from threading import Thread` | `from multiprocessing import Process` |
| Create | `t = Thread(target=func, args=(100, ))` | `p = Process(target=func, args=('bob', ))` |
| Start | `t.start()` | `p.start()` |
| Wait for completion | `t.join()` | `p.join()` |
| Data communication | `import queue`, `q = queue.Queue()`, `q.put(item)`, `item = q.get()` | `from multiprocessing import Queue`, `q = Queue()`, `q.put(item)`, `item = q.get()` |
| Locking | `from threading import Lock`, `lock = Lock()`, `with lock:` (do something) | `from multiprocessing import Lock`, `lock = Lock()`, `with lock:` (do something) |
| Pooling | `from concurrent.futures import ThreadPoolExecutor`, `with ThreadPoolExecutor() as executor:`; method 1: `results = executor.map(func, [1, 2, 3])`; method 2: `future = executor.submit(func, 1)`, `result = future.result()` | `from concurrent.futures import ProcessPoolExecutor`, `with ProcessPoolExecutor() as executor:`; method 1: `results = executor.map(func, [1, 2, 3])`; method 2: `future = executor.submit(func, 1)`, `result = future.result()` |

3. Hands-on: comparing single-threaded, multithreaded, and multiprocess speed on CPU-bound work

CPU-bound workload: checking 100 times whether a large number is prime

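Because of the GIL, multithreading is no faster than a single thread here (and can be slower), while multiprocessing can speed execution up noticeably.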

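```
import math
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import ProcessPoolExecutor

# the same large prime, repeated 100 times
PRIMES = [112272535095293] * 100


def is_prime(n):
    """A prime is divisible only by 1 and itself."""
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False
    sqrt_n = int(math.floor(math.sqrt(n)))
    for i in range(3, sqrt_n + 1, 2):
        if n % i == 0:
            return False
    return True


def single_thread():
    """Single thread."""
    for number in PRIMES:
        is_prime(number)


def multi_thread():
    """Multiple threads."""
    with ThreadPoolExecutor() as pool:
        # list() consumes the results, so an exception in a worker is raised here
        list(pool.map(is_prime, PRIMES))


def multi_process():
    """Multiple processes."""
    with ProcessPoolExecutor() as pool:
        list(pool.map(is_prime, PRIMES))


if __name__ == '__main__':
    # the main guard is required where workers are started with spawn
    # (Windows, macOS): each worker process re-imports this module
    for func in (single_thread, multi_thread, multi_process):
        start = time.time()
        func()
        print(func.__name__, 'elapsed:', time.time() - start)

# The author's %%time results in Jupyter:
# single_thread  CPU times: total: 54.4 s   Wall time: 1min 8s
# multi_thread   CPU times: total: 43.8 s   Wall time: 1min 8s
# multi_process  CPU times: total: 15.6 ms  Wall time: 119 ms
# %%time counts only the CPU time of the notebook process, not of the worker
# processes. The 119 ms wall time is implausibly short for 100 checks of about
# 0.7 s each; the original code did not consume the map results, so a failure
# inside the worker processes would not have raised an error.
```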

XI. Using a Process Pool to Speed Up a Flask Service

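Earlier we saw that multithreading suits I/O-bound applications, while multiprocessing speeds up CPU-bound computation. A Flask web service is a special case: most of the time, multithreading is all it needs. Some applications, however, also perform CPU-bound computation, which raises the question of how to use a process pool inside a Flask web service.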

```
import json
import math
from concurrent.futures import ProcessPoolExecutor

import flask

app = flask.Flask(__name__)


def is_prime(n):
    """A prime is divisible only by 1 and itself."""
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False
    sqrt_n = int(math.floor(math.sqrt(n)))
    for i in range(3, sqrt_n + 1, 2):
        if n % i == 0:
            return False
    return True


# <numbers> captures a URL segment, e.g. /is_prime/1,2,3
@app.route('/is_prime/<numbers>')
def api_is_prime(numbers):
    print(numbers)
    number_list = [int(x) for x in numbers.split(',')]
    results = process_pool.map(is_prime, number_list)
    return json.dumps(dict(zip(number_list, results)))


if __name__ == '__main__':
    # created after every function is defined, inside the main guard
    process_pool = ProcessPoolExecutor()
    app.run()
```

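One difference between multiprocessing and multithreading: the environments of different processes are completely isolated from each other. When the pool is created, every function it depends on must already be defined.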

So process_pool must be created below all of the function definitions to work, and its definition must also go inside the main block (if __name__ == '__main__':).

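These experiments and comparisons show that multithreading is very flexible: a thread pool can be defined anywhere, because threads share the whole environment of the current process. Multiprocessing, by contrast, runs into problems like this one, which sometimes take some research to solve. So in most cases multithreading is enough; reach for multiprocessing only when you genuinely hit CPU-bound computation. In a Flask program, initialize the process pool in the main block, before app.run(); every function can then use it.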

XII. Concurrent Crawling with Python Async I/O

1. Execution path of a single-threaded crawler


2. Coroutines: concurrency within a single thread

Core idea: a super loop (really just a while True loop) that the program itself controls.

Core idea: combined with I/O multiplexing (while a task waits on I/O, the CPU can do other work).
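A toy version of the super loop, built from plain generators. It is only an illustration under stated assumptions: each yield marks the point where a task would be waiting on I/O, and no real I/O happens. asyncio's event loop additionally asks the operating system, through I/O multiplexing, which waiting tasks are ready, instead of simply resuming them in turn. task, ready, and log are hypothetical names.

```python
from collections import deque


def task(name, steps):
    # A generator-based "coroutine": each yield hands control back to the loop,
    # as a real coroutine would while it waits on I/O.
    for i in range(steps):
        log.append(f'{name}{i}')
        yield


log = []
ready = deque([task('A', 3), task('B', 2)])

# The super loop: keep resuming whichever task is next until none are left.
while ready:
    current = ready.popleft()
    try:
        next(current)          # run the task up to its next yield
        ready.append(current)  # not finished yet: put it back in line
    except StopIteration:
        pass                   # finished: drop it

print(log)  # ['A0', 'B0', 'A1', 'B1', 'A2']
```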

The One Loop

One loop to rule them all,

One loop to find them,

One loop to bring them all,

And in its light they all thrive.

3. Python's async I/O library: asyncio

async: asynchronous

io: input/output

```
import asyncio

# Plain-script version: a Jupyter kernel already runs its own event loop,
# so run_until_complete would fail there.

urls = [f'https://www.cnblogs.com/#p{page}' for page in range(1, 51)]

# create an event loop (inside, it is essentially a while True loop)
loop = asyncio.new_event_loop()


async def get_url(url):
    await asyncio.sleep(1)
    print(url)


# async marks this function as a coroutine
async def myfunc(url):
    # await is essential: at this I/O step the coroutine does not block,
    # it hands control back so the super loop can run the next task
    await get_url(url)


# create the list of tasks
tasks = [loop.create_task(myfunc(url)) for url in urls]
# run the loop until every crawl task has finished
loop.run_until_complete(asyncio.wait(tasks))
loop.close()
```

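```
import asyncio

urls = [f'https://www.cnblogs.com/#p{page}' for page in range(1, 51)]


async def get_url(url):
    await asyncio.sleep(1)
    print(url)


# async marks this function as a coroutine
async def myfunc(url):
    # await is essential: at this I/O step the coroutine does not block,
    # it hands control back so the super loop can run the next task
    await get_url(url)


async def main():
    # create the task list (create_task needs a running loop, hence inside main)
    tasks = [asyncio.create_task(myfunc(url)) for url in urls]
    # wait until every crawl task has finished
    await asyncio.wait(tasks)


# asyncio.run creates the loop, runs main() and closes the loop (Python 3.7+).
# In Jupyter a loop is already running ("RuntimeError: This event loop is
# already running"): call nest_asyncio.apply() first, or write `await main()` in a cell.
asyncio.run(main())
```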

Note:

To use async I/O, every library you depend on must support async I/O.
In crawlers: requests does not support async, so use aiohttp instead.

```
import asyncio
import time

import aiohttp

urls = [f'https://www.cnblogs.com/#p{page}' for page in range(1, 31)]


async def async_craw(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as res:
            result = await res.text()
            print('craw url: {} {}'.format(url, len(result)))


async def main():
    tasks = [asyncio.create_task(async_craw(url)) for url in urls]
    await asyncio.wait(tasks)


start_time = time.time()
asyncio.run(main())
print(time.time() - start_time)
```

XIII. Controlling Crawler Concurrency with a Semaphore in Async I/O

Semaphore

A semaphore (also called a signal or flag) is a synchronization object that maintains a count between 0 and a specified maximum.

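Each time a thread completes a wait on the semaphore, the count decreases by one.
Each time a thread releases the semaphore, the count increases by one.
When the count is 0, a thread waiting on the semaphore cannot proceed until the semaphore becomes signaled again.
A semaphore whose count is greater than 0 is signaled; one whose count is 0 is nonsignaled.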

Usage 1:

```
sem = asyncio.Semaphore(10)

# ... later, inside a coroutine
# keeps concurrency within the specified limit
async with sem:
    # work with shared resource
    ...
```

Usage 2:

```
sem = asyncio.Semaphore(10)

# ... later, inside a coroutine
await sem.acquire()
try:
    # work with shared resource
    ...
finally:
    sem.release()
```
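A sketch that measures how much concurrency a semaphore allows. It uses asyncio.sleep as a hypothetical stand-in for an HTTP request, and it creates the Semaphore inside the coroutine that asyncio.run executes; on Python versions before 3.10, a semaphore created at module level could end up bound to a different event loop than the one asyncio.run starts.

```python
import asyncio


async def worker(sem, stats):
    async with sem:  # at most `limit` workers get past this line at once
        stats['current'] += 1
        stats['peak'] = max(stats['peak'], stats['current'])
        await asyncio.sleep(0.05)  # hypothetical stand-in for an HTTP request
        stats['current'] -= 1


async def main(limit, n_tasks):
    sem = asyncio.Semaphore(limit)  # created inside the running loop
    stats = {'current': 0, 'peak': 0}
    await asyncio.gather(*(worker(sem, stats) for _ in range(n_tasks)))
    return stats['peak']


peak = asyncio.run(main(limit=3, n_tasks=10))
print(peak)  # 3
```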

Example:

```
import asyncio
import time

import aiohttp

urls = [f'https://www.cnblogs.com/#p{page}' for page in range(1, 31)]


async def async_craw(url, semaphore):
    async with semaphore:
        print('craw url: ', url)
        async with aiohttp.ClientSession() as session:
            async with session.get(url) as res:
                result = await res.text()
                # sleep here to watch the behavior: about 10 requests run,
                # and only as they finish do the next ones start
                await asyncio.sleep(5)
                print('craw url: {} {}'.format(url, len(result)))


async def main():
    # set the concurrency limit to 10
    semaphore = asyncio.Semaphore(10)
    tasks = [asyncio.create_task(async_craw(url, semaphore)) for url in urls]
    await asyncio.wait(tasks)


start_time = time.time()
asyncio.run(main())
print(time.time() - start_time)
```

XIV. Using subprocess to Launch Any Program: Playing Music, Extracting Archives, Automatic Downloads, and More

1. Starting child processes with subprocess

The subprocess module:

lets you spawn new processes,
connect to their input, output, and error pipes,
and obtain their return codes.
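A portable sketch of all three capabilities. Instead of a music player or 7z.exe, it assumes the child program is the current Python interpreter (sys.executable), which exists on every machine: the parent writes to the child's stdin, reads its stdout through a pipe, and checks its return code.

```python
import subprocess
import sys

# Start a child Python process, feed it stdin, and capture stdout/stderr via pipes.
proc = subprocess.Popen(
    [sys.executable, '-c', 'import sys; print(sys.stdin.read().upper())'],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    text=True,  # exchange str instead of bytes
)
out, err = proc.communicate(input='hello subprocess')  # also waits for the child to exit
print(out.strip())      # HELLO SUBPROCESS
print(proc.returncode)  # 0
```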

Use cases:

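Automatically open KuGou Music at 08:00 every day and play songs
Call 7z.exe to extract .7z files automatically
Submit a torrent file remotely from Python and have the computer start downloading it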

2. subprocess examples

Open a song file with its default application

Note: the command is start on Windows, open on macOS, and xdg-open on Linux (see on Debian-based systems).

```
import subprocess

# on Windows, start is a shell built-in, so shell=True is required
proc = subprocess.Popen(['start', 'xxx.mp3'], shell=True)
proc.communicate()
```

Extract a .7z archive with 7z.exe

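```
import subprocess

# 7z.exe is a regular executable, so no shell is needed.
# 7-Zip expects no space between -o and the output directory.
proc = subprocess.Popen([r'C:\Program Files\7-Zip\7z.exe', 'x', './data/7z_test.7z',
                         '-o./datas/exetract_7z_test', '-aoa'])
proc.communicate()
```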
