Category: Python/Ruby

2021-10-12 17:16:35

# -*- coding: utf-8 -*-
import json
import re
import time
import urllib.parse

import jsonpath
import scrapy

from Suning.items import SuningItem

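class SuningSpider(scrapy.Spider):
    name = 'suning'
    # allowed_domains takes bare domain names, without a path or trailing slash.
    allowed_domains = ['search.suning.com']

    # Ask for the search keyword and percent-encode it for use in the URL.
    keyword = input("Enter a product name: ")
    temp_data = urllib.parse.quote(keyword)
    # The search URL prefix is missing from the original post; only "{}/" survives.
    temp_url = "{}/"
    val_url = temp_url.format(temp_data)
    start_urls = [val_url]

    def __init__(self, name=None, **kwargs):
        # Forward the caller's name instead of always passing None.
        super().__init__(name=name, **kwargs)
        self.page_num = 0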

    def parse(self, response):
        # Uncomment to save the raw page for debugging:
        # content = response.body.decode("utf-8")
        # with open("./file/苏宁.html", "w", encoding="utf-8") as file:
        #     file.write(content)
        li_elements = response.xpath("//div[@id='product-list']/ul[@class='general clearfix']/li")
        # print(len(li_elements))
        for li_element in li_elements:
            # Collect the title's text nodes, strip whitespace and commas, and join them with "-".
            title_elements = li_element.xpath(
                ".//div[@class='res-info']/div[@class='title-selling-point']/a//text()").extract()
            title_list = []
            for temp_title in title_elements:
                temp_title = re.sub(r"\s", "", temp_title)
                if len(temp_title) > 0:
                    temp_title = temp_title.replace(",", "")
                    title_list.append(temp_title)
            title = "-".join(title_list)

            store_name = li_element.xpath(
                ".//div[@class='res-info']/div[@class='store-stock']/a/@title").extract_first()
            # print(store_name)
            # print(title)
            temp_image_url = li_element.xpath(
                ".//div[@class='img-block']/a[@class='sellPoint']/img/@src").extract_first()
            # extract_first() returns None when the image is missing, so guard the concatenation.
            image_url = "https:" + temp_image_url if temp_image_url else None
            # print(image_url)
            temp_product_url = li_element.xpath(
                ".//div[@class='img-block']/a[@class='sellPoint']/@href").extract_first()
            # Pull the "<key0>/<key1>" path segment out of the product URL.
            src_match = re.search(r"com/(.*?)\.html", temp_product_url or "")
            if src_match is None:
                continue  # no product link, so no price URL can be built
            src_args = src_match.group(1)
            key0 = src_args.split("/")[0]
            key1 = src_args.split("/")[-1]
            # The price endpoint's URL prefix is missing from the original post (the empty string below).
            price_src = "" + key1 + "_0000000" + key1 + "_" + key0 + "_190_755_7550199_500353_1000051_9051_10346_Z001___R9006372_0.91_1___00031F072____0___750.0_2__500363_500519__.html?callback=pcData&_=1630468559926"
            # price_src = "" + key1 + "_0000000" + key1 + "_" + key0 + "_250_029_0290199_20089_1000257_9254_12006_Z001___R1901001_0.5_0___000060864___.html?callback=pcData&_=1630466740130"
            # print(price_src)
            # Pass the fields scraped so far to the price callback through meta.
            item = {"title": title, "store_name": store_name, "image_url": image_url}
            yield scrapy.Request(price_src, callback=self.get_price, dont_filter=True, meta=item)
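
The listing stops before the get_price callback is shown. Since the price URL carries callback=pcData, the response is probably JSONP rather than plain JSON, so the wrapper would have to be removed before parsing. Below is a minimal sketch of that one step. The helper name strip_jsonp and the sample payload are hypothetical, and the real structure of the price response is not given in the post.

```python
import json
import re


def strip_jsonp(text, callback="pcData"):
    """Parse a JSONP body such as `pcData({...});` into Python data.

    Assumption: the body is exactly one call to `callback` wrapping valid JSON.
    """
    pattern = r"^\s*" + re.escape(callback) + r"\s*\((.*)\)\s*;?\s*$"
    match = re.match(pattern, text, flags=re.DOTALL)
    if match is None:
        raise ValueError("response is not wrapped in {}(...)".format(callback))
    return json.loads(match.group(1))


# Hypothetical payload, used only to show the unwrapping step.
sample = 'pcData({"price": "19.90"});'
data = strip_jsonp(sample)
print(data["price"])
```

Inside a callback such as get_price, the same helper could be applied to response.text, with the title, store_name and image_url values read back from response.meta. Which keys hold the price would have to be checked against a real response.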
