How to run both items in scrapy function?



By : Levis
Date : October 24 2020, 06:10 PM
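Whenever I put the captions and transcription links in the start_urls variable, the spider first stores the caption price in both the Caption_price and Transcription_price fields, and then stores the transcription price in both fields. Why does this happen, and how can I solve it?

This is most likely because parse runs once for every URL in start_urls, so filling both fields with the same selector gives both fields the price of whichever page is being parsed. I suspect you need a different, sequential class structure: scrape the captions page in parse, then request the transcription page and complete the same item in a second callback: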
code :
import scrapy
from ..items import FetchingItem


class SiteFetching(scrapy.Spider):
    name = 'Site'
    start_urls = ['https://www.rev.com/freelancers/captions']

    def parse(self, response):
        # First page: start the item with the caption price.
        items = FetchingItem()
        items['Caption_price'] = response.css('#middle-benefit .mt1::text').extract()
        # Carry the partially filled item to the second page via meta.
        yield scrapy.Request(
            'https://www.rev.com/freelancers/transcription',
            callback=self.parse_transcription,
            meta={'items': items},
        )

    def parse_transcription(self, response):
        # Second page: complete the same item and yield it once.
        items = response.meta['items']
        items['Transcription_price'] = response.css('#middle-benefit .mt1::text').extract()
        yield items
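
The flow of that spider can be imitated without Scrapy. The following is only a toy sketch: FAKE_PAGES, run, and the tuple used as a stand-in for a request are hypothetical and are not part of Scrapy's API. It shows how the first callback hands a half-filled item to the second one, so that a single complete item comes out at the end.

```python
from collections import deque

# Hypothetical stand-ins for the text extracted from each page.
FAKE_PAGES = {
    "/captions": "caption-price",
    "/transcription": "transcription-price",
}


def parse(page_text, meta):
    item = {"Caption_price": page_text}
    # Do not yield the item yet: schedule the second page and pass the item along.
    yield ("/transcription", parse_transcription, {"item": item})


def parse_transcription(page_text, meta):
    item = meta["item"]
    item["Transcription_price"] = page_text
    yield item


def run(start_url, callback):
    """Toy driver: tuples are follow-up requests, dicts are finished items."""
    queue = deque([(start_url, callback, {})])
    results = []
    while queue:
        url, cb, meta = queue.popleft()
        for out in cb(FAKE_PAGES[url], meta):
            if isinstance(out, tuple):
                queue.append(out)
            else:
                results.append(out)
    return results


results = run("/captions", parse)
print(results)
```

Because the item is only yielded in the last callback, the output contains one item with both fields rather than two items with one field each.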


Scrapy--Can not import the items to my spider (No module name behance.items)



By : calceus
Date : March 29 2020, 07:55 AM
I'm new to Scrapy, and when I run the spider to crawl Behance, the items cannot be imported into my spider ("No module name behance.items").

Try running your spider with this command:
code :
scrapy crawl behance

The spider file:
code :
import scrapy
from scrapy.http import TextResponse
from selenium import webdriver

from behance.items import BehanceItem


class BehanceSpider(scrapy.Spider):
    name = "behance"
    # Must match the domain used in start_urls.
    allowed_domains = ["behance.net"]
    start_urls = [
        "https://www.behance.net/gallery/29535305/Mind-Your-Monsters",
    ]

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.driver = webdriver.Firefox()

    def parse(self, response):
        # Render the page in the browser, then wrap the rendered HTML in a
        # response object so that Scrapy selectors can be used on it.
        self.driver.get(response.url)
        response = TextResponse(url=response.url, body=self.driver.page_source, encoding='utf-8')

        item = BehanceItem()
        item['link'] = response.xpath("//div[@class='js-project-module-image-hd project-module module image project-module-image']/@data-hd-src").extract()

        yield item

To start the spider from a script instead of the command line:
code :
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

# Run this from inside the project (where scrapy.cfg lives) so that
# get_project_settings() can locate the project's settings module.
process = CrawlerProcess(get_project_settings())

process.crawl("behance")
process.start()
Python scrapy - yield initial items and items from callback to csv



By : Alvino Christian
Date : March 29 2020, 07:55 AM
You are yielding two different kinds of items: one containing only the video attribute and one containing only the transcript attribute. You need to yield a single kind of item that carries both attributes. To do that, create the item in parse and pass it to the second-level request using meta. Then, in parse_transcript, take it from meta, fill in the additional data, and finally yield the item. This general pattern is described in the Scrapy documentation.
The second issue is that you extract all videos at once with the extract() method. This produces a list in which it is hard to link each element with its corresponding transcript afterwards. A better approach is to loop over each video element in the HTML and yield one item per video.
code :
import scrapy

class SuhbaSpider(scrapy.Spider):
    name = "suhba2"
    start_urls = ["http://saltanat.org/videos.php?topic=SheikhBahauddin&gopage={numb}".format(numb=numb) for numb in range(1,3)]

    def parse(self, response):
        for video in response.xpath("//tr[@class='video-doclet-row']"):
            item = dict()
            item["video"] = video.xpath(".//span[@class='download make-cursor']/a/@href").extract_first()

            videoid = video.xpath(".//span[@class='media-info make-cursor']/@onclick").extract_first()
            url = "http://saltanat.org/ajax_transcription.php?vid=" + videoid[21:-2]
            request = scrapy.Request(url, callback=self.parse_transcript)
            request.meta['item'] = item
            yield request

    def parse_transcript(self, response):
        item = response.meta['item']
        item["transcript"] = response.xpath("//a[contains(@href,'english')]/@href").extract_first()
        yield item
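
To see why page-wide extraction misaligns the data, here is a small sketch that uses only the standard library's xml.etree.ElementTree in place of Scrapy selectors. The markup, the class names, and the first_href helper are hypothetical and exist only for illustration; the second row deliberately has no download link.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: three rows, the second one has no download link.
TABLE = """
<table>
  <tr class="row"><td class="title">A</td><td><a href="/a.mp4">get</a></td></tr>
  <tr class="row"><td class="title">B</td></tr>
  <tr class="row"><td class="title">C</td><td><a href="/c.mp4">get</a></td></tr>
</table>
""".strip()
root = ET.fromstring(TABLE)

# Page-wide extraction: two flat lists of different lengths.
titles = [td.text for td in root.findall(".//td[@class='title']")]
links = [a.get("href") for a in root.findall(".//a")]
misaligned = list(zip(titles, links))  # pairs title B with the link of C


def first_href(row):
    """Return the first link inside this row, or None if there is none."""
    a = row.find(".//a")
    return a.get("href") if a is not None else None


# Row-by-row extraction: every lookup is relative to one row element.
items = [
    {"title": row.find("td[@class='title']").text, "video": first_href(row)}
    for row in root.findall(".//tr[@class='row']")
]

print(misaligned)
print(items)
```

The relative lookups inside each row play the same role as the `.//` XPath expressions in the spider above: a missing element becomes None for that row instead of shifting every later value.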
How To Keep/Export Field Items in Specific Order Per Spider Class Definition, Utilizing The Items Pipeline in Scrapy



By : asif
Date : March 29 2020, 07:55 AM
This is the solution to my specific problem: exporting fields in the order given by the item class definitions in the items.py of a Scrapy project.
After tinkering with this problem and implementing @stranac's suggestion of getting rid of the list comprehension, I came up with the following solution, which exports all fields in order into their respective CSV files:
code :
from scrapy.exporters import CsvItemExporter
from scrapy import signals
from pydispatch import dispatcher


def item_type(item):
    # just want "first_class_def.csv" not "first_class_def_Item.csv"
    return type(item).__name__.replace('_Item','')

class SomeSitePipeline(object):
    fileNamesCsv = ['first_class_def','second_class_def']

    def __init__(self):
        self.files = {}
        self.exporters = {}
        dispatcher.connect(self.spider_opened, signal=signals.spider_opened)
        dispatcher.connect(self.spider_closed, signal=signals.spider_closed)

    def spider_opened(self, spider):
        self.files = dict([ (name, open("/somefolder/"+name+'.csv','wb')) for name in self.fileNamesCsv ])
        for name in self.fileNamesCsv:
            self.exporters[name] = CsvItemExporter(self.files[name])

            if name == 'first_class_def':
                self.exporters[name].fields_to_export = ['f1','f2','f3']
                self.exporters[name].start_exporting()

            if name == 'second_class_def':
                self.exporters[name].fields_to_export = ['f1','f4','f5','f6']
                self.exporters[name].start_exporting()

    def spider_closed(self, spider):
        for exporter in self.exporters.values():
            exporter.finish_exporting()
        for f in self.files.values():
            f.close()

    def process_item(self, item, spider):
        typesItem = item_type(item)
        if typesItem in set(self.fileNamesCsv):
            self.exporters[typesItem].export_item(item)
        return item
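
The core idea, that an explicit field list fixes the column order regardless of the order in which an item's fields were filled in, can be tried out with the standard library's csv.DictWriter instead of CsvItemExporter. This is only a sketch: FIELDS_BY_TYPE and export_rows are hypothetical names, and the field names are reused from the pipeline above.

```python
import csv
import io

# One ordered field list per item type, mirroring fields_to_export above.
FIELDS_BY_TYPE = {
    "first_class_def": ["f1", "f2", "f3"],
    "second_class_def": ["f1", "f4", "f5", "f6"],
}


def export_rows(type_name, rows):
    """Write rows as CSV text, with columns in the configured order."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=FIELDS_BY_TYPE[type_name],
        extrasaction="ignore",  # silently drop keys not in the field list
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# The dict is filled in a different order than the configured columns.
out = export_rows("first_class_def", [{"f3": "c", "f1": "a", "f2": "b"}])
print(out)
```

Keeping the field lists in one mapping also avoids the chain of `if name == ...` checks when more item types are added.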
How to remove items name in scrapy function?



By : Timothy Trimmier
Date : March 29 2020, 07:55 AM
Your code never reaches the method self.next_parse. By default, Scrapy calls the callback self.parse for each URL in self.start_urls. You can use a custom callback by overriding the start_requests method.
Here is how you do it:
code :
import scrapy
from ..items import FetchingItem

class SiteFetching(scrapy.Spider):
    name = 'Site'

    def start_requests(self):
        return [
            scrapy.Request('https://www.rev.com/freelancers/transcription', callback=self.parse_transcription),
            scrapy.Request('https://www.rev.com/freelancers/captions', callback=self.parse_caption)
        ]

    def parse_transcription(self, response):
        items = FetchingItem()
        Transcription_price = response.css('#middle-benefit .mt1::text').extract()

        items['Transcription_price'] = Transcription_price
        yield items

    def parse_caption(self, response):
        other_items = FetchingItem()
        Caption_price = response.css('#middle-benefit .mt1::text').extract()

        other_items['Caption_price'] = Caption_price
        yield other_items
Scrapy: How to load multiple items in separate function?



By : Ivan Ottinger
Date : March 29 2020, 07:55 AM
I would like to process and load the items in a function separate from parse_product, which in my case is prepare_item_download(). However, when I run my spider, I get an error message saying that the callback needs to return a Request, BaseItem, dict or None and not a generator. It works when I leave the code in the parse_product function.

Your parse_product function yields a generator object instead of the items that generator produces:
code :
def parse_product(self, response):
    path = response.request.url.split('/')[:-1]
    if path[-1] == 'fritz.os':
        # Problem: this yields the generator object itself, not its items.
        yield self.prepare_item_download(response, path)

Delegate to the generator with yield from instead:
code :
def parse_product(self, response):
    path = response.request.url.split('/')[:-1]
    if path[-1] == 'fritz.os':
        yield from self.prepare_item_download(response, path)
        # or, for Python < 3.3:
        # for item in self.prepare_item_download(response, path):
        #     yield item
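
The difference is easy to reproduce in plain Python, without Scrapy. In this sketch, prepare_items, parse_wrong, and parse_right are hypothetical stand-ins for prepare_item_download and the two versions of parse_product.

```python
import types


def prepare_items(names):
    """A generator function, like prepare_item_download in the question."""
    for name in names:
        yield {"name": name}


def parse_wrong(names):
    # Yields one value: the generator object itself.
    yield prepare_items(names)


def parse_right(names):
    # Delegates to the inner generator and re-yields each of its items.
    yield from prepare_items(names)


wrong = list(parse_wrong(["a", "b"]))
right = list(parse_right(["a", "b"]))

print(isinstance(wrong[0], types.GeneratorType))
print(right)
```

The consumer of parse_wrong receives a single generator object, which is the kind of value the error message complains about, while parse_right produces the individual dicts.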