I am using .select() with BeautifulSoup, and I can't figure out why it returns only part of the expected results.

The HTML has the following format:

<div class="a">
  <a class="class-type">
  <a class="class-type">
  <a class="class-type">
  <a class="class-type">
  .... {12 times}
</div>
<div class="a">
  <a class="class-type">
  <a class="class-type">
  <a class="class-type">
  <a class="class-type">
  .... {12 times}
</div>
<div class="a">
  <a class="class-type">
  <a class="class-type">
  <a class="class-type">
  <a class="class-type">
  .... {12 times}
</div>

Code:

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')
item_urls = soup.select(".css-ix8km1")

This returns only 12 items, whereas I expect 36 to be returned.


  • Answer #1

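    As cody has already mentioned, you need a mechanism such as Selenium. I tried the page and was able to get output with the following code. You have to close the popup ad by clicking its 'X' button before you can scrape the page.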

    import time
    from selenium import webdriver
    from selenium.webdriver.common.keys import Keys
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
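    import selenium
    from selenium.webdriver.chrome.service import Service
    # Selenium 4 takes the driver path through a Service object; change the path to yours
    driver = webdriver.Chrome(service=Service('/home/bitto/chromedriver'))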
    driver.get("https://www.sephora.com/shop/face-makeup?pageSize=300")
    #to close the popup ad
    try:
        element = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.XPATH, "//button[@class='css-1mfnet7 ']"))
        )
        element.click()
    except selenium.common.exceptions.TimeoutException:
        print("Ad was not found")
    time.sleep(1) #not preferred but will do for now
    elem = driver.find_element(By.TAG_NAME, "body")
    item_urls=[]
    no_of_pagedowns = 3
    while no_of_pagedowns:
        elem.send_keys(Keys.PAGE_DOWN)
        time.sleep(5) #not preferred but will do for now
        no_of_pagedowns-=1
    post_elems = driver.find_elements(By.XPATH, "//a[@class='css-ix8km1']")
    for elem in post_elems:
        item_urls.append(elem.get_attribute("href"))
    print(item_urls)
    
    

    Output:

    ['https://www.sephora.com/product/pro-filtr-soft-matte-longwear-foundation-P87985432?icid2=products%20grid:p87985432:product', 'https://www.sephora.com/product/pro-filt-r-instant-retouch-concealer-P88779809?icid2=products%20grid:p88779809:product', 'https://www.sephora.com/product/radiant-creamy-concealer-P377873?icid2=products%20grid:p377873:product', 'https://www.sephora.com/product/translucent-loose-setting-powder-P109908?icid2=products%20grid:p109908:product', 'https://www.sephora.com/product/pro-filt-r-instant-retouch-setting-powder-P88779810?icid2=products%20grid:p88779810:product', 'https://www.sephora.com/product/diamond-bomb-all-over-diamond-veil-P85225585?icid2=products%20grid:p85225585:product', 'https://www.sephora.com/product/the-silk-canvas-P428661?icid2=products%20grid:p428661:product', 'https://www.sephora.com/product/pineapple-my-eye-collector-s-set-P435947?icid2=products%20grid:p435947:product', 'https://www.sephora.com/product/double-wear-stay-in-place-makeup-P378284?icid2=products%20grid:p378284:product', 'https://www.sephora.com/product/ultra-hd-invisible-cover-foundation-P398321?icid2=products%20grid:p398321:product', 'https://www.sephora.com/product/all-nighter-long-lasting-makeup-setting-spray-P263504?icid2=products%20grid:p263504:product', 'https://www.sephora.com/product/your-skin-but-better-cc-cream-spf-50-P411885?icid2=products%20grid:p411885:product', 'https://www.sephora.com/product/luminous-silk-foundation-P393401?icid2=products%20grid:p393401:product', 'https://www.sephora.com/product/born-this-way-P397517?icid2=products%20grid:p397517:product', 'https://www.sephora.com/product/born-this-way-super-coverage-multi-use-sculpting-concealer-P432298?icid2=products%20grid:p432298:product', 'https://www.sephora.com/product/lock-it-tattoo-foundation-P311138?icid2=products%20grid:p311138:product', 'https://www.sephora.com/product/fresh-face-kit-P440030?icid2=products%20grid:p440030:product', 
'https://www.sephora.com/product/teint-idole-ultra-24h-long-wear-foundation-P308201?icid2=products%20grid:p308201:product', 'https://www.sephora.com/product/fauxfilter-foundation-P424302?icid2=products%20grid:p424302:product', 'https://www.sephora.com/product/creaseless-concealer-P433206?icid2=products%20grid:p433206:product', 'https://www.sephora.com/product/bareminerals-original-foundation-broad-spectrum-spf-15-P61003?icid2=products%20grid:p61003:product', 'https://www.sephora.com/product/shimmering-skin-perfector-pressed-P381176?icid2=products%20grid:p381176:product', 'https://www.sephora.com/product/tinted-moisturizer-broad-spectrum-P109936?icid2=products%20grid:p109936:product', 'https://www.sephora.com/product/veil-mineral-primer-P210575?icid2=products%20grid:p210575:product']
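    The fixed count of three page-downs in the code above is a guess about how much scrolling the page needs. One possible alternative, sketched below, is to keep scrolling until the number of matched links stops growing. The helper name scroll_until_stable and its parameters are hypothetical and not part of Selenium; the counting and scrolling steps are passed in as callables so the loop itself does not depend on a browser. It assumes that one round without new items means the end of the list, which may not hold on a slow connection.

```python
import time
from typing import Callable


def scroll_until_stable(
    count_items: Callable[[], int],
    scroll_once: Callable[[], None],
    pause: float = 2.0,
    max_rounds: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Scroll until the item count stops growing; return the final count."""
    previous = count_items()
    for _ in range(max_rounds):
        scroll_once()
        sleep(pause)             # give lazily loaded items time to appear
        current = count_items()
        if current == previous:  # nothing new loaded: assume the end was reached
            break
        previous = current
    return previous


# Hypothetical wiring to the Selenium objects from the answer above:
#   body = driver.find_element(By.TAG_NAME, "body")
#   total = scroll_until_stable(
#       count_items=lambda: len(
#           driver.find_elements(By.XPATH, "//a[@class='css-ix8km1']")
#       ),
#       scroll_once=lambda: body.send_keys(Keys.PAGE_DOWN),
#   )
```

    The max_rounds cap keeps the loop from running forever if the page keeps loading content indefinitely.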
    
    

  • Answer #2

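    The reason is that only the first 12 items are rendered in the response; the rest are lazily loaded by the site's JavaScript code. You can verify this by requesting the URL with curl and counting the occurrences of the class string.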

    $ curl -s 'https://www.sephora.com/shop/face-makeup?pageSize=300' | grep -o css-ix8km1 | wc -l
    13
    
    

    You may need to use a mechanism that executes JavaScript, such as Selenium WebDriver.
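    The same check as the curl command can be done from Python with only the standard library. This is a rough sketch: the function names are made up for illustration, and the User-Agent header is an assumption, since some sites reject requests that do not send one. str.count counts non-overlapping matches, which corresponds to grep -o piped into wc -l.

```python
import urllib.request


def count_occurrences(text: str, needle: str) -> int:
    """Count non-overlapping occurrences of needle in text."""
    return text.count(needle)


def fetch_raw_html(url: str, timeout: float = 10.0) -> str:
    """Download the page as the server sends it; no JavaScript is executed."""
    request = urllib.request.Request(
        url,
        headers={"User-Agent": "Mozilla/5.0"},  # assumed header, adjust as needed
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


# Example use (performs a network request, so it is left commented out):
#   html = fetch_raw_html("https://www.sephora.com/shop/face-makeup?pageSize=300")
#   print(count_occurrences(html, "css-ix8km1"))
```

    If the count here is far below the number of items visible in a browser, the missing items are being added by JavaScript after the initial response.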
