
Cookies are becoming invalid on subsequent requests #115

@rafaelfndev

Describe the bug
I'm getting the cookies through Puppeteer and saving them to a cookies.json file. After that, I load the cookies and send them as a string to the scraper (please let me know if there is a better way to do this).

I'm running the scraper recursively. On the first request the page loads normally, the cookies work, and the logged-in panel loads. On the second request the cookies no longer work, and the page says I am not logged in.
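
For reference, the string I send in the Cookie header is built from cookies.json and looks roughly like this (the cookie names and values here are only illustrative of a WordPress/WooCommerce login, not my real ones):

wordpress_logged_in_abc123=user%7C1700000000%7C...; woocommerce_session_xyz=...; wp-settings-time-1=1700000000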

Note: I'm trying to clone a WordPress site whose logged-in area also uses WooCommerce (probably irrelevant to the scraper, but just an observation).

Expected behavior
The cookies should work for all requests, but they only work for the first one.

Configuration

  • version
    "website-scraper": "^5.3.1",
    "website-scraper-puppeteer": "^1.1.0"

My code

import scrape from 'website-scraper';
import PuppeteerPlugin from 'website-scraper-puppeteer';
import puppeteer from 'puppeteer';
import fs from 'fs';

const pup = {
	headless: false,
	slowMo: 50,
	args: ['--no-sandbox', '--disable-setuid-sandbox', '--start-maximized'],
	defaultViewport: null,
	executablePath: "C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe",
};

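// Builds the Cookie request header string ("name=value; name=value; ...") from the array saved by page.cookies()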
function formatCookies(cookies) {
	return cookies.map(cookie => `${cookie.name}=${cookie.value}`).join('; ');
}

(async () => {
	
	const browser = await puppeteer.launch(pup);
	const page = await browser.newPage();
	await page.goto('https://example.com/login/');

	console.log('Press enter after login');
	
	process.stdin.resume();
	await new Promise(resolve => process.stdin.once('data', resolve));

	const getCookies = await page.cookies();
	fs.writeFileSync('cookies.json', JSON.stringify(getCookies, null, 2));

	console.log('Cookies saved!');

	await browser.close();

	const cookies = JSON.parse(fs.readFileSync('cookies.json'));

	const cookieString = formatCookies(cookies);

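	// website-scraper options: the saved cookies go in as a static Cookie request header,
	// and the Puppeteer plugin reuses the same launch settings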
	const options = {
		urls: [
			'https://example.com/admin/',
		],
		directory: `./site`,
		plugins: [
			new PuppeteerPlugin({
				launchOptions: pup,
			})
		],
		request: {
			headers: {
				Cookie: cookieString,
			}
		},
		recursive: true,
		urlFilter: function(url) {
			return url.indexOf('https://example.com') === 0;
		},
	};

	await scrape(options);

	console.log('Clone finished!');
})();
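
For comparison, here is a minimal sketch of re-applying the saved cookies to the browser itself with Puppeteer's page.setCookie, instead of a static Cookie request header (it assumes the same cookies.json written by page.cookies() above; I haven't verified it together with the website-scraper-puppeteer plugin):

// Minimal, self-contained sketch (Node ESM, top-level await)
import puppeteer from 'puppeteer';
import fs from 'fs';

const savedCookies = JSON.parse(fs.readFileSync('cookies.json'));
const browser = await puppeteer.launch({ headless: false });
const page = await browser.newPage();
await page.setCookie(...savedCookies);          // registers the cookies with the browser session
await page.goto('https://example.com/admin/');  // this navigation carries the login cookies
await browser.close();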

Steps to reproduce

  1. Change the URL to the correct website
  2. Run the script in the terminal: node index.js
  3. Navigate to the login page, log in manually, return to the terminal and press Enter
  4. The clone starts with the cookies applied and the session logged in
