Describe the bug
I'm getting the cookies through Puppeteer and saving them in a cookies.json file. After that, I load the cookies, format them into a single string, and pass that string to the scraper as a Cookie request header (please let me know if there is a better way to do this).
I'm running the scraper recursively. On the first request the page loads normally: the cookie works and the logged-in panel is rendered. On every subsequent request the cookie no longer works, and the page says that I am not logged in.
Note: I'm trying to clone a WordPress site whose logged-in area also uses WooCommerce (I believe this is irrelevant to the scraper, but it's worth mentioning).
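For context, here is roughly what the data looks like at each step. The cookie names and values below are placeholders, not real ones; page.cookies() returns objects of this shape, and my formatCookies() helper (shown in full under "My code") joins them into one header value:

    // Placeholder example of cookies.json as written by page.cookies():
    [
      { "name": "wordpress_logged_in_abc", "value": "user%7C...", "domain": "example.com", "path": "/" },
      { "name": "woocommerce_session_xyz", "value": "t_abc...", "domain": "example.com", "path": "/" }
    ]
    // formatCookies() turns that into a single Cookie header value:
    // "wordpress_logged_in_abc=user%7C...; woocommerce_session_xyz=t_abc..."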
Expected behavior
The cookie should work for all requests, but it only works for the first one.
Configuration
- versions:
  "website-scraper": "^5.3.1"
  "website-scraper-puppeteer": "^1.1.0"
My code
import scrape from 'website-scraper';
import PuppeteerPlugin from 'website-scraper-puppeteer';
import puppeteer from 'puppeteer';
import fs from 'fs';

// Shared Puppeteer launch options (headful so I can log in manually)
const pup = {
  headless: false,
  slowMo: 50,
  args: ['--no-sandbox', '--disable-setuid-sandbox', '--start-maximized'],
  defaultViewport: null,
  executablePath: 'C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe',
};

// Join cookie objects into a single Cookie header value
function formatCookies(cookies) {
  return cookies.map(cookie => `${cookie.name}=${cookie.value}`).join('; ');
}

(async () => {
  // Step 1: open the login page and wait for a manual login
  const browser = await puppeteer.launch(pup);
  const page = await browser.newPage();
  await page.goto('https://example.com/login/');

  console.log('Press enter after login');
  process.stdin.resume();
  await new Promise(resolve => process.stdin.once('data', resolve));

  // Step 2: save the session cookies to disk
  const getCookies = await page.cookies();
  fs.writeFileSync('cookies.json', JSON.stringify(getCookies, null, 2));
  console.log('Cookies saved!');
  await browser.close();

  // Step 3: reload the cookies and pass them to the scraper as a Cookie header
  const cookies = JSON.parse(fs.readFileSync('cookies.json'));
  const cookieString = formatCookies(cookies);

  const options = {
    urls: [
      'https://example.com/admin/',
    ],
    directory: './site',
    plugins: [
      new PuppeteerPlugin({
        launchOptions: pup,
      }),
    ],
    request: {
      headers: {
        Cookie: cookieString,
      },
    },
    recursive: true,
    urlFilter: function (url) {
      return url.indexOf('https://example.com') === 0;
    },
  };

  await scrape(options);
  console.log('Clone finished!');
})();
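In case it helps with debugging: the website-scraper README documents a plugin API with a beforeRequest action that receives the request options for each resource. A minimal logging plugin like the sketch below (the class name is mine, and I haven't verified it changes the outcome) should reveal whether the Cookie header is still attached on the second and later requests:

    // Diagnostic sketch: logs the Cookie header for every outgoing request,
    // using the 'beforeRequest' action from website-scraper's plugin API.
    class LogCookiePlugin {
      apply(registerAction) {
        registerAction('beforeRequest', async ({ resource, requestOptions }) => {
          console.log(resource.url, '->', requestOptions.headers && requestOptions.headers.Cookie);
          // Return the options unchanged so the request proceeds normally
          return { requestOptions };
        });
      }
    }

    // Usage: add it next to PuppeteerPlugin in options.plugins:
    // plugins: [new PuppeteerPlugin({ launchOptions: pup }), new LogCookiePlugin()]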
Steps to reproduce
- Change the URLs to the target website
- Run the script in a terminal: "node index.js"
- Log in manually on the login page that opens, then go back to the terminal and press Enter
- The clone starts with the cookies applied and the session logged in (but, as described above, only for the first request)