Scrape Redfin.com using Python (Save Data in CSV)

Published Date Oct 28, 2024
TL;DR

  • Python guide: scrape Redfin search & property pages with requests + BeautifulSoup; save CSV.

  • Search: price, config, address, broker, link. Property: price, address, availability, about.

  • Bonus: extract dynamic data from embedded JSON (reactServerState.InitialContext); skip headless.

  • Scale: use Scrapingdog (rotating proxies / CAPTCHAs); 1,000 free credits.

For businesses, investors, and even curious individuals, real-time insights into the housing market can be invaluable. Redfin, a prominent player in the real estate sector, offers a wealth of such data, spanning more than 100 markets in both the United States and Canada. With a 0.80% market share in the U.S. (Wikipedia), as gauged by the number of units sold, and a network of approximately 2,000 lead agents, Redfin stands as a significant source of real estate intelligence.

In this blog, we will see how to scrape data from Redfin using Python. I will also show you how to scale this process.

Let’s start!!

Collecting all the Ingredients for Scraping Redfin

This tutorial assumes that Python 3.x is already installed on your machine; if it is not, please install it from here. Once that is done, create a folder in which we will keep our Python scripts.

mkdir redfin
cd redfin

Once you are inside your folder install these public Python libraries.

  • Requests - This will be used for making the HTTP connection with redfin.com. Using this library we will download the raw HTML of the target page.

  • BeautifulSoup - Using this we will parse the important data from the raw HTML downloaded using the requests library.

    `pip install requests beautifulsoup4`

Now create a Python file inside this folder where you can write the script. I am naming the file redfin.py.

With this our project setup is complete and now we can proceed with the scraping.

What are we going to scrape?

In this tutorial, we are going to scrape two types of pages from redfin.com.

  1. Redfin Search Page

  2. Redfin Property Page

Scraping Redfin Search Page

It is always a great practice to decide in advance what data you want from the page. For this tutorial, we are going to scrape this page.

Download Raw HTML from the Page

Our first task would be to download the raw HTML from the target web page. For this, we are going to use the requests library.

import requests
from bs4 import BeautifulSoup

l = []  # list that will hold one dictionary per property
o = {}  # dictionary for the property currently being parsed

head = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"}
target_url = "https://www.redfin.com/city/30749/NY/New-York/filter/status=active"

# verify=False disables SSL certificate verification
resp = requests.get(target_url, headers=head, verify=False)
print(resp.status_code)

First, we imported all the libraries that we installed earlier. Then I declared one empty list and one empty dictionary.

The head variable is a dictionary containing the User-Agent header. The target_url variable contains the URL of the webpage to be scraped.

The requests.get function is used to send an HTTP GET request to the specified URL (target_url). The headers parameter is set to include the User-Agent header from the head dictionary. The verify=False parameter disables SSL certificate verification. The response object (resp) contains the server’s response to the request.
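The request described above can also be wrapped in a small helper that sets a timeout and raises an error on non-2xx responses. This is only a sketch, not the code used in this tutorial: the names `fetch_html` and `DEFAULT_HEADERS` and the 15-second timeout are my own assumptions, and certificate verification is left at its default (enabled) here.

```python
import requests

# Hypothetical names for illustration; the header value is the one used in this article.
DEFAULT_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    )
}


def fetch_html(url, session=None, timeout=15):
    """Return the response body for url, raising on HTTP error statuses."""
    session = session or requests.Session()
    resp = session.get(url, headers=DEFAULT_HEADERS, timeout=timeout)
    resp.raise_for_status()  # raises requests.HTTPError for 4xx/5xx responses
    return resp.text
```

Calling `fetch_html(target_url)` would then return the raw HTML as a string, or raise an exception instead of silently continuing with an error page.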

Once you run this code and see **200** in the output, you have successfully downloaded the target web page.

Now, we can parse the data using BS4.

Parsing the Raw HTML

BeautifulSoup will now help us extract all the data points from the raw HTML downloaded in the previous section. But before we start coding we have to identify the DOM location of each element.

We will use the Chrome developer tools to find the DOM location. If you inspect the page, you will find that every property box is inside a div tag with the class HomeCardContainer. So, first, we should find all these elements using the find_all() method of BS4.

soup = BeautifulSoup(resp.text, 'html.parser')

allBoxes = soup.find_all("div", {"class": "HomeCardContainer"})

The BeautifulSoup constructor is used to create a BeautifulSoup object (soup). The find_all method of the BeautifulSoup object is used to find all HTML elements that match the class HomeCardContainer.
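To see what find_all returns without touching the live site, here is a small self-contained sketch that runs against a made-up HTML fragment. The markup is an assumption that only mimics the class name mentioned above; the real Redfin markup is more complex and can change at any time.

```python
from bs4 import BeautifulSoup

# Made-up fragment for illustration; only the class name mirrors the article.
sample_html = """
<div class="HomeCardContainer"><span>Card A</span></div>
<div class="HomeCardContainer extra-class"><span>Card B</span></div>
<div class="SomethingElse"><span>Not a card</span></div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Matches every div whose class list contains HomeCardContainer.
cards = soup.find_all("div", {"class": "HomeCardContainer"})
card_texts = [card.text.strip() for card in cards]
print(len(cards), card_texts)
```

The second div is matched even though it carries an extra class, because BeautifulSoup compares the filter against each individual class of the element.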

allBoxes is a list that contains all the property data elements. Using a for loop, we are going to visit every property container and extract the details. But before we write our for loop, let’s find the DOM location of each data point.

Let’s start with the property price.

Once you right-click on the price you will see that the price is stored inside the span tag with the class homecardV2Price.

Similarly, the configuration of the property can be found inside the div tag with class HomeStatsV2.

Individual property links can be found inside the a tag. This a tag is the only a tag inside each property container.

for box in allBoxes:
    # find() returns None when the element is missing, so .text raises AttributeError
    try:
        o["property-price"] = box.find("span", {"class": "homecardV2Price"}).text.strip()
    except AttributeError:
        o["property-price"] = None

    try:
        o["property-config"] = box.find("div", {"class": "HomeStatsV2"}).text.strip()
    except AttributeError:
        o["property-config"] = None

    try:
        o["property-address"] = box.find("div", {"class": "homeAddressV2"}).text.strip()
    except AttributeError:
        o["property-address"] = None

    try:
        o["property-broker"] = box.find("div", {"class": "brokerageDisclaimerV2"}).text.strip()
    except AttributeError:
        o["property-broker"] = None

    # TypeError covers an a tag that has no href attribute
    try:
        o["property-link"] = "https://www.redfin.com" + box.find("a").get('href')
    except (AttributeError, TypeError):
        o["property-link"] = None

    l.append(o)
    o = {}

print(l)

For each home card container, the loop extracts specific pieces of information: property price, configuration, address, broker details, and a link to the property.

The for loop iterates through each element (box) in the list of home card containers. For each piece of information (property price, configuration, address, broker, link), a try block attempts to find the corresponding HTML element within the current home card container (box). If successful, it extracts the text content, strips leading and trailing whitespace, and assigns it to the corresponding key in the dictionary (o). If the extraction fails (because the element is not present or due to other issues), the except block sets the value to None.

After extracting information from the current home card container, the dictionary o is appended to the list l. Then, the dictionary o is reset to an empty dictionary for the next iteration.
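The repeated try/except blocks can also be folded into one helper. The sketch below is one possible refactor, not the code used in the rest of this tutorial: `text_or_none` and `parse_card` are hypothetical names, and the sample markup is invented to mirror the class names used above.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.redfin.com"


def text_or_none(parent, tag, class_name):
    """Return the stripped text of the first matching child, or None."""
    node = parent.find(tag, {"class": class_name})
    return node.text.strip() if node is not None else None


def parse_card(box):
    """Build one record from a home card container."""
    link = box.find("a")
    href = link.get("href") if link is not None else None
    return {
        "property-price": text_or_none(box, "span", "homecardV2Price"),
        "property-config": text_or_none(box, "div", "HomeStatsV2"),
        "property-address": text_or_none(box, "div", "homeAddressV2"),
        "property-broker": text_or_none(box, "div", "brokerageDisclaimerV2"),
        "property-link": urljoin(BASE_URL, href) if href else None,
    }


# Invented markup for illustration only.
sample_html = """
<div class="HomeCardContainer">
  <a href="/NY/New-York/example/home/1"></a>
  <span class="homecardV2Price"> $500,000 </span>
  <div class="homeAddressV2">1 Example St, New York, NY</div>
</div>
"""
box = BeautifulSoup(sample_html, "html.parser").find("div", {"class": "HomeCardContainer"})
record = parse_card(box)
print(record)
```

Fields that are missing from the card (here the configuration and broker) come back as None, which matches the behavior of the except blocks.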

Once you run this code you will get this response.

Saving the data to a CSV file

For better visibility, we are going to save this data to a CSV file. For this task, we are going to use the pandas library, which you can install with `pip install pandas`.

import pandas as pd

df = pd.DataFrame(l)
df.to_csv('properties.csv', index=False, encoding='utf-8')

The code uses the pandas library to create a DataFrame (df) from the list of dictionaries (l) that contains the scraped data. It then exports the DataFrame to a CSV file named 'properties.csv'.

After running the code you will find a CSV file inside your working folder by the name properties.csv.

Saving the data from a list to a CSV file was super simple with Pandas.
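If you would rather not add pandas as a dependency, Python's built-in csv module can write the same list of dictionaries. This is a minimal sketch, assuming every record uses the keys shown in this article; the sample row is made up and the function name `rows_to_csv` is hypothetical.

```python
import csv
import io

FIELDS = [
    "property-price",
    "property-config",
    "property-address",
    "property-broker",
    "property-link",
]


def rows_to_csv(rows, out):
    """Write a list of dicts to a file-like object as CSV."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# Made-up row; None values are written as empty cells.
rows = [{
    "property-price": "$500,000",
    "property-config": None,
    "property-address": "1 Example St, New York, NY",
    "property-broker": None,
    "property-link": "https://www.redfin.com/NY/New-York/example/home/1",
}]

buffer = io.StringIO()
rows_to_csv(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead of an in-memory buffer, pass a file opened with `open('properties.csv', 'w', newline='', encoding='utf-8')`.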

Key Takeaways

  • The blog explains how Redfin real estate data (property listings, prices, locations, and details) can be extracted programmatically for analysis and research.

  • It highlights that Redfin actively blocks direct scraping attempts, making browser-like requests, headers, and anti-bot handling necessary.

  • The article shows how using a scraping API simplifies Redfin data extraction by managing IP rotation, request headers, and block avoidance automatically.

  • It demonstrates how structured property data can be collected at scale instead of relying on manual searches or copy-paste workflows.

  • The post emphasizes that scraped Redfin data is useful for real estate analytics, market research, pricing comparisons, and building property data tools.

Complete Code

You can scrape many more details from the page but for now, the code will look like this.

import requests
from bs4 import BeautifulSoup
import pandas as pd

l = []  # list that will hold one dictionary per property
o = {}  # dictionary for the property currently being parsed

head = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"}
target_url = "https://www.redfin.com/city/30749/NY/New-York/filter/status=active"

# verify=False disables SSL certificate verification
resp = requests.get(target_url, headers=head, verify=False)
print(resp.status_code)
soup = BeautifulSoup(resp.text, 'html.parser')

allBoxes = soup.find_all("div", {"class": "HomeCardContainer"})

for box in allBoxes:
    # find() returns None when the element is missing, so .text raises AttributeError
    try:
        o["property-price"] = box.find("span", {"class": "homecardV2Price"}).text.strip()
    except AttributeError:
        o["property-price"] = None

    try:
        o["property-config"] = box.find("div", {"class": "HomeStatsV2"}).text.strip()
    except AttributeError:
        o["property-config"] = None

    try:
        o["property-address"] = box.find("div", {"class": "homeAddressV2"}).text.strip()
    except AttributeError:
        o["property-address"] = None

    try:
        o["property-broker"] = box.find("div", {"class": "brokerageDisclaimerV2"}).text.strip()
    except AttributeError:
        o["property-broker"] = None

    # TypeError covers an a tag that has no href attribute
    try:
        o["property-link"] = "https://www.redfin.com" + box.find("a").get('href')
    except (AttributeError, TypeError):
        o["property-link"] = None

    l.append(o)
    o = {}

print(l)
df = pd.DataFrame(l)
df.to_csv('properties.csv', index=False, encoding='utf-8')

Scraping Redfin Property Page

From the property page, we are going to gather this information.

  • Property Price

  • Property Address

  • Is it still available (True/False)

  • About section of the property

Download Raw HTML from the Page

import requests
from bs4 import BeautifulSoup

l = []
o = {}
available = False

head = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"}
target_url = "https://www.redfin.com/NY/New-York/112-E-35th-St-10016/home/45333496"

resp = requests.get(target_url, headers=head, verify=False)
print(resp.status_code)

This code downloads a Redfin property page using the requests library; BeautifulSoup is imported here and used in the next step to parse the HTML. The script initializes empty data structures (l and o) to store scraped information, sets available to False by default, and sets a User-Agent header to simulate a Chrome browser request. The target URL is specified, and an HTTP GET request is sent with SSL certificate verification disabled.

After running the code if you get 200 on your console then that means your code was able to scrape the raw HTML from the target web page.

Let’s use BS4 to parse the data.

Parsing the Raw HTML

As usual, we have to first find the location of each element inside the DOM.

The price is stored inside the div tag with the class statsValue. The address, as the code below shows, is read from the h1 tag with the class full-address.

Property sale status is located inside the div tag with the class ListingStatusBannerSection.

The About section of the property can be found inside the div tag with the id marketing-remarks-scroll.

soup = BeautifulSoup(resp.text, 'html.parser')

try:
    o["property-price"] = soup.find("div", {"class": "statsValue"}).text.strip()
except AttributeError:
    o["property-price"] = None

try:
    o["property-address"] = soup.find("h1", {"class": "full-address"}).text.strip()
except AttributeError:
    o["property-address"] = None

# The listing counts as available only when the status banner contains "ACTIVE"
try:
    check = soup.find("div", {"class": "ListingStatusBannerSection"}).text.strip()
    available = "ACTIVE" in check
except AttributeError:
    available = False  # banner not found
o["property-available"] = available

try:
    o["about-property"] = soup.find("div", {"id": "marketing-remarks-scroll"}).text.strip()
except AttributeError:
    o["about-property"] = None

l.append(o)  # store the record so that print(l) shows it
print(l)

By default, available is set to False, and it is set to True if the string ACTIVE is present inside the check string. We have used the strip() function to remove unwanted whitespace from the text values.
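The availability check can also be written as a small function that tolerates a missing banner. This is a sketch under the same assumption the tutorial makes, namely that the banner text contains the word ACTIVE for live listings; `is_available` is a hypothetical name, and the case-insensitive comparison is my own addition.

```python
def is_available(banner_text):
    """Return True when the status banner text marks the listing as active."""
    if banner_text is None:
        return False  # banner not found on the page
    return "ACTIVE" in banner_text.strip().upper()


print(is_available("  Active  "))
print(is_available("SOLD ON MAY 1"))
print(is_available(None))
```

Keep in mind that a plain substring test would also match a word such as INACTIVE, so check the banner values you actually see before relying on it.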

Once you run the code you should get this.

Finally, we were able to extract all the desired information from the target page.
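As an alternative to separate find() calls, the text fields of the property page can be described in one table of CSS selectors and read with select_one. This is only a sketch: the selectors are derived from the tags, classes, and id named in this section, `SELECTORS` and `parse_property` are hypothetical names, and the sample HTML is made up.

```python
from bs4 import BeautifulSoup

# Field name -> CSS selector, taken from the tags and classes named in this section.
SELECTORS = {
    "property-price": "div.statsValue",
    "property-address": "h1.full-address",
    "about-property": "div#marketing-remarks-scroll",
}


def parse_property(html):
    """Return a dict with the stripped text of each field, or None if missing."""
    soup = BeautifulSoup(html, "html.parser")
    record = {}
    for field, selector in SELECTORS.items():
        node = soup.select_one(selector)
        record[field] = node.text.strip() if node is not None else None
    return record


# Made-up markup for illustration only.
sample_html = """
<h1 class="full-address">1 Example St, New York, NY 10016</h1>
<div class="statsValue">$1,250,000</div>
"""
record = parse_property(sample_html)
print(record)
```

If Redfin renames a class, only the selector table needs to change.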

Complete Code

The complete code for this property page will look like this.

import requests
from bs4 import BeautifulSoup

l = []
o = {}
available = False

head = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"}
target_url = "https://www.redfin.com/NY/New-York/112-E-35th-St-10016/home/45333496"

# verify=False disables SSL certificate verification
resp = requests.get(target_url, headers=head, verify=False)
print(resp.status_code)
soup = BeautifulSoup(resp.text, 'html.parser')

try:
    o["property-price"] = soup.find("div", {"class": "statsValue"}).text.strip()
except AttributeError:
    o["property-price"] = None

try:
    o["property-address"] = soup.find("h1", {"class": "full-address"}).text.strip()
except AttributeError:
    o["property-address"] = None

# The listing counts as available only when the status banner contains "ACTIVE"
try:
    check = soup.find("div", {"class": "ListingStatusBannerSection"}).text.strip()
    available = "ACTIVE" in check
except AttributeError:
    available = False  # banner not found
o["property-available"] = available

try:
    o["about-property"] = soup.find("div", {"id": "marketing-remarks-scroll"}).text.strip()
except AttributeError:
    o["about-property"] = None

l.append(o)

print(l)

Bonus Section

While scrolling down on the property page, you will find information regarding agents, down payment, calculator, etc. This information loads through an AJAX injection.

These sections are not available as rendered HTML elements when you make a normal HTTP request. At this point, many of you will think that this information can be scraped easily with a headless browser, but headless browsers consume a lot of CPU resources. Let me share an alternative.

Redfin renders this data from API results that are stored in the second-to-last script tag of every property page. Let me explain what I mean.

The raw HTML you get after making the GET request will have a script tag in which all this data will be stored.

The script tag in question is the second-to-last script tag of the raw HTML downloaded from the target property page. Here is how you can access the data in this tag using a regular expression.

import re

# The data lives in the second-to-last script tag of the page
try:
    o["other-details"] = soup.find_all('script')[-2]
except IndexError:
    o["other-details"] = None

config_match = re.search(r'reactServerState\.InitialContext\s*=\s*({.*?});', str(o["other-details"]))

if config_match:
    config_data = config_match.group(1)
    print(config_data)

Using a regular expression, we look for a string that matches the pattern reactServerState\.InitialContext\s*=\s*({.*?});

Once you run this code you will find all the information inside this string.
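One caveat with a non-greedy pattern like this is that it stops at the first `};` it meets, so the capture can be cut short if that sequence appears inside the data (for example inside a string value). A possible alternative is to let Python's JSON decoder find the end of the object. The sketch below assumes that the assigned value is valid JSON, which I have not verified for every Redfin page; `extract_initial_context` is a hypothetical name and the sample script text is made up.

```python
import json
import re


def extract_initial_context(page_text):
    """Return the object assigned to reactServerState.InitialContext, or None."""
    match = re.search(r"reactServerState\.InitialContext\s*=\s*", page_text)
    if match is None:
        return None
    try:
        # raw_decode parses one JSON value starting at the given index
        # and ignores whatever follows it.
        data, _end = json.JSONDecoder().raw_decode(page_text, match.end())
    except json.JSONDecodeError:
        return None  # the value was not valid JSON
    return data


# Made-up script content for illustration only.
sample_script = (
    'window.x = 1; root.__reactServerState.InitialContext = '
    '{"note": "a};b", "agent": {"name": "Jane"}}; more();'
)
data = extract_initial_context(sample_script)
print(data)
```

Because the function searches plain text, you can pass it either the content of the script tag or the whole `resp.text`.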

How to scrape Redfin at scale?

The above approach is fine as long as you are scraping a few hundred pages, but it will fall flat when your scraping demands reach millions. Redfin will start throwing captcha pages like this.

To avoid this situation you have to use a web scraping API like Scrapingdog. This API will handle proxy rotations for you. Proxy rotation will help you maintain the data pipeline.

You can sign up for the free account from here. The free account comes with a generous 1000 credits which is enough for testing the API.

Once you are on the dashboard, you will find an API key that will be used in the code below.

import requests
from bs4 import BeautifulSoup
import pandas as pd

l = []
o = {}

# Replace YOUR-API-KEY with the key from your Scrapingdog dashboard
target_url = "https://api.scrapingdog.com/scrape?dynamic=false&api_key=YOUR-API-KEY&url=https://www.redfin.com/city/30749/NY/New-York/filter/status=active"

resp = requests.get(target_url)
print(resp.status_code)
soup = BeautifulSoup(resp.text, 'html.parser')

allBoxes = soup.find_all("div", {"class": "HomeCardContainer"})

for box in allBoxes:
    # find() returns None when the element is missing, so .text raises AttributeError
    try:
        o["property-price"] = box.find("span", {"class": "homecardV2Price"}).text.strip()
    except AttributeError:
        o["property-price"] = None

    try:
        o["property-config"] = box.find("div", {"class": "HomeStatsV2"}).text.strip()
    except AttributeError:
        o["property-config"] = None

    try:
        o["property-address"] = box.find("div", {"class": "homeAddressV2"}).text.strip()
    except AttributeError:
        o["property-address"] = None

    try:
        o["property-broker"] = box.find("div", {"class": "brokerageDisclaimerV2"}).text.strip()
    except AttributeError:
        o["property-broker"] = None

    # TypeError covers an a tag that has no href attribute
    try:
        o["property-link"] = "https://www.redfin.com" + box.find("a").get('href')
    except (AttributeError, TypeError):
        o["property-link"] = None

    l.append(o)
    o = {}

print(l)
df = pd.DataFrame(l)
df.to_csv('properties.csv', index=False, encoding='utf-8')

Did you notice something? The code is almost the same as above; we just replaced the target URL with the Scrapingdog API URL. Of course, you have to use your personal API key to run this program successfully.
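Since the Redfin address is passed as a query parameter and itself contains characters such as `=`, it can be safer to percent-encode it rather than paste it into the string by hand. Here is a minimal sketch of that idea; the endpoint and the parameter names (dynamic, api_key, url) are the ones from the URL above, while `build_api_url` and `API_ENDPOINT` are hypothetical names of my own.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://api.scrapingdog.com/scrape"


def build_api_url(target_url, api_key, dynamic=False):
    """Return the API request URL with the target URL percent-encoded."""
    params = {
        "dynamic": "true" if dynamic else "false",
        "api_key": api_key,
        "url": target_url,
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


api_url = build_api_url(
    "https://www.redfin.com/city/30749/NY/New-York/filter/status=active",
    "YOUR-API-KEY",  # placeholder, replace with your own key
)
print(api_url)
```

With the requests library you can get the same effect by passing the dictionary through the `params` argument of `requests.get`, which encodes the values for you.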

It is a very economical solution for large-scale scraping. You just have to focus on data collection and the rest will be managed by Scrapingdog.

Conclusion

In this blog, I have scraped two distinct types of pages on Redfin: the search page and the property page. Moreover, I have included a bonus section that sheds light on extracting information that’s dynamically loaded through AJAX injections.

Just like Redfin, I have extracted data from other real estate giants. (find their links below)

  1. Scraping Zillow Real Estate Property Data using Python

  2. How to Scrape Zoopla Property Data using Python

  3. How To Scrape Homegate.ch Data using Python

  4. Scraping Idealista.com using Python

  5. Web Scraping Realtor Property Data using Python

  6. Web Scraping Airbnb Data using Python

If this article resonates with you and you appreciate the effort put into this research, please share it with someone who might be on the lookout for scalable real estate data extraction solutions from property sites.

In the future, I will be making more such articles. Thanks for reading!
