1 Project Overview

This is the link for the final paper. Click here.

1.1 Motivation For The Project

I want to look at National Park data for my research project. The National Parks are an important part of the United States and function as areas of land that are preserved by the national government. They can have historical elements or be places to enjoy the wonderful views they have to offer. I have been to several National Parks myself. I would like to look at the data to:

  • Find which parks have the most visitors each year and if visitation in the parks is increasing or decreasing.

  • Identify what could have caused spikes and dips in visitation.

  • Explore which regions of the United States have the most parks.

  • Determine if certain parks contribute more to the economy.

  • Look at the impacts COVID-19 had on visitation.

1.2 Background Knowledge

The National Park Service is funded primarily by Congress, with additional funding from park entrance fees and private philanthropy. For years, the National Park Service has lacked the funding it needs to maintain and protect the national parks. Since the first National Park, Yellowstone, was established in 1872, an additional 400 National Parks have been added, along with over 20,000 employees. This research is important because it can help determine how much funding the National Parks need and which parks need more attention than others. It can help us discover how different issues have affected the parks, and whether certain parks should close during certain times of the year to make their budgets more efficient and effective. National parks provide peaceful places to enjoy scenery and give wildlife and native plants a safe home, which helps maintain our ecosystem. Economically, national parks create jobs in tourism, park management, and capital works, and they draw visitors to regional areas, where they spend money in local towns.

1.3 Summary of the Modeling Result

In my model, I was able to determine how visitation, park size, and park type correlated with visitor spending and the economic contribution of the parks. I was also able to see that visitation decreased in 2020 due to the COVID-19 pandemic. If I had more time, I would have liked to gather data on climate change, stocks, and financial patterns to see how these factors could have affected visitation at parks.

2 Data Summary

I was able to collect more than 40 years of data from the National Park Service: Public Use Data from 1979 to 2021. I also collected data on the number of trails located in each park from the All Trails dataset on Kaggle. Additionally, I was able to find Visitor Spending Data for 2020.

2.1 All Trails Data

  • This data is from the All Trails website, which is a platform containing all hiking trails located in each National Park. It includes the coordinates of each trail, park name, state name, and city name. Additionally this data provides variables like the average rating, length, elevation gain, and difficulty rating for each trail.

2.1.1 Summary Statistics: All Trails

library(tidyverse)  # read_csv(), dplyr verbs, and ggplot2

# Read the CSV file
alltrails <- read_csv("/Users/gabriellescibelli/Modified2AllTrails.csv")

# First 6 rows
head(alltrails)
rmarkdown::paged_table(alltrails)

# Number of distinct trails per state
TrailsperPark <- alltrails %>%
  group_by(state_name) %>%
  summarize(trails = n_distinct(name))
TrailsperPark %>%
  arrange(desc(trails))

# Dimensions of the data, formatted for the sentence below
nvars <- format(ncol(alltrails), big.mark = ",")
nobs <- format(nrow(alltrails), big.mark = ",")

The number of variables is 21; the number of observations is 3,313.

2.2 National Park Service Data

2.2.1 Public Use Statistics (1979-2021)

  • Public Use Statistics give the number of visits per month and year for each National Park. This data includes Recreation and Non-Recreation Visits, as well as the hours the park was open for each group and the number of visitors who camped, stayed overnight, etc.
Hmisc::describe(public_use) %>% Hmisc::html()
public_use

19 Variables   173563 Observations

index
  n: 173563   missing: 0   distinct: 64657   Info: 1   Mean: 24513   Gmd: 18564
  quantiles  .05: 2194   .10: 4416   .25: 10948   .50: 21919   .75: 36072   .90: 47792   .95: 55980
  lowest : 2, 3, 4, 5, 6
  highest: 64654, 64655, 64656, 64657, 64658

Park
  n: 173563   missing: 0   distinct: 372
  lowest : Abraham Lincoln Birthplace NHP, Acadia NP, Adams NHP, African Burial Ground NM, Agate Fossil Beds NM
  highest: Wupatki NM, Yellowstone NP, Yosemite NP, Yukon-Charley Rivers NPRES, Zion NP

Unit Code
  n: 173563   missing: 0   distinct: 372
  lowest : ABLI, ACAD, ADAM, AFBG, AGFO
  highest: WWII, YELL, YOSE, YUCH, ZION

Park Type
  n: 173563   missing: 0   distinct: 19
  lowest : International Historic Site, National Battlefield, National Battlefield Park, National Historic Site, National Historical Park
  highest: National River, National Seashore, National Wild & Scenic River, Park (Other), Park Type

Region
  n: 173563   missing: 0   distinct: 8
  lowest : Alaska, Intermountain, Midwest, National Capital, Northeast
  highest: National Capital, Northeast, Pacific West, Region, Southeast

 Value                Alaska    Intermountain          Midwest National Capital
 Frequency              7032            39458            23678            15105
 Proportion            0.041            0.227            0.136            0.087

 Value             Northeast     Pacific West           Region        Southeast
 Frequency             33817            25472                3            28998
 Proportion            0.195            0.147            0.000            0.167

State
  n: 173563   missing: 0   distinct: 55
  lowest : AK, AL, AR, AS, AZ
  highest: VT, WA, WI, WV, WY

Year
  n: 173563   missing: 0   distinct: 44
  lowest : 1979, 1980, 1981, 1982, 1983
  highest: 2018, 2019, 2020, 2021, Year

Month
  n: 173563   missing: 0   distinct: 13
  lowest : 1, 10, 11, 12, 2
  highest: 6, 7, 8, 9, Month

 Value          1    10    11    12     2     3     4     5     6     7     8     9
 Frequency  14458 14467 14468 14467 14458 14458 14460 14462 14463 14466 14466 14467
 Proportion 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083

 Value      Month
 Frequency      3
 Proportion 0.000

Recreation Visits
  n: 173563   missing: 0   distinct: 77961
  lowest : -20331, -320, -39, -425, 0
  highest: 99986, 9999, 99996, 99998, Recreation Visits

Non-Recreation Visits
  n: 173563   missing: 0   distinct: 24528
  lowest : -112, -16, -2231, -32964, -788
  highest: 99989, 9999, 99990, 99991, Non-Recreation Visits

Recreation Hours
  n: 173563   missing: 0   distinct: 96363
  lowest : -10068, -12, -18, -192, -30
  highest: 999924, 99993, 999971, 99998, Recreation Hours

Non-Recreation Hours
  n: 173563   missing: 0   distinct: 24528
  lowest : -112, -16, -2231, -32964, -788
  highest: 99989, 9999, 99990, 99991, Non-Recreation Hours

Concessioner Lodging
  n: 173563   missing: 0   distinct: 8336
  lowest : 0, 1, 10, 100, 1000
  highest: 99853, 9986, 999, 9993, Concessioner Lodging

Concessioner Camping
  n: 173563   missing: 0   distinct: 4019
  lowest : 0, 1, 10, 100, 1000
  highest: 997, 998, 9983, 999, Concessioner Camping

Tent Campers
  n: 173563   missing: 0   distinct: 10240
  lowest : 0, 1, 10, 100, 1000
  highest: 9992, 9993, 9994, 9996, Tent Campers

RV Campers
  n: 173563   missing: 0   distinct: 9728
  lowest : 0, 1, 10, 100, 1000
  highest: 9995, 9996, 9997, 9998, RV Campers

Backcountry Campers
  n: 173563   missing: 0   distinct: 7570
  lowest : 0, 1, 10, 100, 1000
  highest: 9990, 9994, 9996, 9999, Backcountry Campers

Non-Recreation Overnight Stays
  n: 173563   missing: 0   distinct: 2071
  lowest : 0, 1, 10, 100, 1000
  highest: 995, 996, 9984, 999, Non-Recreation Overnight Stays

Misc. Overnight Stays
  n: 173563   missing: 0   distinct: 6798
  lowest : 0, 1, 10, 100, 1000
  highest: 998, 99884, 999, 9995, Misc. Overnight Stays
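The summary above lists column names (Region, Year, Month, Recreation Visits, and so on) among the values, and it sorts numbers as text (1, 10, 11, 12, 2). That pattern suggests that a few header rows were read in as data and that the count columns are stored as character. Below is a minimal sketch of one way this could be cleaned up. The toy table and the choice to treat negative counts as missing are my assumptions for illustration, not a step taken in this project.

```r
library(dplyr)

# Toy stand-in for the raw public-use table; every value here is made up.
raw <- tibble(
  Park       = c("Acadia NP", "Park", "Zion NP", "Zion NP"),
  Region     = c("Northeast", "Region", "Intermountain", "Intermountain"),
  Year       = c("2020", "Year", "2020", "2021"),
  Month      = c("1", "Month", "10", "2"),
  Rec_Visits = c("9999", "Recreation Visits", "99996", "-39")
)

clean <- raw %>%
  # Drop rows that are repeated header lines rather than observations.
  filter(Year != "Year") %>%
  # Store dates and counts as numbers so they sort and sum correctly.
  mutate(across(c(Year, Month, Rec_Visits), as.numeric)) %>%
  # Assumption: a negative visit count is a data-entry problem, so mark it missing.
  mutate(Rec_Visits = if_else(Rec_Visits < 0, NA_real_, Rec_Visits))

clean
```

After a step like this, Month sorts as 1, 2, 10 instead of 1, 10, 2, and sums of Rec_Visits behave as expected as long as na.rm = TRUE is used.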

# Total recreation visits per year in each region
# (sum() adds up the visits; n_distinct() would only count distinct values)
public_use %>%
  group_by(Year, Region) %>%
  summarize(Total_Visits = sum(Rec_Visits))

2.3 Economic Contribution of National Park Visitor Spending

2.3.1 Visitor Spending Data

  • This data gives us information on the number of visitors in each park, how much they spend in each National Park, the number of jobs provided by parks, and how much income workers make.
library(skimr)  # provides skim()

# Read the 2020 visitor spending data and summarize it
Visitor_Spending_2020 <- read_csv("~/Downloads/National Park Project/Visitor Spending State 2020.csv")
head(Visitor_Spending_2020)
skim(Visitor_Spending_2020)
Data summary
Name Visitor_Spending_2020
Number of rows 54
Number of columns 6
_______________________
Column type frequency:
character 1
numeric 5
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
State 0 1 2 14 0 54 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
Visitation 0 1 4390080.22 5709542.29 4819.0 445463.50 2300616.50 6078418.00 28645839.0 ▇▃▁▁▁
Visitor Spending 0 1 269140740.74 358821715.61 300000.0 28650000.00 145350000.00 324200000.00 1716500000.0 ▇▁▁▁▁
Total Output 0 1 381187037.04 535369782.64 400000.0 39325000.00 185050000.00 474450000.00 2693300000.0 ▇▁▁▁▁
spending 0 1 269.14 358.82 0.3 28.65 145.35 324.20 1716.5 ▇▁▁▁▁
output 0 1 381.19 535.37 0.4 39.33 185.05 474.45 2693.3 ▇▁▁▁▁
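In the summary above, the lowercase spending and output columns appear to be the dollar columns expressed in millions (for example, a mean of 269,140,740.74 becomes 269.14). A minimal sketch of how such columns could be derived is shown here. The two rows use the minimum and maximum from the summary, while the state labels are placeholders I made up.

```r
library(dplyr)

# Two illustrative rows with the same column names as the skim output.
# "AA" and "BB" are hypothetical labels, not real entries in the data.
spend <- tibble(
  State = c("AA", "BB"),
  `Visitor Spending` = c(300000, 1716500000),
  `Total Output` = c(400000, 2693300000)
)

spend <- spend %>%
  # Divide dollars by one million so axis labels and tooltips stay readable.
  mutate(spending = `Visitor Spending` / 1e6,
         output = `Total Output` / 1e6)

spend
```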

3 Research Questions

  • Is visitation increasing or decreasing at certain parks?
    • How many parks are in the United States? Each Region? Each State?
      • How many trails are in each park?
    • Are there seasonal trends?
    • Does the location of the park correlate with the amount of visitation?
    • Do the size and number of parks correlate with the economic contribution of the parks?

4 Hypothesis

I believe that parks located in the West will get more visitation and result in more economic contribution than parks located in the East.

  • The West has more National Parks because at the time when the National Park System was established, most of the land in the East was privately owned.

    • The West is less urbanized and has more open land.
    • I think California will have the most parks.

5 Visualizations

5.1 How many Parks are in the United States?

total_park_count <-public_use%>%
  group_by(Year)%>%
  summarize(p = n_distinct(Park))
uspark <- ggplot(data = total_park_count, 
       mapping = aes(x = Year, y = p))+ 
  geom_line(color="#69b3a2", size=2)+ 
  ylim(100, 400)+
  ggtitle("Number of National Parks in the United States \n(1979 - 2021)")+
  xlab("Year")+
  ylab("Total Number of Parks")
uspark

The number of Parks in the United States has increased from 268 in 1979 to 368 in 2021.
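To put that increase in relative terms, the growth can be worked out directly from the two counts reported above. This is simple arithmetic on those two numbers, not an additional result from the data.

```r
parks_1979 <- 268
parks_2021 <- 368

# Absolute increase in the number of parks
added <- parks_2021 - parks_1979

# Relative increase, as a percentage of the 1979 count
pct_growth <- 100 * added / parks_1979

added                  # 100
round(pct_growth, 1)   # 37.3
```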

5.2 Number of Parks Per Region

numparks_region<-public_use%>%
  group_by(Region)%>%
  summarize(Parks = n_distinct(Park))

numparks_region%>%arrange(desc(Parks))

The number of parks found in each region.

To see a map of the Regions click here.

5.3 Visits By Region

regionvis <- public_use%>%
  group_by(Region, Year)%>%
  summarize(visits = sum(Rec_Visits))
regionvis
ggplot(data = regionvis, 
       mapping = aes(x = Year, y =visits, fill = Region))+
  geom_col()

faceted_region <- ggplot(data = public_use, 
                         mapping = aes(x = Year, y = Rec_Visits))+
  geom_point(alpha = 0.2)+
  facet_wrap(~Region, ncol = 3)
faceted_region+
  theme_bw()+ 
  ggtitle("Recreation Visits By Region")+ 
  ylab("Recreation Visits")+ 
  xlab("")+
  scale_y_continuous(labels = scales::comma)

5.4 Number of Parks per State and Region

park_state <- public_use%>%
  group_by(State, Region)%>%
  summarize(park = n_distinct(Park))
# Order states by park count; arrange() alone does not reorder a discrete axis
ggplot(data = park_state, 
       mapping = aes(x = reorder(State, -park), y = park, color = Region))+ 
  geom_point()+ 
  theme(axis.text.x = element_text(angle = 90))+ 
  labs(x = "State", y = "Park Count", 
       title = "Number of Parks by State and Region")

5.5 Number of Trails Per State

TrailsperPark <- alltrails %>%
  group_by(state_name) %>%
  summarize(trails = n_distinct(name)) %>%
  # Order the states by trail count so the chart is sorted
  mutate(state_name = reorder(state_name, trails))
# Lollipop chart of the number of trails in each state
trails_plot <- ggplot(TrailsperPark, aes(x = state_name, 
                                         y = trails))+
  geom_segment(aes(x = state_name, xend = state_name, y = 0, yend = trails),
               color = "skyblue") +
  geom_point(color = "blue", size = 4, alpha = 0.6)+
  theme_light()+
  coord_flip()+
  theme(panel.border = element_blank(),
        axis.ticks.y = element_blank())+
  ggtitle("Number of Trails in each State")+
  xlab("")+
  ylab("Number of Trails")
trails_plot

5.6 Visitation in 2021

lastyear <- public_use%>%
  filter(Year == 2021)%>%
  select(Year, Month, Park, Region, Rec_Visits)
ggplot(data = lastyear, 
       mapping = aes(x = Month, y = Rec_Visits, fill = Region))+
  geom_bar(position = "stack", stat = "identity")+
  scale_x_continuous(breaks = breaks_width(1))+
  theme_classic()+
  scale_y_continuous(labels = label_number(suffix = "K", scale = 1e-3))+
  ylab("Recreation Visits (Thousands)")+ 
  xlab("Month")+
  ggtitle("Visitation By Month in 2021")

5.7 Visitor Spending Data for 2020

  • I made an interactive plot using plotly.
library(plotly)      # ggplotly()
library(viridis)     # scale_color_viridis()
library(hrbrthemes)  # theme_ipsum()

Visitor_Spending_2020 <- read_csv("~/Downloads/National Park Project/Visitor Spending State 2020.csv")
# Bubble chart: spending vs. output, with bubble size showing visitation
bubble <- Visitor_Spending_2020%>%
  arrange(desc(Visitation))%>%
  mutate(State = factor(State, State))%>%
  # Tooltip text shown when hovering over a bubble
  mutate(text = paste("State: ", State, "\nVisitation: ", Visitation,
                      "\n Spending (M):", spending, "\n Output (M):", output, sep = ""))%>%
  ggplot(aes(x = spending, y = output, size = Visitation, color = State, text = text))+ 
  geom_point(alpha = 0.7) + 
  scale_size(name = "Visitation ")+ 
  scale_color_viridis(discrete = TRUE, guide = "none")+
  theme_ipsum()+
  theme(legend.position = "none")
# Convert the static ggplot into an interactive plotly widget
bubble2 <- ggplotly(bubble, tooltip = "text")  
bubble2

5.8 Conclusion

The main findings were that California has the most parks and trails, though it is not located in the region that has the most parks. California also has the most visitors per year, which is consistent with it having the most visitor spending and the highest economic contribution. This suggests that California's parks may not have to rely as heavily on government funding. States with the most parks tend to have higher visitation, and therefore the most visitor spending and economic contribution. I would like to look further into why Washington, DC has lower economic contribution and visitor spending. My thought is that many visits are school bus trips, so the students do not spend as much as a family would on a visit to the parks. If parks are able to increase their visitation, they would contribute more to the economy and generate more revenue, which could in turn increase the funding that goes toward maintenance of these parks. We saw that the COVID-19 pandemic affected the parks by decreasing visitation in all regions. I would like to look further to see whether any specific parks increased in visitation during this time.
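One way to put a number on the 2020 drop would be a year-over-year percent change in visits for each region. The sketch below illustrates the calculation with made-up totals rather than the project data; the column names Region, Year, and visits mirror the regionvis table from section 5.3, and the numbers themselves are hypothetical.

```r
library(dplyr)

# Made-up yearly totals for two regions, for illustration only.
yearly <- tibble(
  Region = rep(c("Alaska", "Midwest"), each = 3),
  Year   = rep(2019:2021, times = 2),
  visits = c(100, 60, 90, 200, 150, 210)
)

yoy <- yearly %>%
  group_by(Region) %>%
  arrange(Year, .by_group = TRUE) %>%
  # Percent change relative to the previous year within the same region
  mutate(pct_change = 100 * (visits - lag(visits)) / lag(visits)) %>%
  ungroup()

# Rows for 2020 show the change from 2019 to 2020 in each region
filter(yoy, Year == 2020)
```

Applied to the real regional totals, a negative pct_change in every region for 2020 would back up the statement that visitation fell everywhere, and the same table computed per park would show whether any individual parks gained visitors.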

I had many limitations while conducting my research, such as not gathering all the data I wanted and having trouble scraping specific data. Gathering and cleaning the data took a lot longer than anticipated, which set me back in my research and in developing the economic analysis. I hope to continue this research and look further into the variables that made me question the outcomes. I would like to see how visitation changed for each park over the years and in each month, to see if certain parks and park types have become more or less popular over time. I would also like to look into climate change and travel patterns to see if these factors are related to visitation rates. I would look further into the average rating of parks and traffic counts in parks. If I can gather all the variables I need, my analysis would be more thorough, and I would be able to make better suggestions for how the parks can change their current practices to make the National Park System better. For now, we can continue to visit these parks and contribute to the maintenance they need. We can do our part by cleaning up parks while we enjoy hikes, to conserve the beauty on display at these parks.