Citi Bike Analysis Using Python
Leading a team of analysts for a New York City bike-sharing service, I am tasked with evaluating user behavior to optimize bike distribution and identify expansion opportunities. The project uses Citi Bike's public data to uncover insights that support strategic decisions and address bike availability challenges, helping the service remain a leader in eco-friendly transportation.
-
Goals
Since its 2013 launch, Citi Bike’s popularity has surged, with demand further increasing during the COVID-19 pandemic. This growth has led to distribution issues, such as bike shortages at popular stations and overcrowded docking points.
As lead analyst, I will diagnose the root causes of these issues (ride volume, seasonal trends, or other factors) and recommend solutions. Findings will be communicated to management through an interactive dashboard that presents the key metrics needed to address the distribution challenges.
-
Skills & Tools
Pandas
NumPy
Seaborn
Matplotlib
kepler.gl
Streamlit
-
Key Focus Areas
Identifying High-Traffic Stations: Analyze patterns to determine which bike stations experience the highest usage across the city.
Impact of Weather on Bike Usage: Investigate how different weather conditions influence the frequency of bike rentals and rider behavior.
Popular Trip Routes: Explore the most common routes taken between bike stations to understand popular travel patterns.
Comparing Bike Type Preferences: Assess the differences in usage between classic and electric bikes to understand rider preferences and behaviors.
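Three of these focus areas (high-traffic stations, popular routes, and bike type preferences) come down to counting trips. The pandas sketch below shows one way to do that on a tiny made-up sample. It assumes the column names used in Citi Bike's newer trip files (`start_station_name`, `end_station_name`, `rideable_type`); adjust them if your files differ. The station names and rows are placeholders, not real data.

```python
import pandas as pd

# Toy sample of trips; station names and rows are placeholders.
# Column names assume Citi Bike's newer trip-file format.
trips = pd.DataFrame({
    "start_station_name": ["Station A", "Station A", "Station B",
                           "Station A", "Station B"],
    "end_station_name":   ["Station B", "Station B", "Station A",
                           "Station C", "Station A"],
    "rideable_type":      ["classic_bike", "electric_bike", "classic_bike",
                           "electric_bike", "classic_bike"],
})

# High-traffic stations: number of trips starting at each station.
top_stations = trips["start_station_name"].value_counts()

# Popular routes: number of trips for each (start, end) station pair.
top_routes = (
    trips.groupby(["start_station_name", "end_station_name"])
    .size()
    .sort_values(ascending=False)
)

# Bike type preference: share of all trips taken on each bike type.
bike_share = trips["rideable_type"].value_counts(normalize=True)

print(top_stations.head(10))
print(top_routes.head(10))
print(bike_share)
```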
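For the weather question, trips must be aggregated to the same daily grain as the weather records before the two sources can be joined. The sketch below assumes a `started_at` timestamp column in the trip data and a hypothetical daily weather table with `date`, `avg_temp_c`, and `precip_mm` columns. Actual NOAA exports use their own column names and units, so rename them to match before merging.

```python
import pandas as pd

# Toy trip records; only the start timestamp is needed here.
trips = pd.DataFrame({
    "started_at": pd.to_datetime([
        "2022-07-01 08:05", "2022-07-01 17:30", "2022-07-01 18:10",
        "2022-07-02 09:00",
    ]),
})

# Hypothetical daily weather table (column names are illustrative).
weather = pd.DataFrame({
    "date": pd.to_datetime(["2022-07-01", "2022-07-02"]),
    "avg_temp_c": [27.0, 22.5],
    "precip_mm": [0.0, 12.4],
})

# Collapse trips to one row per calendar day, then attach that day's weather.
daily = (
    trips.assign(date=trips["started_at"].dt.normalize())
    .groupby("date")
    .size()
    .rename("trip_count")
    .reset_index()
    .merge(weather, on="date", how="left")
)

# Correlation of daily trip volume with each weather variable.
print(daily[["trip_count", "avg_temp_c", "precip_mm"]].corr()["trip_count"])
```

With only two days the correlations are trivially ±1. On a full year of data they give a first indication of how strongly temperature and rain move daily demand, and a Seaborn plot such as `sns.regplot(data=daily, x="avg_temp_c", y="trip_count")` shows the shape of the relationship.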
-
Data Limitations
Limited Scope of Data Sources: The analysis relies only on Citi Bike trip data and NOAA weather data, so it may miss other factors that influence bike usage, such as public transportation schedules, road closures, or construction projects.
Real-Time Data Unavailability: Without access to real-time data, dynamic rebalancing strategies would be difficult to implement effectively.
-
Challenges
Identifying Expansion Opportunities: Determining where to add new stations will involve assessing multiple factors, such as current demand, population growth, tourist activity, and accessibility, making it a complex multi-criteria decision problem.
User Behavior Diversity: Different rider segments (e.g., tourists vs. commuters, classic vs. electric bike users) may have distinct usage patterns, complicating the analysis of overall trends and recommendations.
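One common way to structure a multi-criteria decision like station expansion is a weighted-sum score: scale each criterion to a common range, multiply it by a weight, and rank candidate areas by the total. The sketch below is illustrative only. The candidate areas, criterion values, and weights are all hypothetical; in practice they would come from real demand, population, and tourism data and from management priorities.

```python
import pandas as pd

# Hypothetical candidate areas and criterion values (illustrative only).
candidates = pd.DataFrame({
    "area": ["Area 1", "Area 2", "Area 3"],
    "current_demand": [1200, 800, 300],       # e.g. weekly trips nearby
    "population_growth": [0.02, 0.05, 0.01],  # annual growth rate
    "tourist_activity": [0.9, 0.4, 0.7],      # 0-1 index
    "accessibility": [0.6, 0.8, 0.9],         # 0-1 index
})

# Hypothetical weights reflecting priorities; they sum to 1.
weights = {
    "current_demand": 0.4,
    "population_growth": 0.2,
    "tourist_activity": 0.2,
    "accessibility": 0.2,
}

criteria = list(weights)
values = candidates[criteria]

# Min-max scale each criterion to [0, 1] so different units are comparable.
scaled = (values - values.min()) / (values.max() - values.min())

# Weighted sum across criteria (Series aligns with the DataFrame columns).
candidates["score"] = (scaled * pd.Series(weights)).sum(axis=1)

print(candidates.sort_values("score", ascending=False)[["area", "score"]])
```

Min-max scaling as written assumes higher is better for every criterion. A criterion where lower is better would need to be inverted first, and a criterion with identical values across all candidates would cause a division by zero.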
-
Recommendations
Seasonal Adjustment: Reduce the number of bikes in service by 30-40% from November to April to match the significant drop in ridership during colder months. This adjustment should lower operating costs while maintaining service quality in areas with moderate demand.
Expansion Strategy: Use demand forecasts based on current usage patterns, population density, and tourist activity to determine how many new stations are needed. Begin with a 10-15% increase in the number of stations in underserved waterfront areas.
Optimizing Bike Availability: Implement real-time rebalancing and dynamic stocking schedules to ensure bikes are available at high-demand stations. Consider incentives for riders who return bikes to underutilized stations, and increase docking capacity at peak times to prevent dock shortages.
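The 30-40% seasonal reduction can be sanity-checked by comparing average monthly ridership in November through April with the rest of the year. In the sketch below, the monthly totals are invented for illustration (they are not actual Citi Bike figures), the fleet size is a hypothetical placeholder, and scaling the fleet in direct proportion to demand is a simplifying assumption.

```python
import pandas as pd

# Hypothetical monthly trip totals (thousands), indexed by month 1-12.
# With real data: trips.groupby(trips["started_at"].dt.month).size()
monthly_trips = pd.Series(
    [80, 75, 95, 110, 130, 140, 150, 150, 140, 120, 100, 85],
    index=range(1, 13),
    name="trips_thousands",
)

cold_months = [11, 12, 1, 2, 3, 4]  # November through April
warm_months = [m for m in range(1, 13) if m not in cold_months]

cold_avg = monthly_trips.loc[cold_months].mean()
warm_avg = monthly_trips.loc[warm_months].mean()

# Fractional drop in average monthly demand during the cold season.
demand_drop = 1 - cold_avg / warm_avg
print(f"Cold-season demand is {demand_drop:.0%} below the warm season")

# Scale a hypothetical fleet size by the same ratio.
fleet_size = 1000
print("Suggested cold-season fleet:", round(fleet_size * (1 - demand_drop)))
```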
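For rebalancing, a useful first measure is each station's net inflow over a time window: arrivals minus departures. Stations that consistently drain are candidates for restocking, and stations that consistently fill up are candidates for extra docking capacity. The sketch below computes this on a toy sample and assumes Citi Bike-style `start_station_name` and `end_station_name` columns; the rows are placeholders.

```python
import pandas as pd

# Toy trips; station names and rows are placeholders.
trips = pd.DataFrame({
    "start_station_name": ["Station A", "Station A", "Station A",
                           "Station B", "Station C"],
    "end_station_name":   ["Station B", "Station B", "Station C",
                           "Station A", "Station B"],
})

departures = trips["start_station_name"].value_counts()
arrivals = trips["end_station_name"].value_counts()

# Net inflow per station: positive means bikes accumulate (docks fill up),
# negative means the station drains (bike shortage). Stations missing from
# one side are treated as having zero trips on that side.
net_flow = (
    arrivals.sub(departures, fill_value=0)
    .astype(int)
    .sort_values()
    .rename("net_inflow")
)
print(net_flow)
```

Computing the same measure per hour of day (departures by the hour of `started_at`, arrivals by the hour of `ended_at`) would show when each imbalance peaks, which is the input a dynamic stocking schedule needs.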