aiforsea computer

Computer Vision

How might we automate the process of recognizing the details of the vehicles from images, including make and model?

This is a data science assignment where you are expected to create a data model from a given training dataset.

PROBLEM STATEMENT

Economies in Southeast Asia are turning to AI to solve traffic congestion, which hinders mobility and economic growth. The first step in the push towards alleviating traffic congestion is to understand travel demand and travel patterns within the city.

Can we accurately forecast travel demand based on historical Grab bookings to predict areas and times with high travel demand?

SUBMISSION DEADLINE

Please submit the final repository including documentation by or before 17 June 2019, 6.00pm (SGT).

In this challenge, participants are to build a model, trained on a historical demand dataset, that can forecast demand on a hold-out test dataset. Given all data up to time T, the model should accurately forecast demand for the time intervals T+1 to T+5 (where each interval is 15 minutes).
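As a reference point for what "forecast T+1 to T+5 given data up to T" means in practice, here is a minimal sketch of a seasonal-naive baseline for a single geohash. It is not the expected solution; the function name `seasonal_naive_forecast` and the choice of "same time yesterday" as the predictor are assumptions made for illustration.

```python
INTERVALS_PER_DAY = 96  # 24 hours * 4 fifteen-minute intervals per hour


def seasonal_naive_forecast(history, horizon=5):
    """Forecast T+1 .. T+horizon for one geohash.

    history: demand values ordered by time; the last element is time T.
    Each forecast repeats the value observed exactly one day earlier.
    If less than one day of history covers that point, the last
    observed value is used instead.
    """
    if not history:
        raise ValueError("history must contain at least one value")
    forecasts = []
    for step in range(1, horizon + 1):
        # Position in `history` of the interval one day before T + step.
        idx = len(history) - 1 + step - INTERVALS_PER_DAY
        if idx >= 0:
            forecasts.append(history[idx])
        else:
            forecasts.append(history[-1])
    return forecasts
```

A baseline like this gives a score that any trained model should beat, which makes it easier to tell whether added features actually help.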

You can use the “Demand Data” dataset provided by Grab.

You are expected to create a Data Model based on the “Demand Data” dataset in order to solve the given problem statement.

You should also provide step-by-step documentation on how to run your code. Our evaluators will run your data models on a test dataset.

The given dataset contains normalised historical demand of a city, aggregated spatiotemporally within geohashes and over 15-minute intervals. The dataset spans a two-month period. A brief description of the dataset fields is found below:

Field: Description

geohash6: geohash level 6. Geohash is a public domain geocoding system which encodes a geographic location into a short string of letters and digits with arbitrary precision. You are free to use any geohash library to encode/decode the geohashes into latitude and longitude or vice versa. Some examples include https://github.com/hkwi/python-geohash (for Python), https://github.com/kungfoo/geohash-java (for Java).

day: day, where the value indicates the sequential order and not a particular day of the month

timestamp: start time of 15-minute intervals, in the following format: <hour>:<minute>, where hour ranges from 0 to 23 and minute is one of (0, 15, 30, 45)

demand: aggregated demand normalised to be in the range [0,1]
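The fields above can be turned into numeric values with a few lines of code. The sketch below is one possible approach, not part of the provided material: `decode_geohash` is a plain-Python implementation of the standard geohash decoding (a library such as those linked above would work equally well), and `interval_index` assumes that `timestamp` holds the `<hour>:<minute>` string, that `day` holds the sequential day number, and that day numbering starts at 1.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet
INTERVALS_PER_DAY = 96  # 24 hours * 4 fifteen-minute intervals


def decode_geohash(geohash):
    """Return the (latitude, longitude) of the centre of a geohash cell."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    is_lon = True  # bits alternate, starting with longitude
    for ch in geohash:
        value = _BASE32.index(ch)
        for shift in range(4, -1, -1):  # 5 bits per character
            bit = (value >> shift) & 1
            if is_lon:
                mid = (lon_lo + lon_hi) / 2
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            is_lon = not is_lon
    return (lat_lo + lat_hi) / 2, (lon_lo + lon_hi) / 2


def interval_index(day, timestamp):
    """Map (day, '<hour>:<minute>') to a sequential 15-minute index.

    Assumes day numbering starts at 1, so day 1 at '0:0' maps to 0.
    """
    hour, minute = (int(part) for part in timestamp.split(":"))
    return (day - 1) * INTERVALS_PER_DAY + hour * 4 + minute // 15
```

For example, `decode_geohash("ezs42")` (a commonly used textbook geohash, not one from the dataset) returns roughly `(42.6, -5.6)`, and `interval_index(2, "9:15")` returns `133`.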

You will be judged on the following criteria:

Code Quality

Code Quality, also known as Software Quality, is generally defined in two ways:

How well the code conforms to the functional specifications and requirements of a project.
Structural quality, which relates to the maintainability and robustness of the code.

Creativity in Problem-solving

Creativity reflects your ability to make sense of the given data, derive tangible results relevant to the business needs of an organization, and present the findings, all while keeping the problem statement in mind.

Feature Engineering

Feature Engineering, also referred to as pre-processing, is the process of selecting and transforming variables when creating a data model for a given problem statement. While you will be given a general dataset which relates to the problem statement, you need to create “features” that make the models and algorithms work as intended.

Note that your code should be able to automatically create your desired features, so that they can be used in the evaluation on the hold-out test set.

Model Performance

Model performance describes how well the chosen model represents the data and how well it is likely to work on unseen data. In this challenge, we will perform a hold-out model evaluation: you are given a training dataset, and our evaluators will use a test dataset that the model has not seen. This test dataset is used to assess the likely future performance of the model.
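Since the evaluators' test dataset is not available to participants, one way to approximate a hold-out evaluation locally is to set aside the most recent days of the training data. The sketch below assumes each record is a dict with an integer `day` key; the function name `chronological_split` and the default of 14 held-out days are illustrative choices, not requirements of the challenge.

```python
def chronological_split(rows, holdout_days=14):
    """Split records into (train, validation) by time.

    rows: iterable of dicts that each contain an integer 'day'.
    The last `holdout_days` days form the validation set, so the
    model is always validated on data later than its training data.
    """
    rows = list(rows)
    if not rows:
        return [], []
    last_day = max(row["day"] for row in rows)
    cutoff = last_day - holdout_days
    train = [row for row in rows if row["day"] <= cutoff]
    validation = [row for row in rows if row["day"] > cutoff]
    return train, validation
```

Splitting by time rather than at random avoids training on intervals that come after the ones being predicted.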

Test dataset details:

1. Timeframe: The test dataset can start from any time period after the timeframe of the training dataset. Your model can use features from up to 14 consecutive days of the test dataset, ending at timestamp T, and predict T+1 to T+5.

2. Geohash coverage: You may assume that the set of geohashes is the same in the training dataset and the test dataset. The original geohashes are anonymised, but you may assume that adjacency is maintained between the geohashes.
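The 14-day limit above translates to at most 14 × 96 = 1344 intervals of input per geohash. A minimal sketch of extracting such a window follows; it assumes demand for one geohash is stored in a dict keyed by a sequential 15-minute interval index, and that intervals with no record are filled with `0.0`. Both the storage layout and the fill value are assumptions, not something the dataset description specifies.

```python
INTERVALS_PER_DAY = 96  # 24 hours * 4 fifteen-minute intervals


def build_window(demand_by_index, t, days=14, fill=0.0):
    """Return the demand values for the `days` days ending at interval t.

    demand_by_index: dict mapping interval index -> demand for one geohash.
    Intervals without a record are replaced by `fill` (an assumption).
    The result always has days * 96 values, oldest first, ending at t.
    """
    length = days * INTERVALS_PER_DAY
    start = t - length + 1
    return [demand_by_index.get(i, fill) for i in range(start, t + 1)]
```

A fixed-length window like this can be fed directly to a model or summarised further (for example, into per-day or per-hour averages) before predicting T+1 to T+5.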

Submissions will be evaluated by RMSE (root mean squared error) averaged over all geohash6, 15-minute-bucket pairs.
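For local checks, the metric can be sketched as below. This reads the sentence above as a single RMSE pooled over every (geohash6, interval) pair, which is an interpretation rather than a confirmed detail of the official scoring; the dict-based layout and the geohash strings in the usage example are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error over all (geohash6, interval) pairs.

    actual, predicted: dicts keyed by (geohash6, interval_index).
    Every key in `actual` must have a prediction.
    """
    if not actual:
        raise ValueError("actual must not be empty")
    missing = [key for key in actual if key not in predicted]
    if missing:
        raise KeyError(f"missing predictions for {len(missing)} pairs")
    squared_errors = [(actual[k] - predicted[k]) ** 2 for k in actual]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Usage with made-up keys and values:
actual = {("aaaaaa", 0): 1.0, ("aaaaaa", 1): 0.0}
predicted = {("aaaaaa", 0): 0.0, ("aaaaaa", 1): 0.0}
score = rmse(actual, predicted)  # sqrt((1 + 0) / 2)
```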

QUALIFICATION CRITERIA

Submit the correct link to your repository.
Make sure your repository includes the complete codebase (all commits done, documentation complete, etc.).
Solve only one of the challenges mentioned on the website.
Do not plagiarise the code. That will be grounds for instant disqualification.
The link to your repository must be publicly accessible from the time of submission.

SUBMISSION GUIDELINES

You can submit the code (either as a codebase or a Jupyter notebook) by uploading it to a public GitHub or similar repository. The instructions for submitting the repository link will be sent to you via email once you accept the challenge on https://www.aiforsea.com/

See our Terms of Participation