As networking, data storage, and data collection capacities rapidly evolve, the scope of Big Data is expanding across all science and engineering disciplines. Data mining techniques have the potential to provide valuable insights for optimizing industrial frameworks and enhancing production efficiency.
The reception of Automatic Identification System (AIS) messages via satellites is becoming more prevalent, and the deployment of AIS terminals at shore-based network centers enables the collection of extensive AIS data on a large scale.
Unreliability of AIS data
Mature information technology and hardware provide unprecedented computing capability for processing data, and consequently many maritime researchers are investigating AIS data mining. One issue these researchers must deal with is cleaning raw AIS data, which is difficult because many AIS messages are erroneous. For example, research into the reliability and completeness of AIS information in the Vessel Traffic Service (VTS) area of the Dover Strait showed that more than 50% of ship destination information was incorrect, and 5% of messages contained a false Maritime Mobile Service Identity (MMSI) and course. Another study of AIS quality, based on a few months of data from Liverpool VTS and the AISLive company, identified numerous problems. For instance, more than one station broadcasting the same MMSI number creates discrepancies. The details show that fields such as MMSI number, vessel type and position are not reliable. Moreover, an analysis focusing on the Heading (HDG) and Rate Of Turn (ROT) parameters shows that the AIS system is prone to receiving incomplete data broadcast by vessels’ transmitters.
AIS data cleaning
The quality of AIS data is a subject of interest for many researchers. However, published research on preprocessing raw data to improve its quality is limited. One study in 2015 treated the development of a vessel database as the key to managing AIS data and to quality control. All fields were checked for obvious outliers, and any outlier that could not be corrected was removed. In line with this strategy, the common method of filtering inaccurate single position points is gating on position, velocity and course. To solve the problem of shared MMSI numbers, a method of elimination is usually applied. Other researchers proposed a nearest neighbour approach to assign AIS messages to the right tracks, but gave no detailed experimental method, performance figures or results. In another study, a simple algorithm was created to calculate the likelihood of an association between an AIS message and each candidate vessel. It is used for processing massive data at a global scale, but it cannot be applied in a small region where AIS messages are sampled at a high rate, because the algorithm is unable to handle an association when there are at least three consecutive abnormal track points. A similar method with velocity gating was also proposed in recent years. Given that none of these techniques is universally applicable, a method with general applicability is needed. The accuracy of time is the key to kinematic gating, but there are few studies on it. Most researchers remove data with the wrong MMSI and focus only on obvious mistakes. They seldom take account of temporal and spatial problems in the dataset, such as time delay, boundary problems and other influencing factors.
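As an illustration of velocity gating, the following is a minimal sketch rather than any of the published methods. It assumes a track is a time-sorted list of `(t_seconds, lat_deg, lon_deg)` tuples; the function names and the 50-knot threshold are hypothetical choices made for this example.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def speed_gate(points, max_speed_knots=50.0):
    """Keep points reachable from the last accepted point within the speed limit.

    points: list of (t_seconds, lat_deg, lon_deg), sorted by time.
    Assumption: the first point is trusted as a valid starting position.
    """
    if not points:
        return []
    max_speed_ms = max_speed_knots * 0.514444  # knots to metres per second
    kept = [points[0]]
    for t, lat, lon in points[1:]:
        t0, lat0, lon0 = kept[-1]
        dt = t - t0
        if dt <= 0:
            continue  # duplicate or out-of-order time stamp
        if haversine_m(lat0, lon0, lat, lon) / dt <= max_speed_ms:
            kept.append((t, lat, lon))
    return kept
```

Because each point is compared only with the last accepted one, a gate of this kind depends heavily on accurate time stamps and on a valid starting point, which is consistent with the limitations noted above.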
Quality dimensions of the AIS data
There are many errors in raw AIS track datasets, which makes the identification of quality dimensions crucial. Based on recent research, AIS quality dimensions can be categorized as follows.
1. Physical integrity
Physical integrity measures the validity of an individual track and of its AIS messages. It comprises the reliability of the AIS message and the completeness of the track.
1.1 Reliability of AIS message
Reliability means the general coherence of the AIS message with respect to International Telecommunication Union (ITU) recommendations. Experiments mainly focus on fields including MMSI, Speed Over Ground (SOG), Longitude, Latitude, Second, Minute, Hour, Length of ship and Type of ship. For example, in a recently published study, 4.25% of all AIS type 1 messages collected from AIS base stations had latitude or longitude values larger than the maximum allowable values of 90 degrees and 180 degrees, respectively. Errors in other fields have been discovered as well, and the proportion of messages containing these errors, and of ships with unreliable additional information in the research area, is significant.
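A range check of this kind can be sketched as follows. The dictionary layout and the function names are hypothetical. The limits follow the value ranges of AIS position reports, where 91 degrees latitude, 181 degrees longitude and 102.3 knots mark "not available". The example treats such values as unreliable, which is a design choice rather than a requirement, and it assumes ship-station MMSIs of nine digits with no leading zero.

```python
def _in_range(value, low, high):
    """True if value is a number within [low, high]."""
    return isinstance(value, (int, float)) and low <= value <= high


def is_reliable(msg):
    """Basic range checks on one decoded position report.

    msg is a hypothetical dict with keys mmsi, lat, lon, sog and second.
    """
    mmsi = msg.get("mmsi")
    # Assumption: ship-station MMSIs are nine-digit numbers with no leading zero.
    if not isinstance(mmsi, int) or not 100_000_000 <= mmsi <= 999_999_999:
        return False
    return (
        _in_range(msg.get("lat"), -90.0, 90.0)        # 91 marks "not available"
        and _in_range(msg.get("lon"), -180.0, 180.0)  # 181 marks "not available"
        and _in_range(msg.get("sog"), 0.0, 102.2)     # knots; 102.3 marks "not available"
        and _in_range(msg.get("second"), 0, 59)       # 60 to 63 carry special meanings
    )
```

Checks on further fields, such as ship length or ship type, could be added in the same way once their valid ranges are fixed.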
1.2 Completeness of track
The completeness of a track is judged by two criteria. If the number of track points in a track is too small, or there is no corresponding static and voyage-related information report, the track is considered incomplete, because it cannot represent the track features of interest.
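The two criteria can be expressed as a simple filter. This is only a sketch: the data layout, the function name and the minimum of 10 points are assumptions for illustration, since the text does not specify how small is too small.

```python
def complete_tracks(tracks, static_mmsis, min_points=10):
    """Return only the tracks that meet both completeness criteria.

    tracks: dict mapping an MMSI to its list of track points.
    static_mmsis: set of MMSIs for which a static and voyage-related
        report was received.
    min_points: hypothetical lower bound on the number of track points.
    """
    return {
        mmsi: points
        for mmsi, points in tracks.items()
        if len(points) >= min_points and mmsi in static_mmsis
    }
```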
2. Spatial logical integrity
Spatial logical integrity is the extent to which the time-space relationship between the messages in a sequential set of trajectory points is correct. It comprises accuracy, consistency and relevance.
2.1 Accuracy of track
In the case of an AIS trajectory, the number of logical track points in the trajectory can be taken as the measure of accuracy. If all the track points are logical, the accuracy is maximal. In general, logical points form the majority of the whole track, so the accuracy of the track can be determined by whether illogical points are present. It has been found that a proportion of AIS tracks contain random position errors: their latitude or longitude values change illogically, which creates illogical positions when the track is charted. In addition, there are consecutive outliers in the ship movement that can be recognised by kinematic gating. The results show that 12.36% of ships generate such errors.
2.2 Consistency of track
In an AIS trajectory, the source of the different track points must agree. The MMSI number is a unique number given to every vessel for identification. It is the sole means of discriminating between AIS ship stations and is usually used to extract ships’ trajectories. However, because of improper use, deliberate or accidental, the same MMSI number is observed to be used by different ships. If those data are collected in the same observation period, the ship appears to jump between multiple positions on the chart. Trajectories of ships sharing an MMSI are identified frequently in various scenarios.
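One way to separate ships that share an MMSI is to split the points into kinematically feasible sub-tracks. The sketch below is a greedy nearest-feasible-track assignment written for illustration, not a method taken from the cited studies. The function names, the tuple layout and the 25 m/s speed limit are assumptions.

```python
import math


def distance_m(a, b):
    """Approximate distance in metres (equirectangular), adequate for short hops."""
    (lat1, lon1), (lat2, lon2) = a, b
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return 6_371_000.0 * math.hypot(x, y)


def split_shared_mmsi(points, max_speed_ms=25.0):
    """Greedily split one MMSI's points into kinematically feasible sub-tracks.

    points: list of (t_seconds, lat_deg, lon_deg), sorted by time.
    Each point joins the nearest sub-track it could have reached within the
    speed limit; otherwise it starts a new sub-track.
    """
    tracks = []
    for t, lat, lon in points:
        best, best_d = None, None
        for track in tracks:
            t0, lat0, lon0 = track[-1]
            dt = t - t0
            if dt <= 0:
                continue  # cannot follow a point with the same or a later time
            d = distance_m((lat0, lon0), (lat, lon))
            if d / dt <= max_speed_ms and (best_d is None or d < best_d):
                best, best_d = track, d
        if best is None:
            tracks.append([(t, lat, lon)])
        else:
            best.append((t, lat, lon))
    return tracks
```

A single erroneous position also starts a sub-track of its own under this scheme, so very short sub-tracks would still need to be examined before concluding that two ships share the number.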
2.3 Relevance of track
Relevance measures the degree of relation between two data objects. In the case of an AIS trajectory, problems with the time relationship between the points of a track are inevitable. Not all of the track data within the research area are completely preserved. For example, ships may cross the boundary of the area repeatedly during the period, which results in a loss of data. Tracks that lack relevance can be found in the majority of AIS datasets, and they include illogical tracks that cross land or run along the boundary.
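A common way to avoid joining points that are not related in time is to split a track wherever the gap between consecutive points is too long. The following sketch assumes this approach. The function name and the 600-second threshold are hypothetical, and the source does not prescribe this remedy.

```python
def split_on_time_gaps(points, max_gap_s=600):
    """Split a time-sorted track into segments at gaps longer than max_gap_s.

    points: list of tuples whose first element is the time in seconds.
    """
    segments = []
    for p in points:
        if segments and p[0] - segments[-1][-1][0] <= max_gap_s:
            segments[-1].append(p)  # continues the current segment
        else:
            segments.append([p])    # first point, or gap too long
    return segments
```

Treating each segment as a separate track prevents a straight line from being drawn across land between the point where a ship left the area and the point where it returned.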
3. Accuracy of time
AIS messages collected from AIS receiving stations are usually marked with an external time stamp, called the recorded time. In the process of generating an AIS message, a communication time stamp is coded into the sentence, called the generated time. It is observed that the time data of AIS messages contain errors. In a recent experiment, AIS messages from the same ship received by different stations were compared, and the external time stamps were found to be inconsistent owing to clock offsets and instabilities. The results show that the quality of time information depends on the receiving station and that the time of generation cannot be recorded precisely.
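A comparison of this kind can be sketched by matching the messages received by two stations and taking the median difference of their recorded times. The message key and the function name are hypothetical, and the median is chosen here only because it is robust to occasional outliers.

```python
from statistics import median


def station_clock_offset(recorded_a, recorded_b):
    """Median recorded-time difference (b minus a) in seconds.

    recorded_a, recorded_b: dicts mapping a message key to the recorded time
    at each station. The key is assumed to identify the same transmitted
    message at both stations, for example the MMSI plus the message payload.
    Returns None if the stations share no messages.
    """
    common = recorded_a.keys() & recorded_b.keys()
    if not common:
        return None
    return median(recorded_b[k] - recorded_a[k] for k in common)
```

A nonzero median points to a systematic clock offset between the stations, while a wide spread of the individual differences would indicate instability.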
Conclusion
The quality of the dataset is not only a key to the comprehensiveness of analysis but also an essential factor in avoiding misleading results. The lack of a systematic pre-processing method limits studies that use AIS data mining. Because of these issues, the development of high-performance methods for preprocessing AIS data is a subject of growing interest in the maritime community.
References
-
Ship Trajectories Pre-processing Based on AIS Data
Liangbin Zhao, Guoyou Shi, Jiaxuan Yang
Journal of Navigation (2018)
-
Training, Technology and AIS: Looking Beyond the Box
Nicholas Bailey
Cardiff University (2005)
-
Automatic Identification System (AIS): data reliability and human error implications
Abbas Harati-Mokhtari, Alan Wall, Philip Brooks, Jin Wang
Journal of Navigation (2007)
-
Assessment of AIS vessel position report under the aspect of data reliability
P. Banys, T. Noack, S. Gewies
Annual of Navigation (2012)
-
Discovering vessel activities at sea using AIS data: Mapping of fishing footprints
Fabio Mazzarella, Michele Vespe, Dimitrios Damalas, Giacomo Osio
17th International Conference on Information Fusion (2014)