Data utilities

Data utilities — Functions for coalescing, merging, date handling and normalizing

Stability Level

Stable, unless otherwise indicated

Synopsis

#include <libtracker-extract/tracker-extract.h>

gchar*              tracker_coalesce                    (gint n_values,
                                                         ...);
const gchar*        tracker_coalesce_strip              (gint n_values,
                                                         ...);
gchar*              tracker_merge                       (const gchar *delimiter,
                                                         gint n_values,
                                                         ...);
gchar*              tracker_merge_const                 (const gchar *delimiter,
                                                         gint n_values,
                                                         ...);
gssize              tracker_getline                     (gchar **lineptr,
                                                         gsize *n,
                                                         FILE *stream);
gchar*              tracker_text_normalize              (const gchar *text,
                                                         guint max_words,
                                                         guint *n_words);
gboolean            tracker_text_validate_utf8          (const gchar *text,
                                                         gssize text_len,
                                                         GString **str,
                                                         gsize *valid_len);
gchar*              tracker_date_format_to_iso8601      (const gchar *date_string,
                                                         const gchar *format);
gchar*              tracker_date_guess                  (const gchar *date_string);

Description

This API provides common, general-purpose functions that extractors may find useful. The in-house extractors also use these functions frequently.

Details

tracker_coalesce ()

gchar*              tracker_coalesce                    (gint n_values,
                                                         ...);

Warning

tracker_coalesce has been deprecated since version 1.0 and should not be used in newly-written code. Use tracker_coalesce_strip() instead.

This function iterates through a series of string pointers passed using Varargs and returns the first one that is not NULL, not empty (i.e. "") and not made up entirely of spaces (i.e. " ").

The returned value is stripped in place using g_strstrip(), and all other values supplied are freed. It is therefore MOST important NOT to pass constant string pointers to this function!

n_values :

the number of Varargs supplied

... :

the string pointers to coalesce

Returns :

the first string pointer that meets these criteria, otherwise NULL.

Since 0.8


tracker_coalesce_strip ()

const gchar*        tracker_coalesce_strip              (gint n_values,
                                                         ...);

This function iterates through a series of string pointers passed using Varargs and returns the first one that is not NULL, not empty (i.e. "") and not made up entirely of spaces (i.e. " ").

The returned value is stripped in place using g_strstrip(). Unlike tracker_coalesce(), the other values are not freed, but because the chosen string is modified, it is MOST important NOT to pass constant string pointers to this function!

n_values :

the number of Varargs supplied

... :

the string pointers to coalesce

Returns :

the first string pointer that meets these criteria, otherwise NULL.

Since 0.9
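A minimal sketch of the selection rule described above, written in plain C so it builds without GLib or libtracker-extract. The names sketch_coalesce_strip, strip_in_place and is_blank are hypothetical; isspace() approximates the library's notion of a blank string and strip_in_place() only imitates g_strstrip(). It illustrates the documented contract and is not the library's implementation.

```c
#include <assert.h>
#include <ctype.h>
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: strips leading and trailing whitespace in place
 * (in the spirit of g_strstrip()) and returns the same pointer. */
static char *strip_in_place(char *s)
{
    char *start = s;
    size_t len;

    assert(s != NULL);
    while (*start != '\0' && isspace((unsigned char)*start))
        start++;
    memmove(s, start, strlen(start) + 1);

    len = strlen(s);
    while (len > 0 && isspace((unsigned char)s[len - 1]))
        s[--len] = '\0';
    return s;
}

/* Hypothetical helper: nonzero for NULL, "" and all-whitespace strings. */
static int is_blank(const char *s)
{
    if (s == NULL)
        return 1;
    for (; *s != '\0'; s++)
        if (!isspace((unsigned char)*s))
            return 0;
    return 1;
}

/* Hypothetical stand-in for tracker_coalesce_strip(): returns the first
 * non-blank argument, stripped in place. Nothing is freed, but the chosen
 * string is modified, so string literals must not be passed. */
const char *sketch_coalesce_strip(int n_values, ...)
{
    va_list args;
    const char *result = NULL;
    int i;

    va_start(args, n_values);
    for (i = 0; i < n_values && result == NULL; i++) {
        char *value = va_arg(args, char *);

        if (!is_blank(value))
            result = strip_in_place(value);
    }
    va_end(args);
    return result;
}
```

With libtracker-extract itself, the same selection would use the documented signature, for example tracker_coalesce_strip (3, title, artist, album), where the three variables are hypothetical writable strings.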


tracker_merge ()

gchar*              tracker_merge                       (const gchar *delimiter,
                                                         gint n_values,
                                                         ...);

Warning

tracker_merge has been deprecated since version 1.0 and should not be used in newly-written code. Use tracker_merge_const() instead.

This function iterates through a series of string pointers passed using Varargs and returns a newly allocated string of the merged strings. All passed strings are freed, so don't pass constant values.

The delimiter can be NULL. If specified, it is inserted between each pair of merged strings in the result.

delimiter :

the delimiter to use when merging

n_values :

the number of Varargs supplied

... :

the string pointers to merge

Returns :

a newly-allocated string holding the result, or NULL. Free the string with g_free() when it is no longer needed.

Since 0.8


tracker_merge_const ()

gchar*              tracker_merge_const                 (const gchar *delimiter,
                                                         gint n_values,
                                                         ...);

This function iterates through a series of string pointers passed using Varargs and returns a newly allocated string of the merged strings. Unlike tracker_merge(), the passed strings are not freed, so constant strings may be passed.

The delimiter can be NULL. If specified, it is inserted between each pair of merged strings in the result.

delimiter :

the delimiter to use when merging

n_values :

the number of Varargs supplied

... :

the string pointers to merge

Returns :

a newly-allocated string holding the result, or NULL. Free the string with g_free() when it is no longer needed.

Since 0.9
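The merging described above can be sketched in plain C as a two-pass join: measure first, then copy. The name sketch_merge_const is hypothetical, malloc() stands in for GLib allocation, and skipping NULL arguments is an assumption, since the text above does not say how NULL values are treated.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for tracker_merge_const(): joins the non-NULL
 * arguments into a newly allocated string, inserting delimiter (when it is
 * not NULL) between them. The arguments are only read, never freed.
 * Returns NULL when every argument is NULL or allocation fails. */
char *sketch_merge_const(const char *delimiter, int n_values, ...)
{
    va_list args;
    size_t delim_len = delimiter != NULL ? strlen(delimiter) : 0;
    size_t total = 0;
    int count = 0;
    char *result;
    char *p;
    int i;

    assert(n_values >= 0);

    /* First pass: measure the merged length. */
    va_start(args, n_values);
    for (i = 0; i < n_values; i++) {
        const char *value = va_arg(args, const char *);

        if (value != NULL) {
            total += strlen(value) + (count > 0 ? delim_len : 0);
            count++;
        }
    }
    va_end(args);

    if (count == 0)
        return NULL;
    result = malloc(total + 1);
    if (result == NULL)
        return NULL;

    /* Second pass: copy delimiter and value pieces into place. */
    p = result;
    count = 0;
    va_start(args, n_values);
    for (i = 0; i < n_values; i++) {
        const char *value = va_arg(args, const char *);
        size_t len;

        if (value == NULL)
            continue;
        if (count > 0 && delim_len > 0) {
            memcpy(p, delimiter, delim_len);
            p += delim_len;
        }
        len = strlen(value);
        memcpy(p, value, len);
        p += len;
        count++;
    }
    va_end(args);
    *p = '\0';
    return result;
}
```

Measuring before copying keeps the result to a single allocation; the real function returns memory that must be released with g_free() rather than free().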


tracker_getline ()

gssize              tracker_getline                     (gchar **lineptr,
                                                         gsize *n,
                                                         FILE *stream);

Reads an entire line from stream, storing the address of the buffer containing the text into *lineptr. The buffer is null-terminated and includes the newline character, if one was found.

See the GNU getline() manual page for more information.

lineptr :

Buffer to write into

n :

Size, in bytes, of the buffer at *lineptr (updated if the buffer is reallocated)

stream :

Filestream to read from

Returns :

the number of characters read, including the delimiter character but not including the terminating NUL byte. This value can be used to handle embedded NUL bytes in the line read. On failure, -1 is returned.

Since 0.9
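Because tracker_getline() is documented as following GNU getline() semantics, the usual calling pattern can be sketched with POSIX getline() standing in for it, so the example builds without libtracker-extract. The function name count_lines is hypothetical. The pattern is: start with a NULL buffer and zero size, let each call reuse or grow the buffer, and free it once at the end.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical example: reads every line of stream, returns the number of
 * lines and stores the length of the longest one (newline included, as
 * reported by getline()) in *longest. */
size_t count_lines(FILE *stream, size_t *longest)
{
    char *line = NULL;   /* allocated and grown by getline() */
    size_t size = 0;     /* current capacity of line, in bytes */
    ssize_t nread;
    size_t count = 0;

    assert(stream != NULL && longest != NULL);
    *longest = 0;
    while ((nread = getline(&line, &size, stream)) != -1) {
        count++;
        if ((size_t)nread > *longest)
            *longest = (size_t)nread;
    }
    free(line);          /* one free for the whole loop */
    return count;
}
```

Swapping getline for tracker_getline in the loop should need no other change, given that the two share the parameter list shown in the signature above.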


tracker_text_normalize ()

gchar*              tracker_text_normalize              (const gchar *text,
                                                         guint max_words,
                                                         guint *n_words);

Warning

tracker_text_normalize has been deprecated since version 1.0 and should not be used in newly-written code. Use tracker_text_validate_utf8() instead.

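This function iterates through text checking for UTF-8 validity using g_utf8_get_char_validated(). For each character found, the GUnicodeType is checked to make sure it is one of the following values:

  • G_UNICODE_LOWERCASE_LETTER

  • G_UNICODE_MODIFIER_LETTER

  • G_UNICODE_OTHER_LETTER

  • G_UNICODE_TITLECASE_LETTER

  • G_UNICODE_UPPERCASE_LETTER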

All other symbols, punctuation, marks, numbers and separators are stripped. A regular space (i.e. " ") is used to separate the words in the returned string.

n_words can be NULL. If specified, it is set to the number of words that were normalized in the result.

text :

the text to normalize

max_words :

the maximum number of words of text to normalize

n_words :

the number of words actually normalized

Returns :

a newly-allocated string holding the result, or NULL. Free the string with g_free() when it is no longer needed.

Since 0.8
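The word filtering described above can be illustrated with the following ASCII-only sketch. It is deliberately simplified: isalpha() in the C locale replaces the GUnicodeType letter checks, so non-ASCII text is not handled, and treating every non-letter byte as a word break is an assumption. The name sketch_text_normalize is hypothetical, and since tracker_text_normalize() is deprecated, the sketch only serves to explain the behaviour.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical ASCII-only sketch: keeps runs of letters, treats any other
 * byte as a word break, joins words with a single space and stops after
 * max_words words. Stores the word count in *n_words when n_words is not
 * NULL. Returns "" (not NULL) when no words are found. */
char *sketch_text_normalize(const char *text, unsigned max_words,
                            unsigned *n_words)
{
    size_t len;
    char *out;
    size_t o = 0;
    unsigned words = 0;
    int in_word = 0;
    size_t i;

    assert(text != NULL);
    len = strlen(text);
    out = malloc(len + 1);   /* output is never longer than the input */
    if (out == NULL)
        return NULL;

    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)text[i];

        if (isalpha(c)) {
            if (!in_word) {
                if (words == max_words)
                    break;           /* word limit reached */
                if (words > 0)
                    out[o++] = ' ';  /* single space between words */
                words++;
                in_word = 1;
            }
            out[o++] = (char)c;
        } else {
            in_word = 0;             /* symbols, digits etc. end a word */
        }
    }
    out[o] = '\0';

    if (n_words != NULL)
        *n_words = words;
    return out;
}
```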


tracker_text_validate_utf8 ()

gboolean            tracker_text_validate_utf8          (const gchar *text,
                                                         gssize text_len,
                                                         GString **str,
                                                         gsize *valid_len);

This function iterates through text checking for UTF-8 validity using g_utf8_validate(), appends the first chunk of valid characters to str, and gives the number of valid UTF-8 bytes in valid_len.

text :

the text to validate

text_len :

length of text, or -1 if NUL-terminated

str :

the GString to which the validated UTF-8 characters are appended, or NULL if not needed.

valid_len :

Output location for the number of valid UTF-8 bytes found, or NULL if not needed

Returns :

TRUE if some bytes were found to be valid, FALSE otherwise.

Since 0.9
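To show what "the first chunk of valid characters" means, here is a plain-C sketch that measures the longest well-formed UTF-8 prefix, standing in for g_utf8_validate(). The names sketch_validate_utf8 and utf8_valid_prefix are hypothetical; long replaces gssize to stay within standard C, stopping at a NUL byte is assumed to mirror g_utf8_validate(), and the str output is left out to avoid a GLib dependency.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length in bytes of the longest prefix of s[0..len) that is well-formed
 * UTF-8: no overlong forms, surrogates or code points above U+10FFFF.
 * A NUL byte also ends the prefix. */
static size_t utf8_valid_prefix(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        size_t n, k;            /* n: bytes in this sequence */
        unsigned long cp;       /* decoded code point */

        if (c == 0x00)
            break;
        if (c < 0x80) { i++; continue; }
        else if (c >= 0xC2 && c <= 0xDF) { n = 2; cp = c & 0x1F; }
        else if (c >= 0xE0 && c <= 0xEF) { n = 3; cp = c & 0x0F; }
        else if (c >= 0xF0 && c <= 0xF4) { n = 4; cp = c & 0x07; }
        else break;             /* invalid lead byte */

        if (len - i < n)
            break;              /* truncated sequence */
        for (k = 1; k < n; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                break;
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if (k < n)
            break;              /* missing continuation byte */
        if ((n == 3 && (cp < 0x800 || (cp >= 0xD800 && cp <= 0xDFFF))) ||
            (n == 4 && (cp < 0x10000 || cp > 0x10FFFF)))
            break;              /* overlong, surrogate or out of range */
        i += n;
    }
    return i;
}

/* Hypothetical stand-in for tracker_text_validate_utf8() without the str
 * parameter: text_len may be -1 for NUL-terminated input. Returns nonzero
 * and sets *valid_len (if not NULL) when at least one byte is valid. */
int sketch_validate_utf8(const char *text, long text_len, size_t *valid_len)
{
    size_t len, prefix;

    assert(text != NULL);
    len = text_len < 0 ? strlen(text) : (size_t)text_len;
    prefix = utf8_valid_prefix((const unsigned char *)text, len);
    if (prefix == 0)
        return 0;
    if (valid_len != NULL)
        *valid_len = prefix;
    return 1;
}
```

In the real function, the same prefix length would also decide how many bytes are appended to str.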


tracker_date_format_to_iso8601 ()

gchar*              tracker_date_format_to_iso8601      (const gchar *date_string,
                                                         const gchar *format);

This function uses strptime() to parse date_string according to format into a struct tm, which is then converted to an ISO 8601 string.

date_string :

the date in a string pointer

format :

the format of the date_string

Returns :

a newly-allocated string with the time represented in ISO 8601 format, or NULL if the date could not be converted. Free the string with g_free() when it is no longer needed.

Since 0.8
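The strptime() step described above can be sketched as follows. The name sketch_date_format_to_iso8601 is hypothetical, strdup() and free() stand in for GLib allocation, and the "YYYY-MM-DDThh:mm:ss" output layout is an assumption: the library may format the result differently, for example with time zone information.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical stand-in for tracker_date_format_to_iso8601(): parses
 * date_string with strptime() using format and prints the result as
 * "YYYY-MM-DDThh:mm:ss". Returns a malloc()ed string, or NULL if parsing
 * fails. */
char *sketch_date_format_to_iso8601(const char *date_string,
                                    const char *format)
{
    struct tm tm;
    char buf[64];

    assert(date_string != NULL && format != NULL);
    memset(&tm, 0, sizeof tm);   /* strptime() sets only the parsed fields */
    if (strptime(date_string, format, &tm) == NULL)
        return NULL;
    if (strftime(buf, sizeof buf, "%Y-%m-%dT%H:%M:%S", &tm) == 0)
        return NULL;
    return strdup(buf);
}
```

For example, parsing "15/03/2005 11:32" with the format "%d/%m/%Y %H:%M" yields "2005-03-15T11:32:00" in this sketch; fields absent from the format (here, seconds) stay zero because the struct is cleared first.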


tracker_date_guess ()

gchar*              tracker_date_guess                  (const gchar *date_string);

This function uses a number of methods to try to guess the date held in date_string. No guessing is attempted unless date_string is at least 5 characters long. Some of the string formats guessed include:

  • "YYYY-MM-DD" (Simple format)

  • "20050315113224-08'00'" (PDF format)

  • "20050216111533Z" (PDF format)

  • "Mon Feb 9 10:10:00 2004" (Microsoft Office format)

  • "2005:04:29 14:56:54" (Exif format)

  • "YYYY-MM-DDThh:mm:ss.ff+zz:zz" (ISO 8601 format)

date_string :

the date in a string pointer

Returns :

a newly-allocated string with the time represented in ISO 8601 format, or NULL if no date could be guessed. Free the string with g_free() when it is no longer needed.

Since 0.8
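One plausible way to structure this kind of guessing is to try a list of strptime() layouts in turn and accept the first one that consumes the whole string. The sketch below covers only the simple, Exif, Microsoft Office and a fraction-free ISO 8601 layout (not the PDF formats). The name sketch_date_guess, the candidate list and the "YYYY-MM-DDThh:mm:ss" output layout are assumptions for illustration, not the library's actual method.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical candidate layouts, tried in order. */
static const char *const candidate_formats[] = {
    "%Y-%m-%dT%H:%M:%S",     /* ISO 8601 without fraction or zone */
    "%Y:%m:%d %H:%M:%S",     /* Exif */
    "%a %b %d %H:%M:%S %Y",  /* Microsoft Office */
    "%Y-%m-%d",              /* simple */
};

/* Hypothetical stand-in for tracker_date_guess(): returns a malloc()ed
 * "YYYY-MM-DDThh:mm:ss" string for the first layout that matches the whole
 * input, or NULL if none does. */
char *sketch_date_guess(const char *date_string)
{
    size_t i;

    assert(date_string != NULL);
    if (strlen(date_string) < 5)
        return NULL;         /* the documented 5-character minimum */

    for (i = 0; i < sizeof candidate_formats / sizeof candidate_formats[0]; i++) {
        struct tm tm;
        const char *end;
        char buf[64];

        memset(&tm, 0, sizeof tm);
        end = strptime(date_string, candidate_formats[i], &tm);
        if (end == NULL || *end != '\0')
            continue;        /* no match, or trailing text left over */
        if (strftime(buf, sizeof buf, "%Y-%m-%dT%H:%M:%S", &tm) == 0)
            return NULL;
        return strdup(buf);
    }
    return NULL;
}
```

Requiring strptime() to reach the end of the input keeps a short layout such as "%Y-%m-%d" from matching only the date part of a longer timestamp.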