You can create a custom Pipeline step using Python, which offers great flexibility in configuring and customizing the Pipeline.
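A Python pipeline step defines a run(entry) function. Voyager passes it the path to a JSON file containing an entry. The function loads the entry, modifies it, and writes the updated entry as JSON to standard output. The following sample step counts occurrences of "Voyager" and "voyager" in the text field and stores the count in a new field: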
import sys
import json


def run(entry):
    """
    Sample Python pipeline step. Searches the text field for "Voyager" or
    "voyager" and stores the word count in the fi_voyager_word_count field.
    :param entry: path to a JSON file containing a Voyager entry.
    """
    # Load the entry; the with block makes sure the file is closed again.
    with open(entry, "rb") as f:
        new_entry = json.load(f)
    voyager_word_count = 0
    if 'fields' in new_entry['entry']:
        if 'text' in new_entry['entry']['fields']:
            text_field = new_entry['entry']['fields']['text']
            # Count both capitalizations (substring matches, so "Voyagers" also counts).
            voyager_word_count += text_field.count('Voyager')
            voyager_word_count += text_field.count('voyager')
        new_entry['entry']['fields']['fi_voyager_word_count'] = voyager_word_count
    # Return the updated entry to Voyager as JSON on standard output.
    sys.stdout.write(json.dumps(new_entry))
    sys.stdout.flush()
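To check the step's logic before adding it to a pipeline, you can call run() on a hand-made entry and capture what it writes to standard output. The sketch below is a minimal harness under two assumptions: the entry layout ({"entry": {"fields": {...}}}) is inferred from the sample above rather than from a full specification of Voyager's entry format, and run_step_on is a hypothetical helper, not part of Voyager. It includes a condensed copy of the sample run() so that it can execute on its own.

```python
import contextlib
import io
import json
import os
import sys
import tempfile


def run(entry):
    """Condensed copy of the sample step: counts "Voyager" and "voyager" in the text field."""
    with open(entry, "rb") as f:
        new_entry = json.load(f)
    fields = new_entry["entry"].get("fields")
    if fields is not None:
        text_field = fields.get("text", "")
        fields["fi_voyager_word_count"] = text_field.count("Voyager") + text_field.count("voyager")
    sys.stdout.write(json.dumps(new_entry))
    sys.stdout.flush()


def run_step_on(sample_entry):
    """Hypothetical helper: write sample_entry to a temp file, call run(), return the parsed output."""
    fd, path = tempfile.mkstemp(suffix=".json")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(sample_entry, f)
        buffer = io.StringIO()
        # Anything run() writes to sys.stdout goes into the buffer instead of the console.
        with contextlib.redirect_stdout(buffer):
            run(path)
        return json.loads(buffer.getvalue())
    finally:
        os.remove(path)


if __name__ == "__main__":
    sample = {"entry": {"fields": {"text": "Voyager indexes data. voyager is fast."}}}
    result = run_step_on(sample)
    print(result["entry"]["fields"]["fi_voyager_word_count"])  # prints 2
```

Because contextlib.redirect_stdout replaces sys.stdout for the duration of the call, the sys.stdout.write call inside run() lands in the buffer. This sketch assumes Python 3.6 or later (json.load on a file opened in binary mode, redirect_stdout).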
If the results are not as expected, you can debug the Python script as follows:
1. Add the following lines to the top of the run(entry) function. They save a copy of each entry file that Voyager passes to the step in c:/temp (the folder must already exist):
def run(entry):
    """
    Sample Python pipeline step. Searches the text field for "Voyager" or
    "voyager" and stores the word count in the fi_voyager_word_count field.
    :param entry: path to a JSON file containing a Voyager entry.
    """
    # FOR DEBUGGING ONLY - START
    import shutil, os
    debug_copy = os.path.join('c:/temp', os.path.basename(entry))
    # Save a copy of the entry file unless one has already been saved.
    if not os.path.exists(debug_copy):
        shutil.copyfile(entry, debug_copy)
    # FOR DEBUGGING ONLY - END
2. Save the script and rebuild the index from within Voyager. This saves the entry file or files in the location you specified. Index only a small set of data so that you end up with a short list of files to debug with.
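3. Add a main block to the bottom of the script that calls run() with the path to one of the saved entry files. You can then run or step through the script outside Voyager, for example in a Python IDE or debugger.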
if __name__ == '__main__':
    entry_file = "path to entry file here"
    run(entry_file)
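The copy code from step 1 can also be written as a small helper that creates the destination folder if needed and falls back to the system temporary directory, so it does not depend on c:/temp existing. This is a sketch: save_debug_copy and the voyager_entries folder name are hypothetical choices, not part of Voyager.

```python
import os
import shutil
import tempfile


def save_debug_copy(entry, debug_dir=None):
    """Copy the entry file into debug_dir once and return the copy's path.

    Hypothetical helper: debug_dir defaults to a voyager_entries folder in
    the system temp directory instead of the hard-coded c:/temp.
    """
    if debug_dir is None:
        debug_dir = os.path.join(tempfile.gettempdir(), "voyager_entries")
    os.makedirs(debug_dir, exist_ok=True)
    destination = os.path.join(debug_dir, os.path.basename(entry))
    # Skip the copy if this entry was already saved, which also covers
    # calling run() on the saved copy itself from the main block.
    if not os.path.exists(destination):
        shutil.copyfile(entry, destination)
    return destination
```

While debugging, call save_debug_copy(entry) as the first line of run() and remove it afterwards. Because existing copies are never overwritten, delete old copies before re-indexing if you want fresh ones.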
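After rebuilding the index in step 2, a quick scan of the debug folder confirms that entry files were written and shows which ones contain a text field, and so are worth running through the sample step. summarize_entries is a hypothetical helper; it assumes the same entry layout as the sample and makes no assumption about file extensions.

```python
import glob
import json
import os


def summarize_entries(debug_dir):
    """Return (path, has_text_field) for every saved entry file in debug_dir."""
    summary = []
    for path in sorted(glob.glob(os.path.join(debug_dir, "*"))):
        try:
            with open(path, "rb") as f:
                data = json.load(f)
        except (OSError, ValueError):
            continue  # Not a readable JSON file; skip it.
        entry = data.get("entry") if isinstance(data, dict) else None
        if not isinstance(entry, dict):
            continue  # JSON, but not shaped like an entry file.
        summary.append((path, "text" in entry.get("fields", {})))
    return summary
```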
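For step 3, instead of editing entry_file for every file you want to check, the main block can take the entry paths on the command line. The sketch below assumes Python 3.6 or later; debug_main is a hypothetical helper that takes the step's run() function, captures the JSON it writes, and re-prints it indented so that new fields such as fi_voyager_word_count are easy to spot.

```python
import argparse
import contextlib
import io
import json


def debug_main(step, argv=None):
    """Run a pipeline step on each entry file named on the command line.

    Hypothetical helper: `step` is the step's run() function. Returns the
    parsed JSON output for each file, in order.
    """
    parser = argparse.ArgumentParser(description="Debug a Python pipeline step.")
    parser.add_argument("entry_files", nargs="+", help="paths to saved entry files")
    args = parser.parse_args(argv)
    outputs = []
    for path in args.entry_files:
        buffer = io.StringIO()
        # Capture what the step writes to standard output.
        with contextlib.redirect_stdout(buffer):
            step(path)
        result = json.loads(buffer.getvalue())
        print("== {0} ==".format(path))
        print(json.dumps(result, indent=2, sort_keys=True))
        outputs.append(result)
    return outputs


# Hypothetical usage at the bottom of the step script:
# if __name__ == '__main__':
#     debug_main(run)
# then run: python my_step.py <entry file> [<entry file> ...]
```

One trade-off: anything run() prints while its output is being captured, such as debugging print() calls, also lands in the buffer and makes json.loads fail. For breakpoint-based debugging, calling run(entry_file) directly as in step 3 is simpler.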