How to force Logstash to reparse a file?
Problem Overview
I installed Logstash to parse Apache log files. It took me quite a while to get the settings right, and I always tested on real logs. I noticed (as the documentation says) that Logstash "remembers" where it was in a file. Now my settings are OK and I would like Logstash to "forget". This seems harder than I thought. I already did the following:
- used start_position => "beginning"
- deleted the complete "data" folder from Elasticsearch (after stopping it first)
- looked at which files were opened by Logstash with lsof -p PID and deleted everything that looked promising (in my case /tmp/jffi*.tmp)
Logstash still does not forget, and only parses "fresh" files in the folder where the logs are.
Any ideas?
Solutions
Solution 1
By default, Logstash writes the position it was last at to a file that usually resides in $HOME/.sincedb. Logstash can be fooled into believing it never parsed the log file by specifying /dev/null as sincedb_path.
Here is the relevant part of the Input File documentation:
> Where to write the since database (keeps track of the current position of monitored log files). Defaults to the value of environment variable "$SINCEDB_PATH" or "$HOME/.sincedb".
Config Example
input {
  file {
    path => "/tmp/logfile_to_analyse"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
Solution 2
The file plugin stores the history of "tailing" in a sincedb file, by default under $HOME/.sincedb*; see http://logstash.net/docs/1.3.3/inputs/file#sincedb_path
The sincedb file contains lines that look like:
[inode] [major device number] [minor device number] [byte offset]
So, if you want to parse a complete file again, you need to:
- delete the sincedb files
- OR delete only the corresponding line in the sincedb file; check the inode number of your file first (ls -i yourFile | awk '{print $1}')
- and restart Logstash
With the key start_position => "beginning", Logstash will analyze the whole file.
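To delete only the line for one file, a minimal Python sketch could look like this. It assumes the sincedb layout described above, where the first whitespace-separated field is the inode; it matches on the inode alone and ignores the device numbers, which is a simplification. The helper name drop_sincedb_entry is hypothetical. Run it while Logstash is stopped, since a running instance may write its in-memory positions back to the file.

```python
import os


def drop_sincedb_entry(sincedb_file: str, target_path: str) -> int:
    """Remove the sincedb line(s) whose inode matches target_path.

    Returns the number of lines removed. Simplification: only the inode
    (first field) is compared; device numbers are ignored.
    """
    inode = str(os.stat(target_path).st_ino)  # same number `ls -i` prints

    def first_field(line: str):
        fields = line.split()
        return fields[0] if fields else None

    with open(sincedb_file) as f:
        lines = f.readlines()
    # Keep every line that does not start with the target inode.
    kept = [ln for ln in lines if first_field(ln) != inode]
    with open(sincedb_file, "w") as f:
        f.writelines(kept)
    return len(lines) - len(kept)
```

After that, restart Logstash with start_position => "beginning" so the file is read from offset 0.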
Example of a sincedb file name: .sincedb_7a7413a84171aa550d5318c17fd756e9
The name consists of .sincedb_ followed by an MD5 digest (Digest::MD5.hexdigest) of all the paths in the path setting (http://logstash.net/docs/1.3.3/inputs/file#path). See the code of the file plugin: https://github.com/logstash/logstash/blob/master/lib/logstash/inputs/file.rb#L105
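If you have several inputs and want to know which .sincedb_* file belongs to which one, you can recompute the name. This Python sketch assumes the plugin hashes the configured path values joined with commas (Digest::MD5.hexdigest(@path.join(",")) in the Ruby code); check the linked file.rb for the version you run before relying on it. The function name is illustrative.

```python
import hashlib
import os


def default_sincedb_name(paths: list) -> str:
    # Assumption: paths are joined with "," before hashing, mirroring
    # Digest::MD5.hexdigest(@path.join(",")) in the Ruby plugin.
    digest = hashlib.md5(",".join(paths).encode("utf-8")).hexdigest()
    return ".sincedb_" + digest


def default_sincedb_path(paths: list) -> str:
    # Old versions place the file in $HOME unless sincedb_path is set.
    return os.path.join(os.path.expanduser("~"), default_sincedb_name(paths))
```

For example, default_sincedb_path(["/tmp/logfile_to_analyse"]) gives the file you would delete for the config in Solution 1 if it had no explicit sincedb_path.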
Solution 3
Logstash keeps the record in $HOME/.sincedb_*. You can delete all the .sincedb_* files and restart Logstash; Logstash will then reparse the file.
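A small Python sketch of this cleanup, assuming the files live directly in the home directory of the user that runs Logstash (pass another directory otherwise). The name remove_sincedb_files is just for illustration; stop Logstash before running it.

```python
import glob
import os
from typing import Optional


def remove_sincedb_files(home: Optional[str] = None) -> list:
    """Delete every .sincedb_* file in `home` (defaults to $HOME)."""
    home = home or os.path.expanduser("~")
    removed = []
    # The pattern starts with ".", so glob does match these hidden files.
    for path in glob.glob(os.path.join(home, ".sincedb_*")):
        os.remove(path)
        removed.append(path)
    return sorted(removed)
```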
Solution 4
Combining all the answers, I guess this is the best way to parse files. I did the same for my testing.
input {
  file {
    path => "/tmp/access_log"
    start_position => beginning
    sincedb_path => "/dev/null"
    ignore_older => 0
  }
}
For a quick test, instead of setting ignore_older, you can also run touch /tmp/access_log to update the file's timestamp.
Solution 5
If you are using logstash-forwarder, check your home directory for the .logstash-forwarder file instead:
{
  "/var/log/messages": {
    "source": "/var/log/messages",
    "offset": 43715,
    "inode": 12967,
    "device": 51776
  }
}
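The same idea applies here: remove the entry for the file you want re-shipped, with logstash-forwarder stopped. A minimal Python sketch, assuming the registry is a JSON object keyed by file path as in the example above; forget_forwarder_file is a hypothetical helper name.

```python
import json


def forget_forwarder_file(registry: str, log_path: str) -> bool:
    """Drop log_path's entry from a logstash-forwarder registry file.

    Assumes a top-level JSON object keyed by file path.
    Returns True if an entry was removed, False if none was found.
    """
    with open(registry) as f:
        state = json.load(f)
    if log_path not in state:
        return False
    del state[log_path]
    with open(registry, "w") as f:
        json.dump(state, f, indent=2)
    return True
```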
Solution 6
After deleting $HOME/.sincedb_* it still wasn't ingesting data for me.
After trying a bunch of things, I removed all but the main .conf file from /etc/logstash/conf.d and restarted Logstash, and everything worked. I can only assume there was something in one of the .conf files that Logstash was silently hanging on.
Solution 7
Reparsing every time is very costly if the file contains a lot of data, so be careful before doing this. If you want to force a reparse, set this parameter inside the input block:
sincedb_path => "/dev/null"
With this option the sincedb position is never persisted, so Logstash reparses the file every time it starts. If you only want to reparse occasionally, you can instead delete the .sincedb file that was created when the file was parsed. It is usually a hidden file in the home directory of the user running Logstash (root's home directory if Logstash runs as root). You can also set sincedb_path to another location so the file is easy to find:
sincedb_path => "/home/shubham/sinceDB/productsSince.db"
Solution 8
If you want to avoid messing with the Logstash options, I've found that renaming or removing the existing log file and creating a new file from the old file's contents tricks Logstash into re-indexing.
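This most likely works because the sincedb is keyed by inode (see the sincedb format in Solution 2): the copy gets a new inode for which Logstash has no recorded position. A Python sketch of the rename-and-copy trick; the .old suffix and the helper name are illustrative only.

```python
import os
import shutil


def recreate_with_new_inode(path: str) -> tuple:
    """Replace `path` with a fresh copy of its contents.

    Returns (old_inode, new_inode). The original is renamed first, so both
    files exist at the same time and the copy must get a different inode.
    """
    old_inode = os.stat(path).st_ino
    backup = path + ".old"          # hypothetical name for the renamed original
    os.rename(path, backup)         # same inode, new name
    shutil.copyfile(backup, path)   # new file, new inode, same bytes
    new_inode = os.stat(path).st_ino
    os.remove(backup)               # drop the renamed original
    return old_inode, new_inode
```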
Solution 9
I found it in my home directory, but after deleting it Logstash refused to pick up the existing log files again. The way I got it to work was to add
sincedb_path => "/opt/elk/sincedb/"
to my file plugin. I think that to reset each time, you just change sincedb_path.
Solution 10
If you installed Filebeat from the tar.gz archive, you can delete the file $FilebeatPath/data/registry/filebeat/data.json and rerun Filebeat.
Solution 11
Try deleting the /var/lib/logstash folder in your environment.
Solution 12
As described at https://www.elastic.co/guide/en/logstash/current/plugins-inputs-file.html#plugins-inputs-file-sincedb_path, Logstash saves a sincedb file that tracks which files it has already seen and how far it has processed each of them.
If you want to get rid of the existing sincedb file and you have not defined sincedb_path yourself, you can find it in
<path.data>/plugins/inputs/file
By default, path.data is LOGSTASH_HOME/data for archive installs; for deb/rpm package installs it is /var/lib/logstash.
It is best to define sincedb_path yourself if you want full control over it.
Solution 13
I would suggest:
sincedb_clean_after => 0
start_position => "beginning"
Solution 14
In Logstash 5, the new directory is
<path.data>/plugins/inputs/file
The path.data setting is defined in logstash.yml.
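To find that directory programmatically, here is a rough Python sketch that reads path.data from logstash.yml. It is not a real YAML parser: it only recognizes a flat path.data: line (not the nested path:/data: form), and the fallback of /var/lib/logstash assumes a deb/rpm package install; an archive install uses LOGSTASH_HOME/data instead. The function name is hypothetical.

```python
import os


def sincedb_dir(logstash_yml: str, default_data: str = "/var/lib/logstash") -> str:
    """Return <path.data>/plugins/inputs/file based on logstash.yml.

    Only a top-level `path.data: ...` line is recognized; commented-out lines
    and trailing ` # comments` are ignored. Falls back to default_data.
    """
    data = default_data
    if os.path.exists(logstash_yml):
        with open(logstash_yml) as f:
            for line in f:
                stripped = line.strip()
                if stripped.startswith("path.data:"):
                    value = stripped.split(":", 1)[1].split(" #", 1)[0]
                    data = value.strip().strip("\"'")
    return os.path.join(data, "plugins", "inputs", "file")
```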