Fluentd grep + output logs












I have a service deployed in a Kubernetes cluster, with Fluentd running as a DaemonSet, and I need to split the logs it receives so that they end up in different S3 buckets.
One bucket would hold all logs, generated by Kubernetes and by our debug/error-handling code. The other would hold a subset of the logs generated by the service: those parsed by the structured logger and identified by a specific field in the JSON. Think of it as one bucket for machine state and errors, and another for records of user actions such as "user_id created resource image_id at ts".

The service itself is unaware of Fluentd, so I cannot manually tag logs according to the S3 bucket I want them to end up in.
The fluentd.conf I use currently configures the S3 output like this:



<match **>
  # docs: https://docs.fluentd.org/v0.12/articles/out_s3
  # note: this configuration relies on the nodes have an IAM instance profile with access to your S3 bucket
  type copy
  <store>
    type s3
    log_level info
    s3_bucket "#{ENV['S3_BUCKET_NAME']}"
    s3_region "#{ENV['S3_BUCKET_REGION']}"
    aws_key_id "#{ENV['AWS_ACCESS_KEY_ID']}"
    aws_sec_key "#{ENV['AWS_SECRET_ACCESS_KEY']}"
    s3_object_key_format %{path}%{time_slice}/cluster-log-%{index}.%{file_extension}
    format json
    time_slice_format %Y/%m/%d
    time_slice_wait 1m
    flush_interval 10m
    utc
    include_time_key true
    include_tag_key true
    buffer_chunk_limit 128m
    buffer_path /var/log/fluentd-buffers/s3.buffer
  </store>
  <store>
    ...
  </store>
</match>


So what I would like is something like a grep plugin:

<store>
  type grep
  <regexp>
    key type
    pattern client-action
  </regexp>
</store>

which would send the matching logs to a separate S3 bucket from the one defined for all logs.










amazon-s3 kubernetes devops fluent

asked Nov 20 '18 at 15:16 by dgmt (edited Nov 20 '18 at 20:43)

1 Answer

I am assuming that the user action logs are generated by your service, and that the system logs include the Docker, Kubernetes and systemd logs from the nodes.
I found your example file in the official fluent GitHub repo. If you look at the folder that contains it, you'll see two more files, kubernetes.conf and systemd.conf. These files have source sections where they tag their data.

The match section in fluent.conf matches **, i.e. all logs, and sends them to S3. This is where you want to split your log types.
Your container logs are tagged kubernetes.* by the source section in kubernetes.conf.



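So your config above turns into:

<match kubernetes.**>
  # ** matches any number of tag parts; a single * matches exactly one,
  # and the tail source expands kubernetes.* into a multi-part tag
  @type s3
  # user log s3 bucket
  ...
</match>

and for the system logs, match every tag other than kubernetes.**.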
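
As a rough sketch of the split described above, the two match sections could look like the following. It reuses the parameter names from the question's config. The S3_USER_BUCKET_NAME variable and the s3-user.buffer path are made-up names for illustration, and credentials and the remaining options are assumed to stay as in the original store. Fluentd tries match sections in order of appearance and an event is consumed by the first one that matches, so the catch-all has to come last.

```conf
# Container logs picked up by the tail source in kubernetes.conf.
<match kubernetes.**>
  @type s3
  # Hypothetical environment variable for the user-log bucket.
  s3_bucket "#{ENV['S3_USER_BUCKET_NAME']}"
  s3_region "#{ENV['S3_BUCKET_REGION']}"
  format json
  # Each buffered output needs its own buffer path (hypothetical path).
  buffer_path /var/log/fluentd-buffers/s3-user.buffer
</match>

# Everything not consumed by the section above, e.g. systemd logs.
<match **>
  @type s3
  s3_bucket "#{ENV['S3_BUCKET_NAME']}"
  s3_region "#{ENV['S3_BUCKET_REGION']}"
  format json
  buffer_path /var/log/fluentd-buffers/s3.buffer
</match>
```

Note that with this layout the container logs no longer reach the first bucket at all; each event goes to exactly one of the two.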






answered Nov 20 '18 at 17:19 by Siddhesh Rane

• "user action logs are generated by your service and system logs include docker..." Those, but not only those. I would also like the debug logs and the error logs caught by our code and emitted by the structured logger to go to a different bucket from the record-keeping actions, such as "user_id created resource image_id", which would be GDPR-friendly and need to go into a separate bucket. So the source of both types of logs is the same (the structured logging library I use), but the destinations would differ based on, for example, a key in the log.
  – dgmt, Nov 20 '18 at 18:03

• I edited the question to better reflect the problem.
  – dgmt, Nov 20 '18 at 20:43
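
For the case described in these comments, where both kinds of records arrive with the same tag and differ only in a field, one possible approach is to copy every event into two labels and apply Fluentd's grep filter in one of them. This is a sketch rather than a tested config. It assumes the parsed record has a top-level type field whose value contains client-action, as in the question. The label names, the S3_ACTIONS_BUCKET_NAME variable and the s3-actions.buffer path are invented for illustration. The <regexp> section is the Fluentd v1 form of the grep filter; on v0.12 the equivalent is a single line, regexp1 type client-action.

```conf
# Send every event down two independent pipelines.
<match **>
  @type copy
  <store>
    @type relabel
    @label @ALL_LOGS
  </store>
  <store>
    @type relabel
    @label @CLIENT_ACTIONS
  </store>
</match>

# Pipeline 1: everything goes to the existing bucket.
<label @ALL_LOGS>
  <match **>
    @type s3
    s3_bucket "#{ENV['S3_BUCKET_NAME']}"
    s3_region "#{ENV['S3_BUCKET_REGION']}"
    format json
    buffer_path /var/log/fluentd-buffers/s3.buffer
  </match>
</label>

# Pipeline 2: keep only records whose "type" field matches, then ship them.
<label @CLIENT_ACTIONS>
  <filter **>
    @type grep
    <regexp>
      key type
      pattern /client-action/
    </regexp>
  </filter>
  <match **>
    @type s3
    # Hypothetical environment variable for the second bucket.
    s3_bucket "#{ENV['S3_ACTIONS_BUCKET_NAME']}"
    s3_region "#{ENV['S3_BUCKET_REGION']}"
    format json
    # Hypothetical path; must differ from the first output's buffer path.
    buffer_path /var/log/fluentd-buffers/s3-actions.buffer
  </match>
</label>
```

Records that fail the grep filter are dropped only inside @CLIENT_ACTIONS. The copy routed to @ALL_LOGS is unaffected, so the first bucket still receives everything.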

















