
Compare two csv files and append value, using awk

I have two files, as follows.

file1.csv

+------------+----------+--------+---------+
| Account_ID | Asset_ID | LOT_ID | FLAG_F1 |
+------------+----------+--------+---------+
| 10000      | 20000    | 30000  | Y       |
| 10001      | 20001    | 30001  | N       |
| 10002      | 20002    | 30002  | Y       |
| 10003      | 20003    | 30003  | N       |
| 10004      | 20004    | 30004  | Y       |
| 10005      | 20005    | 30005  | N       |
| 10006      | 20006    | 30006  | Y       |
+------------+----------+--------+---------+

file2.csv

+------------+----------+--------+---------+-----+-----+
| Account_ID | Asset_ID | LOT_ID | FLAG_F2 | XYZ | ABC |
+------------+----------+--------+---------+-----+-----+
| 10000      | 20000    | 30000  | Y       | XYZ | ABC |
| 10001      | 20001    | 30001  | Y       | XYZ | ABC |
| 10002      | 20002    | 30002  | Y       | XYZ | ABC |
| 10003      | 20003    | 30003  | Y       | XYZ | ABC |
| 10004      | 20004    | 30004  | Y       | XYZ | ABC |
| 10005      | 20005    | 30005  | Y       | XYZ | ABC |
| 10006      | 20006    | 30006  | Y       | XYZ | ABC |
| 10006      | 20006    | 30006  | Y       | XYZ | ABC |
| 10006      | 20006    | 30006  | Y       | XYZ | ABC |
+------------+----------+--------+---------+-----+-----+

I am trying to get the following output:

+------------+----------+--------+---------+-----+-----+---------+
| Account_ID | Asset_ID | LOT_ID | FLAG_F2 | XYZ | ABC | FLAG_F1 |
+------------+----------+--------+---------+-----+-----+---------+
| 10000      | 20000    | 30000  | Y       | XYZ | ABC | Y       |
| 10001      | 20001    | 30001  | Y       | XYZ | ABC | N       |
| 10002      | 20002    | 30002  | Y       | XYZ | ABC | Y       |
| 10003      | 20003    | 30003  | Y       | XYZ | ABC | N       |
| 10004      | 20004    | 30004  | Y       | XYZ | ABC | Y       |
| 10005      | 20005    | 30005  | Y       | XYZ | ABC | N       |
| 10006      | 20006    | 30006  | Y       | XYZ | ABC | Y       |
| 10006      | 20006    | 30006  | Y       | XYZ | ABC | Y       |
| 10007      | 20007    | 30006  | Y       | XYZ | ABC |         |
| 10006      | 20003    | 30006  | Y       | XYZ | ABC |         |
+------------+----------+--------+---------+-----+-----+---------+

In the output above, I append FLAG_F1 from file1.csv to each row of file2.csv when the
Account_ID, Asset_ID, and LOT_ID values are equal in both files. If there is no match, the
new field can be left blank.

I have tried the following code, taken from the question "two .csv files compare using awk":

awk -F',' '
    FNR == NR {
        if (FNR == 1) {next}
        a[$1] = $2;
        b[$1] = $3;
        next;
    }
    {
        if (FNR == 1) {print; next}
        if (a[$1] == $2) {
            print $1,$2,$3,b[$1];
        }
        else {
            print $1,a[$1],b[$1],b[$1];
        }
    }
' OFS=',' file1.csv file2.csv

It would help if someone could explain the above code line by line.

Answers
This is much simpler than the linked question. All you need is:

awk -F, -v OFS=, 'NR==FNR{a[$1$2$3]=$4; next}{print $0,a[$1$2$3]}' file1.csv file2.csv


Explanation
• -F, : set the input field separator to a comma.
• -v OFS=, : set the output field separator to a comma, so that print joins its arguments with
commas.
• NR==FNR : NR is the number of lines read so far across all input files; FNR is the line number
within the current file. The two are equal only while the 1st file is being read.
• a[$1$2$3]=$4; next : while the first file is being read (see above), save the 4th field in an
array whose key is the 1st, 2nd and 3rd fields concatenated, then skip to the next line.
• print $0,a[$1$2$3] : for each line of the second file, print the line ($0) followed by the value
stored in the a array for its first three fields. This is the 4th field of the matching line in
the first file, or an empty string if there is no match.
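
One caveat with plain concatenation: `$1$2$3` can yield the same key for different rows (fields `1`, `23`, `4` and `12`, `3`, `4` both give `1234`). A minimal sketch of a variant that uses awk's comma subscript instead, which joins the fields with SUBSEP. The temporary directory, the array name `flag` and the trimmed sample rows are assumptions made for illustration only:

```shell
# Sketch: join on three key fields using a comma subscript (SUBSEP-joined key).
# The temp directory and the trimmed sample rows are illustrative assumptions.
tmp=$(mktemp -d)

printf '%s\n' \
    'Account_ID,Asset_ID,LOT_ID,FLAG_F1' \
    '10000,20000,30000,Y' \
    '10001,20001,30001,N' > "$tmp/file1.csv"

printf '%s\n' \
    'Account_ID,Asset_ID,LOT_ID,FLAG_F2,XYZ,ABC' \
    '10000,20000,30000,Y,XYZ,ABC' \
    '10001,20001,30001,Y,XYZ,ABC' \
    '10007,20007,30006,Y,XYZ,ABC' > "$tmp/file2.csv"

# First file: remember FLAG_F1 per (Account_ID, Asset_ID, LOT_ID).
# Second file: print the line plus the remembered flag (empty if no match).
result=$(awk -F, -v OFS=, '
    NR == FNR { flag[$1,$2,$3] = $4; next }
    { print $0, flag[$1,$2,$3] }
' "$tmp/file1.csv" "$tmp/file2.csv")

printf '%s\n' "$result"
rm -rf "$tmp"
```

The header rows share the same three key fields, so the header of the output gains `FLAG_F1`; the unmatched `10007` row ends with an empty field.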

Comparing two csv file fields using awk script


I want to remove the rows from File1.csv by comparing the columns/fields in the File2.csv. I
only need the records whose first column is same and the second column is different for the
same record in both files. Here is an example on what I need.

File1.csv:
RAJAK|ACTIVE|1
VIJAY|ACTIVE|2
TAHA|ACTIVE|3

File2.csv:
VIJAY|INACTIVE
TAHA|ACTIVE

Output (File1.csv):
VIJAY|ACTIVE|2

In this scenario, a record of File1 should be kept only if its col1 equals col1 of a record in
File2 and its col2 differs from col2 of that record. All other records should be deleted, so the
output is File1 with the unwanted records removed. My attempt:

awk -F"|" 'FNR==NR{++a[$1,$2];next} (a[$1])!(a[$2])' File2.csv File1.csv

Answer:

awk 'NR==FNR{a[$1]=$2;next}{if ($1 in a && a[$1] != $2)print;}' FS="|" File2.csv File1.csv

Output:

VIJAY|ACTIVE|2
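
To try the same filter on the sample data from the question, a small self-contained sketch follows. The temporary directory and the array name `status` are assumptions for illustration; the logic is the one in the answer, written as a pattern without an explicit print:

```shell
# Sketch: keep File1 rows whose name exists in File2 with a different status.
tmp=$(mktemp -d)
printf '%s\n' 'RAJAK|ACTIVE|1' 'VIJAY|ACTIVE|2' 'TAHA|ACTIVE|3' > "$tmp/File1.csv"
printf '%s\n' 'VIJAY|INACTIVE' 'TAHA|ACTIVE' > "$tmp/File2.csv"

# File2 is read first: remember the status (field 2) per name (field 1).
# File1 is read second: a row is printed only if its name is known
# and its status differs from the remembered one.
kept=$(awk -F'|' '
    NR == FNR { status[$1] = $2; next }
    ($1 in status) && status[$1] != $2
' "$tmp/File2.csv" "$tmp/File1.csv")

printf '%s\n' "$kept"
rm -rf "$tmp"
```

RAJAK is dropped because it does not appear in File2, and TAHA is dropped because its status is the same in both files.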

AWK command to compare two columns of different files and print required columns from both files
#Question
> I need to match strings between the two files and print to a third file. Data look like this:

#File 1
dbID labnumber myID Status
CMV_1235 LAB06 56-1 Fail
CMV_1236 LAB14 57-1 Fail
CMV_2137 LAB84 54-4 Pass
CMV_2238 LAB85 50-3
CMV_C131 LAB21 51-2 Pass

#File 2
labnumber date
LAB06 18/01/2016
LAB14 27/04/2016
LAB18 10/01/2016
LAB21 9/02/2016
LAB69 4/03/2016
LAB84 18/02/2016
LAB22 18/03/2016
LAB85 27/03/2016

(Not totally overlapping: there may be samples in file 1 but not file 2 and vice versa)

I want to print to file 3:


dbID labnumber myID Status date
CMV_1235 LAB06 56-1 Fail 18/01/2016
CMV_1236 LAB14 57-1 Fail 27/04/2016
CMV_2137 LAB84 54-4 Pass 18/02/2016
CMV_2238 LAB85 50-3 27/03/2016

So, if labnumber matches in file 1 and file 2, print the whole line from file 1 followed by
the relevant date from the matching line in file 2, into a third file.

Answer

awk -F',' 'NR==FNR{label[$1]=$1;date[$1]=$2;next}; ($2==label[$2]){print $0 "," date[$2]}' <(sort -k1 file2.csv) <(sort -k2 file1.csv) &> file3.csv

Explanation

• -F',' : sets the field separator to a comma.
• NR==FNR : NR is the number of input lines read so far across all files, and FNR is the line
number within the current file. The two are equal only while the 1st file is being read.
• label[$1]=$1 : saves column 1 (the label) of the first file argument, file2.csv, in an
associative array, using column 1 as the key.
• date[$1]=$2 : saves column 2 (the date) of file2.csv in an associative array, using
column 1 as the key.
• next : skips to the next line, so that the rest of the program is applied only to the
second file.
• ($2==label[$2]) : this pattern is only reached for the second file (file1.csv). It checks
whether field 2 of the current line was stored in the label array, by comparing $2 with the
value stored under the key $2.
• {print $0 "," date[$2]} : if the pattern is true, prints the entire current line of file1.csv
and appends the matching date from file2.csv.
• sort -k1 file2.csv : sorts file2.csv on a key that starts at field 1; likewise, -k2 sorts
file1.csv on a key that starts at field 2.
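
The sample data in the question is shown separated by whitespace, and an array lookup does not depend on the order of the input. A minimal sketch under those assumptions (default whitespace field splitting, no sort step, membership tested with `$2 in date`), run on a trimmed subset of the question's rows; the temporary directory and the choice of subset are illustrative assumptions, not part of the original answer:

```shell
# Sketch: append the date from file2 to matching lines of file1,
# assuming whitespace-separated fields (awk's default splitting).
tmp=$(mktemp -d)

printf '%s\n' \
    'dbID labnumber myID Status' \
    'CMV_1235 LAB06 56-1 Fail' \
    'CMV_1236 LAB14 57-1 Fail' \
    'CMV_2238 LAB85 50-3' > "$tmp/file1.csv"

printf '%s\n' \
    'labnumber date' \
    'LAB06 18/01/2016' \
    'LAB18 10/01/2016' \
    'LAB85 27/03/2016' > "$tmp/file2.csv"

# file2.csv is read first: remember the date per labnumber.
# file1.csv is read second: print only lines whose labnumber (field 2)
# is a key of the date array, followed by that date.
joined=$(awk '
    NR == FNR { date[$1] = $2; next }
    $2 in date { print $0, date[$2] }
' "$tmp/file2.csv" "$tmp/file1.csv")

printf '%s\n' "$joined"
rm -rf "$tmp"
```

Output lines keep the order of file1.csv. The `in` test checks for the key without creating an empty array element, which a comparison such as `$2==label[$2]` would do for every unmatched line.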
Comparing two files using awk
I have two files
File 1 contains 3 fields
File 2 contains 4 fields

The number of rows of File 1 is much smaller than that of File 2.


I would like to compare the two files on their first field, as follows: if the first field of a
row in file 2 also appears as the first field of any row in file 1, do not print that row of
file 2.

Input File 1

S13109 3739 31082
S45002 3800 31873
S43722 3313 26638

Input File 2

S13109 3738 31081 0
S13109 3737 31080 0
S00033 3008 29985 0
S00033 3007 29984 0
S00022 4130 31838 0
S00022 4129 31837 0
S00188 3317 27372 0
S45002 3759 31832 0
S45002 3758 31831 0
S45002 3757 31830 0
S43722 3020 26345 0
S43722 3019 26344 0
S00371 3737 33636 0
S00371 3736 33635 0

Desired Output

S00033 3008 29985 0
S00033 3007 29984 0
S00022 4130 31838 0
S00022 4129 31837 0
S00188 3317 27372 0
S00371 3737 33636 0
S00371 3736 33635 0

Solution
---------

awk 'FNR==NR{a[$1]++;next}!a[$1]' file1 file2

How it works:

FNR==NR
When awk is given two (or more) input files, FNR resets to 1 on the first line of each new
file, whereas NR continues incrementing from where it left off. By checking FNR==NR we are
essentially checking whether we are currently parsing the first file.

a[$1]++

If we are parsing the first file (see above) then create an associative array with the first field
$1 as the key and post increment the value by 1. This essentially lets us create a 'seen' list.

next

This command tells awk to skip the remaining commands, read in the next record, and start
over. We do this because file1 is only meant to populate the associative array.

!a[$1]

This pattern is only evaluated when FNR==NR is false, i.e. we are not parsing file1 and thus
must be parsing file2. We then use the first field $1 of file2 as the key to index into the
'seen' list created earlier. If the value returned is unset (which awk treats as 0), we did not
see the key in file1 and therefore should print this line. Conversely, if the value is non-zero
then we did see it in file1 and thus should not print the line.
Note that !a[$1] is equivalent to !a[$1]{print} because the default action, when none is
given, is to print the entire line.
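
A small sketch of the same idea on a subset of the sample rows, written with the `in` operator instead of a counter. The temporary directory, the array name `seen` and the choice of subset are assumptions for illustration. Unlike `!a[$1]`, the test `!($1 in seen)` does not create an array element for every key of file2:

```shell
# Sketch: print the rows of file2 whose first field never occurs in file1.
tmp=$(mktemp -d)

printf '%s\n' \
    'S13109 3739 31082' \
    'S45002 3800 31873' > "$tmp/file1"

printf '%s\n' \
    'S13109 3738 31081 0' \
    'S00033 3008 29985 0' \
    'S45002 3759 31832 0' \
    'S00188 3317 27372 0' > "$tmp/file2"

# file1: referencing seen[$1] is enough to create the key.
# file2: a row is printed only if its first field is not a key of seen.
remaining=$(awk '
    FNR == NR { seen[$1]; next }
    !($1 in seen)
' "$tmp/file1" "$tmp/file2")

printf '%s\n' "$remaining"
rm -rf "$tmp"
```

Only the S00033 and S00188 rows are printed, since S13109 and S45002 both occur in file1.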
