<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>마케터의 데이터 로그</title>
    <link>https://pyj2qdat.tistory.com/</link>
    <description>A study log from a marketer without a data analysis background who keeps learning, building up, and organizing along the way.</description>
    <language>ko</language>
    <pubDate>Thu, 14 May 2026 02:36:30 +0900</pubDate>
    <generator>TISTORY</generator>
    <ttl>100</ttl>
    <managingEditor>뺩빱</managingEditor>
    <image>
      <title>마케터의 데이터 로그</title>
      <url>https://tistory1.daumcdn.net/tistory/8021021/attach/9e275d4a444b46c687b7bed4ac37bc2b</url>
      <link>https://pyj2qdat.tistory.com</link>
    </image>
    <item>
      <title>7. Data Analyst Project: Week 8 Work Log</title>
      <link>https://pyj2qdat.tistory.com/17</link>
      <description>&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-filename=&quot;6-001.png&quot; data-origin-width=&quot;1080&quot; data-origin-height=&quot;1080&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/1aYQS/btsQZH3iY18/Q8MjDbcmVx435cDjfJhvO0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/1aYQS/btsQZH3iY18/Q8MjDbcmVx435cDjfJhvO0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/1aYQS/btsQZH3iY18/Q8MjDbcmVx435cDjfJhvO0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2F1aYQS%2FbtsQZH3iY18%2FQ8MjDbcmVx435cDjfJhvO0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;361&quot; height=&quot;361&quot; data-filename=&quot;6-001.png&quot; data-origin-width=&quot;1080&quot; data-origin-height=&quot;1080&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
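&lt;p data-ke-size=&quot;size16&quot;&gt;The tourism data I had been working with and the newly added external marketing dataset&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;were not directly connected within the project, so I looked for a way to match the two datasets.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;As a first experiment, I tried applying the same analysis process to both.&lt;/p&gt;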
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style5&quot; /&gt;
&lt;h2 style=&quot;color: #000000; text-align: left;&quot; data-ke-size=&quot;size26&quot;&gt;[Multicampus KDT Data Analyst Final Project: Week 8 Work Log]&lt;/h2&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;1. Finding a Common Analysis Structure&lt;/h4&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The tourism data shows who visited and what kind of experience they had,&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;while the external marketing data shows who clicked and which actions converted.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Both share the same structure: segments (age, gender, country, keyword) and performance metrics (visits, conversions, ROI).&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
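&lt;h4 data-ke-size=&quot;size20&quot;&gt;2. Matching&lt;/h4&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Matching the project so far (tourism data) against the external marketing dataset:&lt;/p&gt;
&lt;table style=&quot;border-collapse: collapse; width: 100%;&quot; border=&quot;1&quot; data-ke-align=&quot;alignLeft&quot;&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;b&gt;Project so far (tourism data)&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;External marketing dataset&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Matching point&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Visitor counts, characteristics by country/age&lt;/td&gt;
&lt;td&gt;Ad impressions (Impressions), clicks by segment&lt;/td&gt;
&lt;td&gt;&quot;Scale metrics&quot; (how many came vs. how many saw)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Positive/negative sentiment data&lt;/td&gt;
&lt;td&gt;Conversion rate (CVR), click-through rate (CTR)&lt;/td&gt;
&lt;td&gt;&quot;Experience/response metrics&quot; (how satisfied vs. how many converted)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Patterns by segment&lt;/td&gt;
&lt;td&gt;Performance by ad channel (Google, Facebook, Naver, etc.)&lt;/td&gt;
&lt;td&gt;&quot;Comparison by channel/group&quot;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Policy/strategic insights&lt;/td&gt;
&lt;td&gt;Ad campaign efficiency (ROAS, CPA)&lt;/td&gt;
&lt;td&gt;&quot;Strategic decision metrics&quot;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;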
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;3. Execution Flow (still at the simulation stage)&lt;/h4&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1154&quot; data-start=&quot;1070&quot;&gt;Understand the dataset structure (basic ad metrics: Impressions, Clicks, Conversions, Cost, Revenue)&lt;/li&gt;
&lt;li data-end=&quot;1216&quot; data-start=&quot;1155&quot;&gt;Build a common framework with the tourism data: compare at the segment level (country, age, gender, keyword, etc.)&lt;/li&gt;
&lt;li data-end=&quot;1293&quot; data-start=&quot;1217&quot;&gt;Compute performance metrics (CTR, CVR, ROAS, CPA &amp;rarr; these play the same role as the &amp;lsquo;positive rate&amp;rsquo; and &amp;lsquo;revisit rate&amp;rsquo; used in the tourism project)&lt;/li&gt;
&lt;li data-end=&quot;1337&quot; data-start=&quot;1294&quot;&gt;Segment analysis &amp;amp; visualization: which groups/channels are most efficient?&lt;/li&gt;
&lt;li data-end=&quot;1473&quot; data-start=&quot;1338&quot;&gt;Connect the insights
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;1473&quot; data-start=&quot;1358&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1387&quot; data-start=&quot;1358&quot;&gt;Tourism data: which groups visited and were satisfied&lt;/li&gt;
&lt;li data-end=&quot;1473&quot; data-start=&quot;1391&quot;&gt;Ad data: which groups/channels responded efficiently&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 data-end=&quot;607&quot; data-start=&quot;592&quot; data-ke-size=&quot;size20&quot;&gt;4. Dataset Overview&lt;/h4&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;844&quot; data-start=&quot;608&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;686&quot; data-start=&quot;608&quot;&gt;Tourism data (built in-house):
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;686&quot; data-start=&quot;634&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;654&quot; data-start=&quot;634&quot;&gt;Visitor counts by country, gender, and age&lt;/li&gt;
&lt;li data-end=&quot;672&quot; data-start=&quot;657&quot;&gt;Positive/negative sentiment ratio&lt;/li&gt;
&lt;li data-end=&quot;686&quot; data-start=&quot;675&quot;&gt;Purpose of visit, etc.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;844&quot; data-start=&quot;688&quot;&gt;External marketing data:
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;844&quot; data-start=&quot;722&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;796&quot; data-start=&quot;722&quot;&gt;Impressions, clicks, conversions, and cost by country, gender, age, channel, and campaign (Advertising.csv, marketing_AB.csv, etc.)&lt;/li&gt;
&lt;li data-end=&quot;844&quot; data-start=&quot;799&quot;&gt;Some of the data is synthetic (with a distribution structure similar to real data)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 data-end=&quot;864&quot; data-start=&quot;851&quot; data-ke-size=&quot;size20&quot;&gt;5. Process&lt;/h4&gt;
&lt;p data-end=&quot;885&quot; data-start=&quot;866&quot; data-ke-size=&quot;size18&quot;&gt;(1) Defining a common schema&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;1020&quot; data-start=&quot;886&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;916&quot; data-start=&quot;886&quot;&gt;The two datasets have different structures, so they cannot be merged directly.&lt;/li&gt;
&lt;li data-end=&quot;1020&quot; data-start=&quot;917&quot;&gt;So I set common keys (country, gender, age_group) and converted the ages in the ad data to match the age brackets of the tourism data (e.g., 20-29).&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-end=&quot;1037&quot; data-start=&quot;1022&quot; data-ke-size=&quot;size18&quot;&gt;(2) Matching logic&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;1203&quot; data-start=&quot;1038&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1102&quot; data-start=&quot;1038&quot;&gt;Ad data performance metrics: compute CTR (click-through rate), CVR (conversion rate), CPA (cost per acquisition), and ROAS (return on ad spend)&lt;/li&gt;
&lt;li data-end=&quot;1135&quot; data-start=&quot;1103&quot;&gt;Tourism data metrics: visitor counts, positive/negative sentiment ratio&lt;/li&gt;
&lt;li data-end=&quot;1203&quot; data-start=&quot;1136&quot;&gt;Join the two datasets on the common keys so that &amp;ldquo;visit scale &amp;times; ad performance&amp;rdquo; can be compared at a glance for each segment&lt;/li&gt;
&lt;/ul&gt;
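&lt;p data-ke-size=&quot;size16&quot;&gt;The matching logic can be sketched roughly as follows. This is only a minimal sketch under assumptions: the field names (impressions, clicks, conversions, cost, revenue, visitors, positive_ratio) and all of the numbers are hypothetical, and it uses plain Python dictionaries instead of the actual project tables. With pandas, a merge on the same three keys would play the role of join_segments.&lt;/p&gt;

```python
# Hypothetical per-segment rows; field names and numbers are made up for illustration.
ads = [
    {"country": "USA", "gender": "F", "age_group": "20-29",
     "impressions": 10000, "clicks": 250, "conversions": 20,
     "cost": 500.0, "revenue": 1500.0},
    {"country": "JPN", "gender": "M", "age_group": "30-39",
     "impressions": 8000, "clicks": 120, "conversions": 0,
     "cost": 300.0, "revenue": 0.0},
]
tourism = [
    {"country": "USA", "gender": "F", "age_group": "20-29",
     "visitors": 12000, "positive_ratio": 0.82},
]

# Common keys shared by both datasets.
KEYS = ("country", "gender", "age_group")


def safe_div(num, den):
    """Return num / den, or None when the denominator is zero."""
    return num / den if den else None


def ad_metrics(row):
    """Compute CTR, CVR, CPA, and ROAS for one ad segment row."""
    return {
        "CTR": safe_div(row["clicks"], row["impressions"]),
        "CVR": safe_div(row["conversions"], row["clicks"]),
        "CPA": safe_div(row["cost"], row["conversions"]),
        "ROAS": safe_div(row["revenue"], row["cost"]),
    }


def join_segments(tourism_rows, ad_rows):
    """Inner-join tourism rows with ad metrics on the common keys."""
    ads_by_key = {tuple(r[k] for k in KEYS): r for r in ad_rows}
    joined = []
    for t in tourism_rows:
        key = tuple(t[k] for k in KEYS)
        if key in ads_by_key:
            joined.append({**t, **ad_metrics(ads_by_key[key])})
    return joined


matched = join_segments(tourism, ads)
```

&lt;p data-ke-size=&quot;size16&quot;&gt;Segments that exist in only one dataset are dropped here (an inner join), and a zero denominator yields None rather than an error, so a segment with no conversions simply has no CPA.&lt;/p&gt;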
&lt;p data-end=&quot;1227&quot; data-start=&quot;1205&quot; data-ke-size=&quot;size18&quot;&gt;(3) Scoring (prioritization)&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;1322&quot; data-start=&quot;1228&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1244&quot; data-start=&quot;1228&quot;&gt;Visitor count (market size)&lt;/li&gt;
&lt;li data-end=&quot;1259&quot; data-start=&quot;1245&quot;&gt;CVR (propensity to convert)&lt;/li&gt;
&lt;li data-end=&quot;1322&quot; data-start=&quot;1260&quot;&gt;ROAS/CPA (cost efficiency)&lt;/li&gt;
&lt;li data-end=&quot;1322&quot; data-start=&quot;1260&quot;&gt;Take a weighted average of these three to compute a priority score (score)&lt;/li&gt;
&lt;/ul&gt;
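&lt;p data-ke-size=&quot;size16&quot;&gt;One way the priority score could be computed is sketched below. The weights (0.4, 0.3, 0.3), the min-max normalization, the use of ROAS alone for cost efficiency, and the three sample segments are all assumptions for illustration; the actual weights used in the project are not stated here.&lt;/p&gt;

```python
def min_max(values):
    """Scale values to the range [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def priority_scores(segments, weights=(0.4, 0.3, 0.3)):
    """Weighted average of normalized visitors, CVR, and ROAS per segment."""
    w_vis, w_cvr, w_roas = weights  # assumed weights, not taken from the project
    vis = min_max([s["visitors"] for s in segments])
    cvr = min_max([s["CVR"] for s in segments])
    roas = min_max([s["ROAS"] for s in segments])
    total = w_vis + w_cvr + w_roas
    return [
        (w_vis * v + w_cvr * c + w_roas * r) / total
        for v, c, r in zip(vis, cvr, roas)
    ]


# Hypothetical joined segments (made-up numbers).
segments = [
    {"name": "A", "visitors": 12000, "CVR": 0.08, "ROAS": 3.0},
    {"name": "B", "visitors": 4000, "CVR": 0.02, "ROAS": 1.0},
    {"name": "C", "visitors": 8000, "CVR": 0.05, "ROAS": 4.0},
]
scores = priority_scores(segments)
ranked = sorted(zip(segments, scores), key=lambda pair: pair[1], reverse=True)
```

&lt;p data-ke-size=&quot;size16&quot;&gt;Normalizing first keeps visitor counts, which are in the thousands, from drowning out CVR and ROAS, which are small ratios.&lt;/p&gt;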
&lt;h4 data-end=&quot;1751&quot; data-start=&quot;1735&quot; data-ke-size=&quot;size20&quot;&gt;6. Challenges &amp;amp; Solutions&lt;/h4&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;1917&quot; data-start=&quot;1752&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1807&quot; data-start=&quot;1752&quot;&gt;&lt;b&gt;Problem:&lt;/b&gt; The age brackets and country names in the tourism data and the ad data did not match, which made matching difficult&lt;/li&gt;
&lt;li data-end=&quot;1917&quot; data-start=&quot;1808&quot;&gt;&lt;b&gt;Solution:&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;1917&quot; data-start=&quot;1822&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1857&quot; data-start=&quot;1822&quot;&gt;Binned the ad ages (cut) to align them with the tourism data brackets&lt;/li&gt;
&lt;li data-end=&quot;1917&quot; data-start=&quot;1860&quot;&gt;Created a country-name mapping table (e.g., &amp;ldquo;United States of America&amp;rdquo; &amp;rarr; &amp;ldquo;USA&amp;rdquo;)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
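&lt;p data-ke-size=&quot;size16&quot;&gt;The two fixes can be sketched as below. This is a standard-library sketch, not the project code: the bracket edges, the bracket labels, and every mapping entry other than &amp;ldquo;United States of America&amp;rdquo; are assumptions. In pandas, pd.cut would do the binning step.&lt;/p&gt;

```python
from bisect import bisect_right

# Assumed bracket edges and labels; the real tourism brackets may differ.
AGE_EDGES = [20, 30, 40, 50, 60]
AGE_LABELS = ["under 20", "20-29", "30-39", "40-49", "50-59", "60+"]

# Hypothetical mapping table; only the first entry comes from the post.
COUNTRY_MAP = {
    "United States of America": "USA",
    "United States": "USA",
    "Republic of Korea": "KOR",
}


def to_age_group(age):
    """Map a raw age to the label of the bracket that contains it."""
    return AGE_LABELS[bisect_right(AGE_EDGES, age)]


def normalize_country(name):
    """Map a country name to the shared code; unknown names pass through."""
    cleaned = name.strip()
    return COUNTRY_MAP.get(cleaned, cleaned)


# Example: one raw ad row converted to the common keys.
raw = {"country": " United States of America", "age": 27}
common = {
    "country": normalize_country(raw["country"]),
    "age_group": to_age_group(raw["age"]),
}
```

&lt;p data-ke-size=&quot;size16&quot;&gt;Letting unknown country names pass through unchanged makes unmapped values easy to spot after the join, since they simply fail to match.&lt;/p&gt;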
&lt;h4 data-end=&quot;2194&quot; data-start=&quot;2174&quot; data-ke-size=&quot;size20&quot;&gt;7. Plan for Next Week&lt;/h4&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;2308&quot; data-start=&quot;2195&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;2220&quot; data-start=&quot;2195&quot;&gt;Run SEM (search advertising) simulation experiments&lt;/li&gt;
&lt;li data-end=&quot;2268&quot; data-start=&quot;2221&quot;&gt;Using synthetic data, estimate how conversion rate and ROI change as ad spend increases or decreases&lt;/li&gt;
&lt;li data-end=&quot;2308&quot; data-start=&quot;2269&quot;&gt;Finally, complete the &lt;b&gt;&amp;ldquo;&lt;/b&gt;Data-Driven Tourism Marketing Strategy Report&lt;b&gt;&amp;rdquo;&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;</description>
      <category>마케터 관점의 데이터분석/데이터분석 프로젝트</category>
      <category>그로스마케팅</category>
      <category>데이터분석</category>
      <category>데이터분석가</category>
      <category>데이터분석가부트캠프</category>
      <category>디지털마케팅</category>
      <category>마케팅</category>
      <category>마케팅데이터분석</category>
      <category>멀티캠퍼스it부트캠프</category>
      <category>부트캠프후기</category>
      <category>퍼포먼스마케팅</category>
      <author>뺩빱</author>
      <guid isPermaLink="true">https://pyj2qdat.tistory.com/17</guid>
      <comments>https://pyj2qdat.tistory.com/17#entry17comment</comments>
      <pubDate>Thu, 2 Oct 2025 12:23:37 +0900</pubDate>
    </item>
    <item>
      <title>6. Data Analyst Project: Week 7 Work Log</title>
      <link>https://pyj2qdat.tistory.com/16</link>
      <description>&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-filename=&quot;6-001 (1).png&quot; data-origin-width=&quot;1080&quot; data-origin-height=&quot;1080&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/Yl80D/btsQN9e2zNj/iSQ9K660liZ5K7JspI5Jy1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/Yl80D/btsQN9e2zNj/iSQ9K660liZ5K7JspI5Jy1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/Yl80D/btsQN9e2zNj/iSQ9K660liZ5K7JspI5Jy1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FYl80D%2FbtsQN9e2zNj%2FiSQ9K660liZ5K7JspI5Jy1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;416&quot; height=&quot;416&quot; data-filename=&quot;6-001 (1).png&quot; data-origin-width=&quot;1080&quot; data-origin-height=&quot;1080&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-pm-slice=&quot;0 0 []&quot; data-ke-size=&quot;size16&quot;&gt;This week, to take the final project further,&lt;/p&gt;
&lt;p data-pm-slice=&quot;0 0 []&quot; data-ke-size=&quot;size16&quot;&gt;I organized the interim analysis results, built slides, and prepared for the midterm presentation.&lt;/p&gt;
&lt;p data-pm-slice=&quot;0 0 []&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style5&quot; /&gt;
&lt;h2 style=&quot;color: #000000; text-align: left;&quot; data-ke-size=&quot;size26&quot;&gt;[Multicampus KDT Data Analyst Final Project: Week 7 Work Log]&lt;/h2&gt;
&lt;h4 data-pm-slice=&quot;1 1 []&quot; data-ke-size=&quot;size20&quot;&gt;&amp;nbsp;&lt;/h4&gt;
&lt;h4 data-pm-slice=&quot;1 1 []&quot; data-ke-size=&quot;size20&quot;&gt;Building the Slides&lt;/h4&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;I structured the data-driven analysis process&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;and polished the results so far into presentation material.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Problem definition &amp;rarr; data overview &amp;rarr; analysis process &amp;rarr; interim insights &amp;rarr; directions for extension:&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;I connected all of these into a single flow.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;Planning the Extension&lt;/h4&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Until now, I had benchmarked external A/B test simulations&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;and worked through them with a practical focus, without tying them to the current project.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Going forward, I will look for the links between these external simulations and the main project&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;and use A/B test simulations to verify how effective actual marketing messages are.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;For example, an SNS campaign centered on K-pop for people in their 20s,&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;family-friendly package ads for those in their 30s and 40s,&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;and messages emphasizing stability and comfort for those aged 50 and over could be tested.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;In the longer term, I plan to adopt MMM (media mix modeling) to analyze ROI by channel&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;and to make the strategy more concrete with SEO &amp;amp; search advertising simulations.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-pm-slice=&quot;1 1 []&quot; data-ke-size=&quot;size16&quot;&gt;Next week, I will start the campaign simulation and ROI analysis in earnest&lt;/p&gt;
&lt;p data-pm-slice=&quot;1 1 []&quot; data-ke-size=&quot;size16&quot;&gt;and present a more concrete, actionable strategy.&lt;/p&gt;</description>
      <category>마케터 관점의 데이터분석/데이터분석 프로젝트</category>
      <category>그로스마케팅</category>
      <category>데이터분석</category>
      <category>데이터분석가</category>
      <category>데이터분석가부트캠프</category>
      <category>디지털마케팅</category>
      <category>마케팅</category>
      <category>마케팅데이터분석</category>
      <category>멀티캠퍼스it부트캠프</category>
      <category>부트캠프후기</category>
      <category>퍼포먼스마케팅</category>
      <author>뺩빱</author>
      <guid isPermaLink="true">https://pyj2qdat.tistory.com/16</guid>
      <comments>https://pyj2qdat.tistory.com/16#entry16comment</comments>
      <pubDate>Fri, 26 Sep 2025 10:08:29 +0900</pubDate>
    </item>
    <item>
      <title>5. Data Analyst Project: Week 6 Work Log</title>
      <link>https://pyj2qdat.tistory.com/15</link>
      <description>&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-filename=&quot;6-001.png&quot; data-origin-width=&quot;1080&quot; data-origin-height=&quot;1080&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/urYTQ/btsQFpodZcT/uStidy1cnOONkql0sasJg1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/urYTQ/btsQFpodZcT/uStidy1cnOONkql0sasJg1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/urYTQ/btsQFpodZcT/uStidy1cnOONkql0sasJg1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FurYTQ%2FbtsQFpodZcT%2FuStidy1cnOONkql0sasJg1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;553&quot; height=&quot;553&quot; data-filename=&quot;6-001.png&quot; data-origin-width=&quot;1080&quot; data-origin-height=&quot;1080&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-pm-slice=&quot;1 1 []&quot; data-ke-size=&quot;size16&quot;&gt;This week I simulated the analysis process for ad campaign data.&lt;/p&gt;
&lt;p data-pm-slice=&quot;1 1 []&quot; data-ke-size=&quot;size16&quot;&gt;Since I had no real campaign data, I used a public Kaggle dataset&lt;/p&gt;
&lt;p data-pm-slice=&quot;1 1 []&quot; data-ke-size=&quot;size16&quot;&gt;to walk through the ad campaign performance analysis process.&lt;/p&gt;
&lt;p data-pm-slice=&quot;1 1 []&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Even without real campaign data, practicing the same analysis process&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;means that in practice I can apply it right away by simply swapping in the data.&lt;/p&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style5&quot; /&gt;
&lt;h2 style=&quot;color: #000000; text-align: left;&quot; data-ke-size=&quot;size26&quot;&gt;[Multicampus KDT Data Analyst Final Project: Week 6 Work Log]&lt;/h2&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h4 data-pm-slice=&quot;0 0 []&quot; data-ke-size=&quot;size20&quot;&gt;1. Loading the Data &amp;amp; Basic Checks&lt;/h4&gt;
&lt;pre id=&quot;code_1758252527503&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;import pandas as pd

df = pd.read_csv(&quot;/content/KAG_conversion_data.csv&quot;)
df.head()&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h4 data-pm-slice=&quot;0 0 []&quot; data-ke-size=&quot;size20&quot;&gt;2. Computing Conversion Funnel Metrics (CTR, CVR, CAC)&lt;/h4&gt;
&lt;pre id=&quot;code_1758252578063&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;# Treat zero denominators as missing so the ratios become NaN rather than inf
nan = float(&quot;nan&quot;)
df[&quot;CTR&quot;] = df[&quot;Clicks&quot;] / df[&quot;Impressions&quot;].replace(0, nan)
df[&quot;CVR&quot;] = df[&quot;Conversions&quot;] / df[&quot;Clicks&quot;].replace(0, nan)
df[&quot;CAC&quot;] = df[&quot;Total_Spend&quot;] / df[&quot;Conversions&quot;].replace(0, nan)

df[[&quot;Ad_ID&quot;,&quot;CTR&quot;,&quot;CVR&quot;,&quot;CAC&quot;]].head()&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-pm-slice=&quot;0 0 []&quot; data-ke-size=&quot;size16&quot;&gt;CTR (Click-Through Rate): clicks relative to ad impressions&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;CVR (Conversion Rate): conversions relative to clicks&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;CAC (Customer Acquisition Cost): the cost of acquiring one customer&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;With these computed, it is easy to compare at a glance which ads are the most efficient.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h4 data-pm-slice=&quot;0 0 []&quot; data-ke-size=&quot;size20&quot;&gt;3. Efficiency Analysis by Segment&lt;/h4&gt;
&lt;pre id=&quot;code_1758252636763&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;seg = df.groupby(&quot;Age&quot;)[[&quot;CTR&quot;,&quot;CVR&quot;,&quot;CAC&quot;]].mean().reset_index()
seg&lt;/code&gt;&lt;/pre&gt;
&lt;p data-pm-slice=&quot;0 0 []&quot; data-ke-size=&quot;size16&quot;&gt;Example results (based on the Kaggle data):&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;A given age group/gender with high CTR and CVR and low CAC &amp;rarr; an efficient target&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Conversely, high CTR with low CVR &amp;rarr; a target that clicks a lot but does not go on to purchase&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;In a real campaign, segment analysis by age, gender, country, and platform is used to find &amp;ldquo;high-ROI customer groups.&amp;rdquo;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;4. A/B Test Simulation&lt;/h4&gt;
&lt;pre id=&quot;code_1758252693217&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;from statsmodels.stats.proportion import proportions_ztest

# Hypothetical A/B variants (conversions / impressions)
conv_A, imp_A = 950, 50000
conv_B, imp_B = 1100, 50000

z, p = proportions_ztest([conv_A, conv_B], [imp_A, imp_B])
cr_A, cr_B = conv_A/imp_A, conv_B/imp_B
lift = (cr_B - cr_A)/cr_A

print(f&quot;CR_A={cr_A:.3%}, CR_B={cr_B:.3%}, Lift={lift:.1%}, p={p:.4f}&quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p data-pm-slice=&quot;0 0 []&quot; data-ke-size=&quot;size16&quot;&gt;Variant A: existing message&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Variant B: message emphasizing influencers and SNS&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Decision rule: p &amp;lt; 0.05 &amp;amp; Lift &amp;ge; +10% &amp;rarr; variant B succeeds&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;In a real campaign, plugging the actual conversion and impression counts into this code gives the test result right away.&lt;/p&gt;
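&lt;p data-ke-size=&quot;size16&quot;&gt;The decision rule can also be wrapped in a small function. The sketch below is an illustration, not the code used in the project: the function name ab_decision and its parameters are hypothetical, and it reimplements the pooled two-proportion z-test with the standard library only (two-sided p-value from the normal approximation) so that it runs without statsmodels.&lt;/p&gt;

```python
from math import erfc, sqrt


def ab_decision(conv_a, n_a, conv_b, n_b, alpha=0.05, min_lift=0.10):
    """Pooled two-proportion z-test plus a minimum relative lift check."""
    cr_a, cr_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (cr_b - cr_a) / se
    # Two-sided p-value under the normal approximation.
    p_value = erfc(abs(z) / sqrt(2))
    lift = (cr_b - cr_a) / cr_a
    # Adopt B only when the result is significant AND the lift is large enough.
    adopt_b = alpha > p_value and lift >= min_lift
    return {"z": z, "p": p_value, "lift": lift, "adopt_b": adopt_b}


# Same hypothetical counts as the simulation above.
result = ab_decision(950, 50000, 1100, 50000)
```

&lt;p data-ke-size=&quot;size16&quot;&gt;Here z is computed as B minus A, so its sign is the opposite of what proportions_ztest returns when A is passed first; the two-sided p-value does not depend on that order.&lt;/p&gt;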
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imagegridblock&quot;&gt;
  &lt;div class=&quot;image-container&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/43SyE/btsQGXEhm8n/bP3P5KxAJeBvVHKvMIYA1K/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/43SyE/btsQGXEhm8n/bP3P5KxAJeBvVHKvMIYA1K/img.png&quot; data-origin-width=&quot;706&quot; data-origin-height=&quot;391&quot; data-is-animation=&quot;false&quot; style=&quot;width: 50.1014%; margin-right: 10px;&quot; data-widthpercent=&quot;50.69&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/43SyE/btsQGXEhm8n/bP3P5KxAJeBvVHKvMIYA1K/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2F43SyE%2FbtsQGXEhm8n%2FbP3P5KxAJeBvVHKvMIYA1K%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;706&quot; height=&quot;391&quot;/&gt;&lt;/span&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/vyS9g/btsQDY6gvK5/TKYM28wgYslQP2ZMdCA3z0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/vyS9g/btsQDY6gvK5/TKYM28wgYslQP2ZMdCA3z0/img.png&quot; data-origin-width=&quot;685&quot; data-origin-height=&quot;390&quot; data-is-animation=&quot;false&quot; style=&quot;width: 48.7358%;&quot; data-widthpercent=&quot;49.31&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/vyS9g/btsQDY6gvK5/TKYM28wgYslQP2ZMdCA3z0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FvyS9g%2FbtsQDY6gvK5%2FTKYM28wgYslQP2ZMdCA3z0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;685&quot; height=&quot;390&quot;/&gt;&lt;/span&gt;&lt;/div&gt;
&lt;/figure&gt;
&lt;/p&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;5. Insights&lt;/h4&gt;
&lt;p data-pm-slice=&quot;0 0 []&quot; data-ke-size=&quot;size16&quot;&gt;In this Kaggle data simulation,&lt;/p&gt;
&lt;p data-pm-slice=&quot;0 0 []&quot; data-ke-size=&quot;size16&quot;&gt;efficiency differed clearly by age group.&lt;/p&gt;
&lt;p data-pm-slice=&quot;0 0 []&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;I also built a structure to which A/B testing can be applied,&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;so it is possible to verify which message raises the conversion rate more.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;For a real brand campaign, simply swapping in the data&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;allows the same analysis process&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;to surface high-ROI segments and optimize the messaging.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h4 data-pm-slice=&quot;0 0 []&quot; data-ke-size=&quot;size20&quot;&gt;Plan for Week 7&lt;/h4&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The next step (week 7) is to run a search advertising (SEM) simulation.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Inputs: keyword, monthly search volume, CPC, competition level&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Outputs: expected impressions, clicks, conversions, and cost (within the budget limit)&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Use: select keywords that are efficient relative to budget and optimize the ad execution strategy&lt;/p&gt;
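&lt;p data-ke-size=&quot;size16&quot;&gt;A rough idea of what such a simulation might look like is sketched below. Everything in it is an assumption for illustration: the function name simulate_sem, the fixed impression share, CTR, and CVR values, the sample keywords, and the choice to ignore the competition level. It spends the budget on the cheapest clicks first, which equals the lowest cost per conversion when CVR is the same for every keyword.&lt;/p&gt;

```python
def simulate_sem(keywords, budget, impression_share=0.5, ctr=0.03, cvr=0.02):
    """Spend the budget in ascending CPC order and estimate the funnel per keyword."""
    plan, remaining = [], budget
    for kw in sorted(keywords, key=lambda k: k["cpc"]):
        impressions = kw["monthly_searches"] * impression_share
        clicks = impressions * ctr
        cost = clicks * kw["cpc"]
        if cost > remaining:
            # Scale this keyword down so total spend stays within the budget.
            scale = remaining / cost
            impressions, clicks, cost = impressions * scale, clicks * scale, remaining
        remaining -= cost
        plan.append({
            "keyword": kw["keyword"],
            "impressions": impressions,
            "clicks": clicks,
            "conversions": clicks * cvr,
            "cost": cost,
        })
    return plan


# Hypothetical keywords and numbers.
keywords = [
    {"keyword": "korea travel package", "monthly_searches": 10000, "cpc": 1.0},
    {"keyword": "seoul tour", "monthly_searches": 20000, "cpc": 0.5},
]
plan = simulate_sem(keywords, budget=200.0)
total_cost = sum(row["cost"] for row in plan)
```

&lt;p data-ke-size=&quot;size16&quot;&gt;In a fuller version, CTR and CVR would vary by keyword and the competition level would adjust the impression share, so the ordering would be by expected cost per conversion rather than by CPC alone.&lt;/p&gt;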
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;In the medium to long term, I plan to extend this to SEO strategy,&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;moving toward managing paid and organic search traffic together.&lt;/p&gt;</description>
      <category>마케터 관점의 데이터분석/데이터분석 프로젝트</category>
      <category>그로스마케팅</category>
      <category>데이터분석</category>
      <category>데이터분석가</category>
      <category>데이터분석가부트캠프</category>
      <category>디지털마케팅</category>
      <category>마케팅</category>
      <category>마케팅데이터분석</category>
      <category>멀티캠퍼스it부트캠프</category>
      <category>부트캠프후기</category>
      <category>퍼포먼스마케팅</category>
      <author>뺩빱</author>
      <guid isPermaLink="true">https://pyj2qdat.tistory.com/15</guid>
      <comments>https://pyj2qdat.tistory.com/15#entry15comment</comments>
      <pubDate>Fri, 19 Sep 2025 12:37:32 +0900</pubDate>
    </item>
    <item>
      <title>4. Data Analyst Project: Week 5 Work Log</title>
      <link>https://pyj2qdat.tistory.com/14</link>
      <description>&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1080&quot; data-origin-height=&quot;1080&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/6d6RR/btsQvYYhk3u/qgSUbKH0pa49hXN73XHAmk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/6d6RR/btsQvYYhk3u/qgSUbKH0pa49hXN73XHAmk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/6d6RR/btsQvYYhk3u/qgSUbKH0pa49hXN73XHAmk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2F6d6RR%2FbtsQvYYhk3u%2FqgSUbKH0pa49hXN73XHAmk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;495&quot; height=&quot;495&quot; data-origin-width=&quot;1080&quot; data-origin-height=&quot;1080&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;In week 4, considering that ad budget and staff are not unlimited,&lt;br&gt;I concluded that the most efficient customer segments need to be selected first.&lt;br&gt;&amp;nbsp;&lt;br&gt;Accordingly, in week 5 I ran a data-driven analysis&lt;br&gt;to work out which groups would yield the highest ROI if targeted first.&lt;/p&gt;&lt;hr data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style5&quot;&gt;&lt;h2 style=&quot;text-align: left;&quot; data-ke-size=&quot;size26&quot;&gt;[Multicampus KDT Data Analyst Final Project: Week 5 Work Log]&lt;/h2&gt;&lt;h3 data-ke-size=&quot;size23&quot;&gt;1. Preparing the Data&lt;/h3&gt;&lt;pre data-ke-type=&quot;codeblock&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot;&gt;&lt;code&gt;import pandas as pd
df = pd.read_csv(&quot;tourist_data.csv&quot;)
df.head()&lt;/code&gt;&lt;/pre&gt;&lt;p data-ke-size=&quot;size16&quot;&gt;Data source: International Visitor Survey, Q2 preliminary figures&lt;br&gt;Key variables: share (점유율), growth rate (성장률), satisfaction (만족도), visitor volume (방문객규모)&lt;br&gt;&amp;nbsp;&lt;/p&gt;&lt;h3 data-ke-size=&quot;size23&quot;&gt;2. Computing the ROI Score&lt;/h3&gt;&lt;pre data-ke-type=&quot;codeblock&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot;&gt;&lt;code&gt;from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
df[['점유율_n','성장률_n','만족도_n','방문객규모_n']] = scaler.fit_transform(
    df[['점유율','성장률','만족도','방문객규모']]
)

df['ROI_score'] = df[['점유율_n','성장률_n','만족도_n','방문객규모_n']].mean(axis=1)
df[['세그먼트','ROI_score']].sort_values(by='ROI_score', ascending=False).head()&lt;/code&gt;&lt;/pre&gt;&lt;p data-ke-size=&quot;size16&quot;&gt;The ROI Score is defined as the mean of the four metrics after min-max normalization&lt;br&gt;Result: young women (friends·couples), women in their 30s-40s, and family travelers are the Top 3&lt;br&gt;&amp;nbsp;&lt;/p&gt;&lt;h3 data-ke-size=&quot;size23&quot;&gt;3. Clustering (KMeans)&lt;/h3&gt;&lt;pre data-ke-type=&quot;codeblock&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot;&gt;&lt;code&gt;from sklearn.cluster import KMeans
X = df[['점유율_n','성장률_n','만족도_n','방문객규모_n']]
km = KMeans(n_clusters=2, random_state=42)
df['cluster'] = km.fit_predict(X)
df.groupby('cluster').mean(numeric_only=True)&lt;/code&gt;&lt;/pre&gt;&lt;p data-ke-size=&quot;size16&quot;&gt;Derived behavior groups from the data (Cluster 0: family-centered, Cluster 1: women-centered)&lt;br&gt;&amp;nbsp;&lt;/p&gt;&lt;h3 data-ke-size=&quot;size23&quot;&gt;4. Visualization&lt;/h3&gt;&lt;pre data-ke-type=&quot;codeblock&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot;&gt;&lt;code&gt;import matplotlib.pyplot as plt

plt.scatter(df['성장률'], df['만족도'], 
            s=df['방문객규모']/50, 
            c=df['ROI_score'], cmap='coolwarm', alpha=0.7)
plt.xlabel(&quot;Growth rate&quot;)
plt.ylabel(&quot;Satisfaction&quot;)
plt.title(&quot;Growth rate vs. satisfaction by segment (bubble = volume, color = ROI)&quot;)
plt.colorbar(label=&quot;ROI Score&quot;)
plt.show()&lt;/code&gt;&lt;/pre&gt;&lt;p data-ke-size=&quot;size16&quot;&gt;Bubble chart: growth rate vs. satisfaction; bubble size = visitor volume, color = ROI Score&lt;br&gt;&amp;nbsp;&lt;/p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;785&quot; data-origin-height=&quot;590&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/dheiwX/btsQuhxta8B/e9bEb6jLdhxaMQzA4tDZU0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/dheiwX/btsQuhxta8B/e9bEb6jLdhxaMQzA4tDZU0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/dheiwX/btsQuhxta8B/e9bEb6jLdhxaMQzA4tDZU0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FdheiwX%2FbtsQuhxta8B%2Fe9bEb6jLdhxaMQzA4tDZU0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;518&quot; height=&quot;389&quot; data-origin-width=&quot;785&quot; data-origin-height=&quot;590&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;Top ROI Segments (groups with large bubbles and deep color)&lt;/h4&gt;&lt;p data-ke-size=&quot;size16&quot;&gt;Young women (traveling with friends or as couples)&lt;br&gt;Growth rate↑, satisfaction↑, visitor volume↑ → highest ROI Score&lt;br&gt;Women in their 30s-40s (possibly with children)&lt;br&gt;Long stays and high spending capacity, with stable growth → upper tier of ROI&lt;br&gt;Family travelers (including teenagers)&lt;br&gt;Large visitor volume and solid satisfaction → high ROI Score&lt;/p&gt;&lt;h4 data-ke-size=&quot;size20&quot;&gt;Mid-ROI Segments&lt;/h4&gt;&lt;p data-ke-size=&quot;size16&quot;&gt;Some other female groups: sizable, but growth rate and satisfaction are middling&lt;br&gt;Young women (groups other than couples): they respond, but the ROI Score is somewhat lower than the top tier&lt;/p&gt;&lt;h4 data-ke-size=&quot;size20&quot;&gt;Low-ROI Segments&lt;/h4&gt;&lt;p data-ke-size=&quot;size16&quot;&gt;Active senior men: small visitor volume, so a low ROI Score&lt;br&gt;Other small groups: low growth rate and satisfaction, so excluded from the priorities&lt;br&gt;&amp;nbsp;&lt;/p&gt;&lt;h3 data-ke-size=&quot;size23&quot;&gt;5. Deriving Marketing Insights&lt;/h3&gt;&lt;p data-ke-size=&quot;size16&quot;&gt;Young women (friends·couples) → SNS and influencers / tie-ins with K-beauty and fashion&lt;br&gt;Women in their 30s-40s (possibly with children) → shopping and family package products&lt;br&gt;Family travelers (including teenagers) → hands-on family content / travel packages&lt;br&gt;Men (active seniors) → leisure, sports, and history experience packages&lt;br&gt;&amp;nbsp;&lt;/p&gt;&lt;h3 data-ke-size=&quot;size23&quot;&gt;6. A/B Test Design &amp;amp; Simulation&lt;/h3&gt;&lt;h4 data-ke-size=&quot;size20&quot;&gt;1. Setting the Hypothesis&lt;/h4&gt;&lt;p data-ke-size=&quot;size16&quot;&gt;Target: young women (friends·couples)&lt;br&gt;Variant A: existing message/ad&lt;br&gt;Variant B: message emphasizing influencers and SNS&lt;br&gt;Goal: improve the conversion rate by at least +10% (relative), significant at p&amp;lt;0.05&lt;br&gt;&amp;nbsp;&lt;/p&gt;&lt;h4 data-ke-size=&quot;size20&quot;&gt;2. Sample Size Calculation (simulation)&lt;/h4&gt;&lt;pre data-ke-type=&quot;codeblock&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot;&gt;&lt;code&gt;from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.api import proportion_effectsize

p1 = 0.020&amp;nbsp;&amp;nbsp; # current conversion rate (2%)
mde = 0.10&amp;nbsp;&amp;nbsp; # target: +10% relative lift
p2 = p1 * (1 + mde)

es = proportion_effectsize(p1, p2)
n = NormalIndPower().solve_power(effect_size=es, alpha=0.05, power=0.8)
print(&quot;그룹당 최소 노출수:&quot;, int(n))&lt;/code&gt;&lt;/pre&gt;&lt;p data-ke-size=&quot;size16&quot;&gt;→ Result: at least 80,637 impressions per group (simulation)&lt;br&gt;&amp;nbsp;&lt;/p&gt;&lt;h4 data-ke-size=&quot;size20&quot;&gt;3. Testing with synthetic data (simulation)&lt;/h4&gt;&lt;pre data-ke-type=&quot;codeblock&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot;&gt;&lt;code&gt;from statsmodels.stats.proportion import proportions_ztest

conv_A, imp_A = 950, 50000&amp;nbsp;&amp;nbsp; # variant A: conversions / impressions
conv_B, imp_B = 1100, 50000&amp;nbsp;&amp;nbsp;# variant B: conversions / impressions

z, p = proportions_ztest([conv_A, conv_B], [imp_A, imp_B])
cr_A, cr_B = conv_A/imp_A, conv_B/imp_B
lift = (cr_B-cr_A)/cr_A

print(f&quot;CR_A={cr_A:.2%}, CR_B={cr_B:.2%}, Lift={lift:.1%}, p={p:.4f}&quot;)&lt;/code&gt;&lt;/pre&gt;&lt;p data-ke-size=&quot;size16&quot;&gt;Result (simulation): CR_A=1.90%, CR_B=2.20%, Lift=+15.8%, p&amp;lt;0.05&lt;br&gt;→ Message B is likely the more effective variant&lt;/p&gt;&lt;h3 data-ke-size=&quot;size23&quot;&gt;Plan for week 6&lt;/h3&gt;&lt;p data-ke-size=&quot;size16&quot;&gt;Week 5 identified the core groups to target first,&lt;br&gt;so in week 6 I will build on that and run&lt;br&gt;an A/B test simulation under an assumed real campaign setting&lt;br&gt;to check feasibility and expected impact.&lt;br&gt;&amp;nbsp;&lt;/p&gt;&lt;p data-ke-size=&quot;size18&quot;&gt;Main plan&lt;/p&gt;&lt;p data-ke-size=&quot;size16&quot;&gt;• Generate a synthetic campaign log: prepare impression, conversion, cost, and revenue data per segment (A/B)&lt;br&gt;• Statistical test: z-test on the conversion-rate difference, with Lift% and p-value&lt;br&gt;• Efficiency analysis: compute CAC/ROAS to evaluate effect relative to cost&lt;br&gt;• Decision rule: adopt if p&amp;lt;0.05 &amp;amp; Lift≥+10%; otherwise rerun the experiment or adjust the message&lt;br&gt;&lt;br&gt;&lt;/p&gt;&lt;h4 data-ke-size=&quot;size20&quot;&gt;Additional simulations&lt;/h4&gt;&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;br&gt;SEO &amp;amp; search-ad simulation (under consideration)&lt;br&gt;SEM (search advertising)&lt;/p&gt;&lt;p data-ke-size=&quot;size16&quot;&gt;Inputs: keyword, monthly search volume, CPC, competition&lt;br&gt;Outputs: expected impressions, clicks, conversions, and cost (within the budget cap)&lt;br&gt;Use: pick the keywords that are most efficient for the budget&lt;/p&gt;&lt;p data-ke-size=&quot;size18&quot;&gt;SEO (search engine optimization)&lt;/p&gt;&lt;p data-ke-size=&quot;size16&quot;&gt;Inputs: keyword difficulty (kd), page scores (content/backlinks/technical)&lt;br&gt;Outputs: expected rank (rank_exp), probability of reaching Top10/Top3, expected clicks/conversions&lt;br&gt;Use: identify keywords worth mid- to long-term SEO investment&lt;br&gt;&lt;br&gt;&lt;/p&gt;</description>
      <category>마케터 관점의 데이터분석/데이터분석 프로젝트</category>
      <category>그로스마케팅</category>
      <category>데이터분석</category>
      <category>데이터분석가</category>
      <category>데이터분석가부트캠프</category>
      <category>디지털마케팅</category>
      <category>마케팅</category>
      <category>마케팅데이터분석</category>
      <category>멀티캠퍼스it부트캠프</category>
      <category>부트캠프후기</category>
      <category>퍼포먼스마케팅</category>
      <author>뺩빱</author>
      <guid isPermaLink="true">https://pyj2qdat.tistory.com/14</guid>
      <comments>https://pyj2qdat.tistory.com/14#entry14comment</comments>
      <pubDate>Fri, 12 Sep 2025 11:00:28 +0900</pubDate>
    </item>
    <item>
      <title>3. Data Analyst Project: Week 4 Work Log</title>
      <link>https://pyj2qdat.tistory.com/13</link>
      <description>&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1080&quot; data-origin-height=&quot;1080&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/YdfEF/btsQkeuoIg9/GS7SDPNvKTvJLZ5DKQZW1k/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/YdfEF/btsQkeuoIg9/GS7SDPNvKTvJLZ5DKQZW1k/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/YdfEF/btsQkeuoIg9/GS7SDPNvKTvJLZ5DKQZW1k/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FYdfEF%2FbtsQkeuoIg9%2FGS7SDPNvKTvJLZ5DKQZW1k%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;497&quot; height=&quot;497&quot; data-origin-width=&quot;1080&quot; data-origin-height=&quot;1080&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;br&gt;In week 3, a market-share analysis identified&lt;br&gt;women in their 20s and 30s and men in their 60s &lt;br&gt;as the core segments.&lt;br&gt;&lt;br&gt;In week 4, I scored priorities using size and growth rate, &lt;br&gt;which made it clear&lt;br&gt;which segments to target first.&lt;/p&gt;&lt;hr data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style5&quot;&gt;&lt;h2 style=&quot;text-align: left;&quot; data-ke-size=&quot;size26&quot;&gt;[Multicampus KDT Data Analyst Final Project: Week 4 Work Log]&lt;/h2&gt;&lt;h3 data-ke-size=&quot;size23&quot;&gt;1. Finalizing the monthly segment feature table&lt;/h3&gt;&lt;pre data-ke-type=&quot;codeblock&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot;&gt;&lt;code&gt;# ① Load the raw visitor data and aggregate by month and segment
visit_raw = load_csv(RAW_VISIT, encoding='cp949')
visit_m = make_visit_month(visit_raw)&amp;nbsp;&amp;nbsp;# ['월','성별','연령별','목적별','방문객']

# ② Segment totals per year (2023/2024), then share and growth rate
visit_m['연도'] = pd.to_datetime(visit_m['월']).dt.year
seg_year = (visit_m.groupby(['연도','성별','연령별','목적별'], as_index=False)['방문객']
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; .sum().rename(columns={'방문객':'연간방문'}))

tot_by_year = seg_year.groupby('연도')['연간방문'].sum().rename('연도합')
seg_year = seg_year.merge(tot_by_year, on='연도')
seg_year['연도점유율'] = seg_year['연간방문']/seg_year['연도합']

# Pivot 2023 vs. 2024 side by side for YoY
yoy = (seg_year.pivot(index=['성별','연령별','목적별'], columns='연도', values='연간방문')
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;.rename_axis(None, axis=1).reset_index())

# Safe growth rate (avoid a zero denominator)
eps = 1e-9
yoy['성장률_YoY'] = (yoy.get(2024,0) - yoy.get(2023,0)) / (yoy.get(2023,0)+eps)

# 2024 share
share24 = (seg_year[seg_year['연도']==2024]
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; .loc[:,['성별','연령별','목적별','연도점유율']]
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; .rename(columns={'연도점유율':'점유율_2024'}))

seg_base = (yoy.merge(share24, on=['성별','연령별','목적별'], how='left')
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;.rename(columns={2023:'방문_2023', 2024:'방문_2024'}))
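# Assign the result back: fillna(inplace=True) on a column subset only changes a temporary copy
fill_cols = ['방문_2023','방문_2024','점유율_2024','성장률_YoY']
seg_base[fill_cols] = seg_base[fill_cols].fillna(0)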

# Save
save_parquet(visit_m, DATA/'interim'/'visit_month.parquet')
save_parquet(seg_base, DATA/'processed'/'segment_base.parquet')
print(&quot;✅ 세그먼트 피처테이블 완료:&quot;, seg_base.shape)
display(seg_base.sort_values('점유율_2024', ascending=False).head(10))&lt;/code&gt;&lt;/pre&gt;&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;653&quot; data-origin-height=&quot;390&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/o7ezg/btsQkfz0Yak/K0bJKdqvxr05yfyJrNkBcK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/o7ezg/btsQkfz0Yak/K0bJKdqvxr05yfyJrNkBcK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/o7ezg/btsQkfz0Yak/K0bJKdqvxr05yfyJrNkBcK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fo7ezg%2FbtsQkfz0Yak%2FK0bJKdqvxr05yfyJrNkBcK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;653&quot; height=&quot;390&quot; data-origin-width=&quot;653&quot; data-origin-height=&quot;390&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
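&lt;p data-ke-size=&quot;size16&quot;&gt;The feature table above comes down to two ratios per segment: the share of the 2024 total and the year-over-year growth rate. Below is a minimal stdlib-only sketch on made-up numbers (not the Korea Tourism Organization data). As an assumption of this sketch, a segment whose 2023 total is exactly zero gets a growth of None rather than the eps-based value, because dividing by a tiny eps would produce an enormous growth rate.&lt;/p&gt;

```python
# Toy segment totals (made-up numbers for illustration only).
visits = {
    ("F", "21-30"): {2023: 1000, 2024: 1630},
    ("M", "61+"): {2023: 0, 2024: 120},      # no 2023 base
    ("F", "31-40"): {2023: 800, 2024: 880},
}


def yoy_growth(prev, curr):
    """Year-over-year growth; None when the base year is zero."""
    if prev == 0:
        return None
    return (curr - prev) / prev


total_2024 = sum(v[2024] for v in visits.values())

rows = []
for seg, v in visits.items():
    rows.append({
        "segment": seg,
        "share_2024": v[2024] / total_2024,
        "yoy": yoy_growth(v[2023], v[2024]),
    })

# Print segments from largest to smallest 2024 share.
for r in sorted(rows, key=lambda r: r["share_2024"], reverse=True):
    yoy = "n/a" if r["yoy"] is None else f"{r['yoy']:.1%}"
    print(r["segment"], f"share={r['share_2024']:.1%}", f"yoy={yoy}")
```

&lt;p data-ke-size=&quot;size16&quot;&gt;Returning None keeps zero-base segments visible without letting them dominate a growth ranking; whether to drop, cap, or flag them is a judgment call.&lt;/p&gt;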
&lt;p data-ke-size=&quot;size16&quot;&gt;I segmented the data by gender, age, and visit purpose&lt;br&gt;and compared visitor counts for 2023 and 2024.&lt;br&gt;&amp;nbsp;&lt;br&gt;Female tourists in their 20s and 30s account for an especially large share,&lt;br&gt;and their annual growth is also pronounced.&lt;br&gt;&amp;nbsp;&lt;br&gt;Female tourists aged 21-30 grew 63% year over year&lt;br&gt;and made up 15.4% of all visitors in 2024.&lt;br&gt;&amp;nbsp;&lt;br&gt;→ &lt;u&gt;Confirmed the market growth potential of the target segment&lt;/u&gt;&lt;br&gt;&amp;nbsp;&lt;/p&gt;&lt;h3 data-ke-size=&quot;size23&quot;&gt;2. Baseline forecast backtest&lt;/h3&gt;&lt;pre data-ke-type=&quot;codeblock&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot;&gt;&lt;code&gt;import numpy as np
import pandas as pd

def seasonal_naive_backtest(visit_m: pd.DataFrame, s: int = 6, test_year: int = 2024):
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&quot;&quot;&quot;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;visit_m: must include ['월','성별','연령별','목적별','방문객']; month-end dates (datetime64[ns]) recommended
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;s: seasonal lag in months (default 6)
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;test_year: evaluation year (default 2024)
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&quot;&quot;&quot;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;df = visit_m.copy()
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# Sort by month
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;df = df.sort_values('월')
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;results = []

&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;for (gender, age, purpose), g in df.groupby(['성별','연령별','목적별'], dropna=False):
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# Sort the monthly index and keep missing months as NaN rows
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;g = g[['월','방문객']].set_index('월').sort_index()
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;g = g.asfreq('M')&amp;nbsp;&amp;nbsp;# align to a month-end index (e.g. 2024-01-31); pandas 2.2+ prefers the alias 'ME'

&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# Seasonal naive forecast: reuse the value from s months earlier
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;g['yhat'] = g['방문객'].shift(s)

&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# Test mask: (target year) &amp;amp; (lagged value available)
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;mask = (g.index.year == test_year) &amp;amp; g['yhat'].notna() &amp;amp; g['방문객'].notna()
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;y = g.loc[mask, '방문객']
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;yhat = g.loc[mask, 'yhat']
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;n = len(y)

&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# MAPE in percent (guard against division by zero)
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;if n &amp;gt; 0:
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;denom = y.replace(0, np.nan)
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;mape = ((y - yhat).abs() / denom).mean() * 100
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;else:
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;mape = np.nan

&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;results.append({
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;'성별': gender, '연령별': age, '목적별': purpose,
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;f'MAPE_{test_year}': float(mape) if pd.notna(mape) else np.nan,
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;'n_test': int(n)
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;})

&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;out = pd.DataFrame(results).sort_values([f'MAPE_{test_year}','n_test'], ascending=[True, False])
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;return out

# Usage: visit_m is the monthly table with ['월','성별','연령별','목적별','방문객']
bt = seasonal_naive_backtest(visit_m, s=6, test_year=2024)
print(&quot;✅ 백테스트 완료:&quot;, bt.shape)
bt.head(15)&lt;/code&gt;&lt;/pre&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;401&quot; data-origin-height=&quot;556&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/tFgHz/btsQkTwKy2r/mGzDAdRPhZcAZeoD70qgVK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/tFgHz/btsQkTwKy2r/mGzDAdRPhZcAZeoD70qgVK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/tFgHz/btsQkTwKy2r/mGzDAdRPhZcAZeoD70qgVK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FtFgHz%2FbtsQkTwKy2r%2FmGzDAdRPhZcAZeoD70qgVK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;401&quot; height=&quot;556&quot; data-origin-width=&quot;401&quot; data-origin-height=&quot;556&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
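&lt;p data-ke-size=&quot;size16&quot;&gt;To make the backtest above concrete, here is a minimal stdlib-only sketch of the same idea: forecast each month with the value from a fixed number of months earlier and average the absolute percentage errors. The series is made up (a 12-month pattern with 10% growth in the second year), and the function name and arguments are hypothetical. The two lags are shown side by side only to illustrate how the choice of lag interacts with the seasonal pattern of this toy series; it says nothing about which lag suits the real data.&lt;/p&gt;

```python
def seasonal_naive_mape(series, lag, test_start):
    """MAPE (%) of the forecast y_hat[t] = y[t - lag], from test_start onward.

    Months with a zero actual value are skipped to avoid dividing by zero.
    """
    errors = []
    for t in range(max(test_start, lag), len(series)):
        actual, forecast = series[t], series[t - lag]
        if actual != 0:
            errors.append(abs(actual - forecast) / actual)
    if not errors:
        return None
    return 100 * sum(errors) / len(errors)


# 24 made-up monthly values: a yearly pattern, then the same pattern +10%.
year1 = [100, 90, 110, 130, 150, 170, 180, 175, 160, 140, 120, 105]
year2 = [round(v * 1.1, 1) for v in year1]
series = year1 + year2

mape_lag12 = seasonal_naive_mape(series, lag=12, test_start=12)
mape_lag6 = seasonal_naive_mape(series, lag=6, test_start=12)
print(f"lag=12: {mape_lag12:.2f}%")
print(f"lag=6:  {mape_lag6:.2f}%")
```

&lt;p data-ke-size=&quot;size16&quot;&gt;On this toy series the 12-month lag only misses the 10% growth, while the 6-month lag compares opposite points of the yearly cycle and therefore shows a larger error.&lt;/p&gt;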
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;br&gt;Using a seasonal naive model with a 6-month lag,&lt;br&gt;I forecast 2024 visitor counts and compared them with the actual values.&lt;br&gt;&amp;nbsp;&lt;br&gt;Judged by the forecast error rate (MAPE),&lt;br&gt;some groups showed a stable error of around 10-12%.&lt;br&gt;&amp;nbsp;&lt;/p&gt;&lt;h3 data-ke-size=&quot;size23&quot;&gt;3. Sentiment-visit lead/lag correlation analysis&lt;/h3&gt;&lt;pre data-ke-type=&quot;codeblock&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot;&gt;&lt;code&gt;import pandas as pd
import numpy as np

# File paths
PATH_SENTI&amp;nbsp;&amp;nbsp; = &quot;/content/drive/MyDrive/데이터분석/외래객방한데이터(한국관광공사)/20250813164536_한국 관광 관련 긍부정 점유율 추이.csv&quot;
PATH_MENTION = &quot;/content/drive/MyDrive/데이터분석/외래객방한데이터(한국관광공사)/20250813164522_한국 관광 관련 언급량 인게이지먼트 추이 언급량.csv&quot;

# 1) Sentiment (global): normalize to monthly dates
senti = pd.read_csv(PATH_SENTI)
# If 기준년월 is an int such as 202408, cast it to str before parsing
senti['월'] = pd.to_datetime(senti['기준년월'].astype(str), format='%Y%m', errors='coerce')\
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; .dt.to_period('M').dt.to_timestamp('M')

# If a '국가' column exists, keep only the global ('글로벌') rows
if '국가' in senti.columns:
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;senti = senti[senti['국가'].astype(str).str.contains('글로벌', na=False)]

senti_m = senti.groupby('월', as_index=False)[['긍정','부정']].mean()
senti_m['폴라리티'] = senti_m['긍정'] - senti_m['부정']

print(&quot;감성 월 범위:&quot;, senti_m['월'].min(), &quot;~&quot;, senti_m['월'].max(), &quot;| rows:&quot;, len(senti_m))

# 2) Mentions: normalize to monthly dates
men = pd.read_csv(PATH_MENTION)

# Use men['월'] if it already exists; otherwise build it from 기준년월
if '월' not in men.columns:
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;men['월'] = pd.to_datetime(men['기준년월'].astype(str), format='%Y%m', errors='coerce')\
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;.dt.to_period('M').dt.to_timestamp('M')

# Monthly sum (totals across country/channel if those columns exist)
mention_m = men.groupby('월', as_index=False)[['언급량']].sum()

print(&quot;언급량 월 범위:&quot;, mention_m['월'].min(), &quot;~&quot;, mention_m['월'].max(), &quot;| rows:&quot;, len(mention_m))

# 3) Check the overlapping months &amp;amp; merge
df = mention_m.merge(senti_m[['월','폴라리티']], on='월', how='inner').sort_values('월')
print(&quot;병합 후 rows:&quot;, len(df))
print(&quot;교집합 월:&quot;, df['월'].dt.strftime('%Y-%m').tolist())

# 4) Lead/lag correlation
def leadlag_corr(df, y='언급량', x='폴라리티', max_lag=3):
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;rows=[]
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;for k in range(0, max_lag+1):
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;tmp = df[[y,x]].copy()
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;tmp[f'{x}_lag{k}'] = tmp[x].shift(k)&amp;nbsp;&amp;nbsp;# pairs x from k months earlier with y now: does sentiment lead by k months?
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;tmp = tmp[[y, f'{x}_lag{k}']].dropna()
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;corr = tmp[y].corr(tmp[f'{x}_lag{k}']) if len(tmp)&amp;gt;=3 else np.nan
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;rows.append({'lag_months':k, 'corr':corr, 'n':len(tmp)})
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;return pd.DataFrame(rows)

corr_tbl = leadlag_corr(df, y='언급량', x='폴라리티', max_lag=3)
print(&quot;✅ 프록시(언급량) vs 감성 리드/래그 상관&quot;)
print(corr_tbl)&lt;/code&gt;&lt;/pre&gt;&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;&lt;pre data-ke-type=&quot;codeblock&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot;&gt;&lt;code&gt;# df: ['월','언급량','폴라리티'], already sorted by month
import numpy as np, pandas as pd

# 1) Simple regression on standardized values: Mentions_t ~ Polarity_{t-1}
df2 = df.copy()
df2['pol_lag1'] = df2['폴라리티'].shift(1)
m = df2.dropna()
X = (m['pol_lag1'] - m['pol_lag1'].mean())/m['pol_lag1'].std()
y = (m['언급량'] - m['언급량'].mean())/m['언급량'].std()
beta = np.dot(X, y) / np.dot(X, X)
r2 = np.corrcoef(X, y)[0,1]**2
print(f&quot;β(lag1): {beta:.3f}, R²: {r2:.3f}, N={len(m)}&quot;)

# 2) Permutation test (a rough sense of significance)
rng = np.random.default_rng(42)
obs = np.corrcoef(X, y)[0,1]
cnt=0
for _ in range(5000):
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;y_perm = rng.permutation(y)
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;if abs(np.corrcoef(X, y_perm)[0,1]) &amp;gt;= abs(obs):
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;cnt+=1
p = cnt/5000
print(f&quot;corr(lag1)={obs:.3f}, permutation p≈{p:.3f}&quot;)&lt;/code&gt;&lt;/pre&gt;&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;&lt;pre data-ke-type=&quot;codeblock&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot;&gt;&lt;code&gt;import matplotlib.pyplot as plt

m = df.copy()
m['pol_lag1'] = m['폴라리티'].shift(1)
m = m.dropna()

plt.figure(figsize=(6,5))
plt.scatter(m['pol_lag1'], m['언급량'], alpha=0.7)
z = np.polyfit(m['pol_lag1'], m['언급량'], 1); p = np.poly1d(z)
xv = np.linspace(m['pol_lag1'].min(), m['pol_lag1'].max(), 50)
plt.plot(xv, p(xv), lw=2)
plt.title(&quot;폴라리티(t-1) vs 언급량(t)&quot;)
plt.xlabel(&quot;Polarity (t-1)&quot;); plt.ylabel(&quot;Mentions (t)&quot;)
plt.tight_layout(); plt.show()&lt;/code&gt;&lt;/pre&gt;&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;590&quot; data-origin-height=&quot;490&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/qOxvu/btsQkYkqTF0/ukmBWZ3jJMAEQKT1AQyxzK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/qOxvu/btsQkYkqTF0/ukmBWZ3jJMAEQKT1AQyxzK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/qOxvu/btsQkYkqTF0/ukmBWZ3jJMAEQKT1AQyxzK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FqOxvu%2FbtsQkYkqTF0%2FukmBWZ3jJMAEQKT1AQyxzK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;590&quot; height=&quot;490&quot; data-origin-width=&quot;590&quot; data-origin-height=&quot;490&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
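&lt;p data-ke-size=&quot;size16&quot;&gt;The lead/lag check above pairs the sentiment value from k months earlier with the current mention volume and computes a correlation for each k. A minimal stdlib-only sketch of that pairing follows. The numbers are made up, and the mention series is deliberately constructed to follow polarity by exactly one month, so lag 1 should stand out. The helper names pearson and lagged_corr are hypothetical.&lt;/p&gt;

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def lagged_corr(x, y, k):
    """Correlation between x[t - k] and y[t]: does x lead y by k steps?"""
    if k == 0:
        return pearson(x, y)
    return pearson(x[:-k], y[k:])


# Made-up monthly polarity; mentions are built to follow it one month later.
polarity = [0.10, 0.30, 0.20, 0.50, 0.40, 0.60, 0.30, 0.70, 0.50, 0.80]
mentions = [1000] + [1000 + 2000 * p for p in polarity[:-1]]

for k in range(4):
    print(k, round(lagged_corr(polarity, mentions, k), 3))
```

&lt;p data-ke-size=&quot;size16&quot;&gt;Each extra month of lag drops one pair from the sample, so with short series the correlations at larger lags rest on very few points.&lt;/p&gt;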
&lt;p data-ke-size=&quot;size16&quot;&gt;I combined the positive/negative sentiment share for tourism&lt;br&gt;with the mention-volume data to check whether sentiment leads mentions.&lt;br&gt;&amp;nbsp;&lt;br&gt;When the sentiment index leads by one month,&lt;br&gt;its correlation with mention volume is positive, at about 0.26.&lt;br&gt;&amp;nbsp;&lt;br&gt;This suggests that a positive shift in sentiment&lt;br&gt;may be followed by an increase in mentions.&lt;br&gt;However, the sample is small and the result is not statistically significant,&lt;br&gt;so more data is needed.&lt;br&gt;&amp;nbsp;&lt;br&gt;&amp;nbsp;&lt;/p&gt;&lt;h3 data-ke-size=&quot;size23&quot;&gt;4. Deriving target segment priorities&lt;/h3&gt;&lt;pre data-ke-type=&quot;codeblock&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot;&gt;&lt;code&gt;import pandas as pd, numpy as np, matplotlib.pyplot as plt, seaborn as sns
from pathlib import Path
from config import DATA, ART
plt.rcParams['font.family'] = 'NanumGothic'

# 1) Data
visit_m = pd.read_parquet(DATA/'interim'/'visit_month.parquet')&amp;nbsp;&amp;nbsp;# [월, 성별, 연령별, 목적별, 방문객]

# 2) B2C filter: drop crew/official/business rows ('승무원','공용','상용'), keep tourism ('관광') only
EXCLUDE = {'승무원','공용','상용'}
visit_m = visit_m[
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;(~visit_m['성별'].isin(EXCLUDE)) &amp;amp;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;(~visit_m['연령별'].isin(EXCLUDE)) &amp;amp;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;(visit_m['목적별'] == '관광')
].copy()

# 3) Yearly aggregation &amp;amp; metrics
visit_m['연도'] = pd.to_datetime(visit_m['월']).dt.year
g_year = (visit_m.groupby(['연도','성별','연령별','목적별'], as_index=False)['방문객'].sum())
tot24 = g_year.loc[g_year['연도']==2024, '방문객'].sum()

wide = (g_year.pivot(index=['성별','연령별','목적별'], columns='연도', values='방문객')
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; .rename(columns={2023:'y23', 2024:'y24'})
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; .reset_index()
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; .fillna(0))
wide['점유율'] = np.where(tot24&amp;gt;0, wide['y24']/tot24, 0)
wide['YoY'] = np.where(wide['y23']&amp;gt;0, (wide['y24']-wide['y23'])/wide['y23'], np.nan)
wide['YoY'] = wide['YoY'].replace([np.inf,-np.inf], np.nan).fillna(0.0).clip(-0.2, 1.2)
wide['방문객_2024'] = wide['y24']

# 4) MAPE (2024, 3-month moving-average baseline), returned as a fraction
def mape_2024_ma3(df):
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;df = df.sort_values('월').copy()
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;df['pred'] = df['방문객'].rolling(3).mean().shift(1)
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;df['연도'] = pd.to_datetime(df['월']).dt.year
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;test = df[df['연도']==2024].dropna(subset=['pred'])
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;if test.empty:
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;return pd.Series({'MAPE_2024': np.nan})
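&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;actual = test['방문객'].replace(0, np.nan)&amp;nbsp;&amp;nbsp;# skip months with zero visitors instead of dividing by zero
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;return pd.Series({'MAPE_2024': ((actual - test['pred']).abs() / actual).mean()})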

mape_tbl = (visit_m.groupby(['성별','연령별','목적별']).apply(mape_2024_ma3).reset_index())
seg = wide.merge(mape_tbl, on=['성별','연령별','목적별'], how='left')
# Fill missing MAPE with the median
seg['MAPE_2024'] = seg['MAPE_2024'].fillna(seg['MAPE_2024'].median())

# Minimum size thresholds
MIN_SHARE, MIN_CNT = 0.015, 200_000
seg = seg[(seg['점유율']&amp;gt;=MIN_SHARE) &amp;amp; (seg['방문객_2024']&amp;gt;=MIN_CNT)].copy()

# 5) Scoring (min-max normalized) + ranking
def norm01(s):
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;s = s.astype(float)
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;if s.notna().sum()&amp;lt;=1 or s.max()==s.min():
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;return pd.Series(np.zeros(len(s)), index=s.index)
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;return (s - s.min())/(s.max()-s.min())

seg['점유_n'] = norm01(seg['점유율'])
seg['YoY_n']&amp;nbsp;&amp;nbsp;= norm01(seg['YoY'])
seg['MAPE_n'] = norm01(seg['MAPE_2024'])&amp;nbsp;&amp;nbsp;# higher is worse, so it is subtracted below
seg['우선순위점수'] = 0.6*seg['점유_n'] + 0.4*seg['YoY_n'] - 0.3*seg['MAPE_n']

top10 = (seg.sort_values('우선순위점수', ascending=False)
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;.head(10)
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;.assign(라벨=lambda d: d['성별']+' / '+d['연령별']+' / 관광'))

# Save + plot
out_csv = ART/'reports'/'week3_top_segments.csv'
Path(out_csv).parent.mkdir(parents=True, exist_ok=True)
top10.to_csv(out_csv, index=False)

plt.figure(figsize=(9,5))
sns.barplot(data=top10, y='라벨', x='우선순위점수', color='#2F80ED')
for i,r in top10.reset_index(drop=True).iterrows():
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;txt = f&quot;{r['점유율']*100:.1f}% | 24년 {int(r['방문객_2024']):,}명 | YoY {r['YoY']*100:.1f}%&quot;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;plt.text(r['우선순위점수']+0.01, i, txt, va='center', fontsize=9)
plt.title('세그먼트 우선순위 Top 10')
plt.xlabel('우선순위점수'); plt.ylabel('')
plt.tight_layout()
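(ART/'figures').mkdir(parents=True, exist_ok=True)&amp;nbsp;&amp;nbsp;# make sure the output folder exists
plt.savefig(ART/'figures'/'week3_top_segments.png', dpi=150)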
plt.show()


# Top 3 recommended targets (top10 is the ranked table built above)
pick = top10[['성별','연령별','목적별','점유율','YoY','MAPE_2024','우선순위점수']].head(3)
print(&quot;\n  추천 타깃(3):\n&quot;, pick.to_string(index=False))&lt;/code&gt;&lt;/pre&gt;&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;896&quot; data-origin-height=&quot;490&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/dF94Ay/btsQkcDnCcs/mN0Kg5r6cnuXPE0PqHWSYk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/dF94Ay/btsQkcDnCcs/mN0Kg5r6cnuXPE0PqHWSYk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/dF94Ay/btsQkcDnCcs/mN0Kg5r6cnuXPE0PqHWSYk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FdF94Ay%2FbtsQkcDnCcs%2FmN0Kg5r6cnuXPE0PqHWSYk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;896&quot; height=&quot;490&quot; data-origin-width=&quot;896&quot; data-origin-height=&quot;490&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
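&lt;p data-ke-size=&quot;size16&quot;&gt;The priority score above is a weighted sum of min-max normalized share and growth, minus a penalty for forecast error. A minimal stdlib-only sketch of that arithmetic on three toy segments follows. The weights (0.6, 0.4, 0.3) are the ones used above; the 15.4% share and 63% growth for women aged 21-30 come from this post, and every other number is made up for illustration.&lt;/p&gt;

```python
def norm01(values):
    """Min-max scale to [0, 1]; all zeros when every value is identical."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Toy segments: (label, 2024 share, YoY growth, MAPE as a fraction).
segments = [
    ("F 21-30", 0.154, 0.63, 0.12),
    ("F 31-40", 0.110, 0.20, 0.10),
    ("M 61+", 0.020, 0.90, 0.35),
]

share_n = norm01([s[1] for s in segments])
yoy_n = norm01([s[2] for s in segments])
mape_n = norm01([s[3] for s in segments])   # higher means less predictable

scores = {
    seg[0]: 0.6 * sh + 0.4 * yo - 0.3 * ma
    for seg, sh, yo, ma in zip(segments, share_n, yoy_n, mape_n)
}

for label, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{label}: {score:.3f}")
```

&lt;p data-ke-size=&quot;size16&quot;&gt;Because each input is rescaled within the candidate set, the scores are relative: adding or removing a segment can change every other segment's score.&lt;/p&gt;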
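&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;br&gt;Excluding crew, official, and business purposes,&lt;br&gt;I scored only the tourist segments&lt;br&gt;on a composite of share, growth rate, and forecast error.&lt;br&gt;&amp;nbsp;&lt;br&gt;As a result, the top priority was female tourists aged 21-30,&lt;br&gt;a group with both a high share and a high growth rate as of 2024.&lt;br&gt;&amp;nbsp;&lt;br&gt;→ &lt;u&gt;Marketing target selection finalized&lt;/u&gt;&lt;br&gt;&amp;nbsp;&lt;br&gt;&amp;nbsp;&lt;/p&gt;&lt;h3 data-ke-size=&quot;size23&quot;&gt;Plan for week 5&lt;/h3&gt;&lt;p data-ke-size=&quot;size16&quot;&gt;Next week, building on the week 4 results, I will select priority keywords per target for marketing SEO work&lt;br&gt;and run a model that predicts top exposure and view counts.&lt;br&gt;&lt;br&gt;The plan is to export keyword search volume, competition, and content characteristics as CSV from Keyword Master (키워드 마스터) and Blackkiwi (블랙키위), then predict the expected views and the probability of top exposure (entering the Top 10) when a given keyword is used.&lt;/p&gt;</description>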
      <category>마케터 관점의 데이터분석/데이터분석 프로젝트</category>
      <category>그로스마케팅</category>
      <category>데이터분석</category>
      <category>데이터분석가</category>
      <category>데이터분석가부트캠프</category>
      <category>디지털마케팅</category>
      <category>마케팅</category>
      <category>마케팅데이터분석</category>
      <category>멀티캠퍼스it부트캠프</category>
      <category>부트캠프후기</category>
      <category>퍼포먼스마케팅</category>
      <author>뺩빱</author>
      <guid isPermaLink="true">https://pyj2qdat.tistory.com/13</guid>
      <comments>https://pyj2qdat.tistory.com/13#entry13comment</comments>
      <pubDate>Fri, 5 Sep 2025 00:38:50 +0900</pubDate>
    </item>
    <item>
      <title>2. Data Analyst Project: Week 3 Work Log</title>
      <link>https://pyj2qdat.tistory.com/12</link>
      <description>&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1080&quot; data-origin-height=&quot;1080&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/B5Squ/btsP9GR7wyo/KYqQggNnOWtMxwlV20t6r0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/B5Squ/btsP9GR7wyo/KYqQggNnOWtMxwlV20t6r0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/B5Squ/btsP9GR7wyo/KYqQggNnOWtMxwlV20t6r0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FB5Squ%2FbtsP9GR7wyo%2FKYqQggNnOWtMxwlV20t6r0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;609&quot; height=&quot;609&quot; data-origin-width=&quot;1080&quot; data-origin-height=&quot;1080&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;This week I focused on the Korea Tourism Organization's detailed monthly data (gender&amp;middot;age&amp;middot;purpose)&lt;br /&gt;to understand inbound tourist patterns by gender, age, and purpose,&lt;br /&gt;quantified size&amp;times;growth rate per segment,&lt;br /&gt;and set marketing target priorities.&lt;br /&gt;&amp;nbsp;&lt;br /&gt;&amp;nbsp;&lt;br /&gt;&amp;nbsp;&lt;/p&gt;
&lt;hr data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style5&quot; /&gt;
&lt;h2 style=&quot;text-align: left;&quot; data-ke-size=&quot;size26&quot;&gt;[Multicampus KDT Data Analyst Final Project: Week 3 Work Log]&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc; color: #333333; text-align: start;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-start=&quot;217&quot; data-end=&quot;361&quot;&gt;Week 3 project overview:
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-start=&quot;217&quot; data-end=&quot;361&quot;&gt;Visitor data cleaning (year-month&amp;rarr;year), cross-analysis by gender/age/purpose&lt;/li&gt;
&lt;li data-start=&quot;279&quot; data-end=&quot;319&quot;&gt;Segment priority score based on scale index&amp;times;growth index&lt;/li&gt;
&lt;li data-start=&quot;322&quot; data-end=&quot;361&quot;&gt;Heatmaps (gender&amp;times;purpose, age&amp;times;purpose) to visualize where to focus targeting&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-start=&quot;362&quot; data-end=&quot;528&quot;&gt;Findings:
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-start=&quot;362&quot; data-end=&quot;528&quot;&gt;Female tourists aged 21&amp;ndash;30: largest segment &amp;amp; high growth (Large &amp;amp; Fast)&lt;/li&gt;
&lt;li data-start=&quot;435&quot; data-end=&quot;488&quot;&gt;Male tourists aged 61 and over / 20 and under: high-growth niche (Niche &amp;amp; Fast)&lt;/li&gt;
&lt;li data-start=&quot;491&quot; data-end=&quot;528&quot;&gt;Men visiting for tourism/shopping grew year over year in every age group&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-start=&quot;529&quot; data-end=&quot;680&quot;&gt;Marketing actions
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-start=&quot;529&quot; data-end=&quot;680&quot;&gt;Main: prioritize female tourist personas in their 20s&amp;ndash;40s&lt;/li&gt;
&lt;li data-start=&quot;587&quot; data-end=&quot;636&quot;&gt;Secondary: run niche test packages for high-involvement men (61+) &amp;amp; Gen Z (&amp;le;20)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
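&lt;p data-ke-size=&quot;size16&quot;&gt;The Large/Niche and Fast labels in the summary above can be read as a size-by-growth quadrant. The sketch below is one hypothetical way to assign such labels, using the median of each axis as the cut-off. The thresholds, the segment numbers, and the "Slow" label are assumptions of this sketch and are not taken from the project's actual scale index or growth index.&lt;/p&gt;

```python
from statistics import median

# Made-up segments: label -> (2024 visitors, YoY growth).
segments = {
    "F 21-30": (1_500_000, 0.63),
    "F 31-40": (1_100_000, 0.15),
    "M 61+": (150_000, 0.90),
    "M 31-40": (400_000, 0.10),
}

# Assumed cut-offs: the median of each axis across all segments.
size_cut = median(v for v, _ in segments.values())
growth_cut = median(g for _, g in segments.values())


def quadrant(visitors, growth):
    """Label a segment by whether it clears each median cut-off."""
    size = "Large" if visitors >= size_cut else "Niche"
    speed = "Fast" if growth >= growth_cut else "Slow"
    return f"{size}, {speed}"


labels = {name: quadrant(v, g) for name, (v, g) in segments.items()}
for name, label in labels.items():
    print(name, "->", label)
```

&lt;p data-ke-size=&quot;size16&quot;&gt;Median cut-offs always split the segments roughly in half on each axis; fixed business thresholds (for example a minimum visitor count) would give labels that do not shift when the segment list changes.&lt;/p&gt;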
&lt;hr data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style4&quot; /&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;Process&lt;/h2&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;1. EDA&lt;/h3&gt;
&lt;pre class=&quot;python&quot; data-ke-type=&quot;codeblock&quot; data-ke-language=&quot;python&quot;&gt;&lt;code&gt;# Core libraries
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import matplotlib.font_manager as fm

# Korean font setup (Colab: install NanumGothic)
!apt-get update -qq
!apt-get install fonts-nanum -qq

plt.rcParams['font.family'] = 'NanumGothic'
plt.rcParams['axes.unicode_minus'] = False

fm.fontManager.addfont('/usr/share/fonts/truetype/nanum/NanumGothic.ttf')
plt.rc('font', family='NanumGothic')&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&quot;python&quot; data-ke-type=&quot;codeblock&quot; data-ke-language=&quot;python&quot;&gt;&lt;code&gt;# 1. Foreign visitors by country
df_visitors = dfs[&quot;국가별 외국인 방문 현황&quot;]

print(&quot;국가별 외국인 방문 현황 (상위 5행)&quot;)
display(df_visitors.head())

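# Sum by country (note: this sums the '방문자 비율' visitor-share column, so the bars show share rather than a head count)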
country_sum = df_visitors.groupby(&quot;국가&quot;)[&quot;방문자 비율&quot;].sum().sort_values(ascending=False).head(10)

plt.figure(figsize=(10,6))
sns.barplot(x=country_sum.values, y=country_sum.index, palette=&quot;viridis&quot;, hue=country_sum.index, legend=False)
plt.title(&quot;상위 10개국 외국인 방문객수&quot;)
plt.xlabel(&quot;방문객 수&quot;)
plt.ylabel(&quot;국가&quot;)
plt.show()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;888&quot; data-origin-height=&quot;544&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bEEV0a/btsP728Hzdw/wXydho2q3vyz5PpSHZhN5K/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bEEV0a/btsP728Hzdw/wXydho2q3vyz5PpSHZhN5K/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bEEV0a/btsP728Hzdw/wXydho2q3vyz5PpSHZhN5K/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbEEV0a%2FbtsP728Hzdw%2FwXydho2q3vyz5PpSHZhN5K%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;888&quot; height=&quot;544&quot; data-origin-width=&quot;888&quot; data-origin-height=&quot;544&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;-&amp;gt; &lt;u&gt;Visitors are concentrated in China and Japan; Asia accounts for the overwhelming share&lt;/u&gt;&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;2. Visitor Data Analysis&lt;/h3&gt;
&lt;pre class=&quot;python&quot; data-ke-type=&quot;codeblock&quot; data-ke-language=&quot;python&quot;&gt;&lt;code&gt;# Load and preprocess the data
df_monthly = pd.read_csv(&quot;/content/drive/MyDrive/데이터분석/외래객방한데이터(한국관광공사)/한국관광공사_방한 외래관광객 상세 월별 집계.csv&quot;, encoding='cp949')
# Unparseable dates become NaT and are dropped on the next line
df_monthly['기준연월'] = pd.to_datetime(df_monthly['기준연월'], errors='coerce')
df_monthly = df_monthly.dropna(subset=['기준연월']).copy()
df_monthly['연도'] = df_monthly['기준연월'].dt.year

print(&quot;✅ Visitor data loaded and preprocessed&quot;)
print(df_monthly.head())
print(&quot;\nData info:&quot;)
df_monthly.info()&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&quot;python&quot; data-ke-type=&quot;codeblock&quot; data-ke-language=&quot;python&quot;&gt;&lt;code&gt;# Total visitors per year
df_visit_year = df_monthly.groupby(&quot;연도&quot;)[&quot;인원수&quot;].sum().reset_index()
plt.figure(figsize=(10, 6))
sns.lineplot(data=df_visit_year, x=&quot;연도&quot;, y=&quot;인원수&quot;, marker=&quot;o&quot;)
plt.title(&quot;연도별 총 방한 외래관광객 수 추이&quot;)
plt.xlabel(&quot;연도&quot;); plt.ylabel(&quot;방문객 수&quot;)
plt.xticks(sorted(df_visit_year[&quot;연도&quot;].unique()))
plt.grid(True); plt.show()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;844&quot; data-origin-height=&quot;544&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/yj8Au/btsP9akQhIS/A2RWhbhFvss2RerUnuSo1k/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/yj8Au/btsP9akQhIS/A2RWhbhFvss2RerUnuSo1k/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/yj8Au/btsP9akQhIS/A2RWhbhFvss2RerUnuSo1k/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fyj8Au%2FbtsP9akQhIS%2FA2RWhbhFvss2RerUnuSo1k%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;844&quot; height=&quot;544&quot; data-origin-width=&quot;844&quot; data-origin-height=&quot;544&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;-&amp;gt; &lt;u&gt;Inbound visitors to Korea are increasing steadily&lt;/u&gt;&lt;/p&gt;
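&lt;p data-ke-size=&quot;size16&quot;&gt;One way to check a steady-growth reading numerically is to compute the year-over-year change of the yearly totals. The sketch below is only an illustration: the numbers are hypothetical, not the Korea Tourism Organization figures, and the names yearly and always_growing are made up for this example.&lt;/p&gt;
&lt;pre class=&quot;python&quot; data-ke-type=&quot;codeblock&quot; data-ke-language=&quot;python&quot;&gt;&lt;code&gt;import pandas as pd

# Hypothetical yearly totals, for illustration only (not the real data)
yearly = pd.DataFrame({'연도': [2021, 2022, 2023, 2024],
                       '인원수': [100, 150, 300, 450]})

# Year-over-year growth rate: (this year - last year) / last year
yearly['YoY'] = yearly['인원수'].pct_change()

# True only if every year with a predecessor grew relative to it
always_growing = bool(yearly['YoY'].dropna().gt(0).all())

print(yearly)
print('growth in every year:', always_growing)&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;If the last year in the real data is only partially observed, its total will look like a drop, so it is probably safer to compare complete years only.&lt;/p&gt;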
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;br /&gt;3. Sentiment Data Analysis&lt;/h3&gt;
&lt;pre class=&quot;python&quot; data-ke-type=&quot;codeblock&quot; data-ke-language=&quot;python&quot;&gt;&lt;code&gt;# 1. Import libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib.font_manager as fm
import matplotlib.dates as mdates

# 2. Load data (sentiment datasets only)
df_sentiment = pd.read_csv(&quot;/content/drive/MyDrive/데이터분석/외래객방한데이터(한국관광공사)/20250813164536_한국 관광 관련 긍부정 점유율 추이.csv&quot;)
df_image = pd.read_csv(&quot;/content/drive/MyDrive/데이터분석/외래객방한데이터(한국관광공사)/20250813164348_방한 여행 이미지.csv&quot;)

print(&quot;✅ Sentiment data loaded&quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&quot;python&quot; data-ke-type=&quot;codeblock&quot; data-ke-language=&quot;python&quot;&gt;&lt;code&gt;# 3. Positive/negative share trend
# Parse YYYYMM codes into dates; rows that fail to parse are dropped
df_sentiment['기준년월'] = df_sentiment['기준년월'].astype(str)
df_sentiment['기준년월'] = pd.to_datetime(df_sentiment['기준년월'], format='%Y%m', errors='coerce')
df_sentiment = df_sentiment.dropna(subset=['기준년월']).copy()

# Keep only the global aggregate rows
df_sentiment_global = df_sentiment[df_sentiment['국가'] == '글로벌'].copy()

# Sanity check on the converted date column
print(&quot;\nDebug: df_sentiment_global '기준년월' dtype after conversion:&quot;, df_sentiment_global['기준년월'].dtype)
print(&quot;Debug: df_sentiment_global '기준년월' unique values after conversion:&quot;, df_sentiment_global['기준년월'].unique())

# Plot the positive and negative share over time
fig, ax = plt.subplots(figsize=(12, 6))
sns.lineplot(data=df_sentiment_global, x='기준년월', y='긍정', marker='o', label='긍정', ax=ax)
sns.lineplot(data=df_sentiment_global, x='기준년월', y='부정', marker='o', label='부정', ax=ax)
plt.title(&quot;한국 관광 관련 긍/부정 점유율 추이 (글로벌)&quot;)
plt.xlabel(&quot;기간&quot;)
plt.ylabel(&quot;비율 (%)&quot;)
plt.legend()
plt.grid(True)

# One tick per month, labeled as YYYY-MM
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=1))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1190&quot; data-origin-height=&quot;590&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/b4MEbE/btsP9HizSJd/rffIthrFoRw02eKeXL3hu0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/b4MEbE/btsP9HizSJd/rffIthrFoRw02eKeXL3hu0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/b4MEbE/btsP9HizSJd/rffIthrFoRw02eKeXL3hu0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fb4MEbE%2FbtsP9HizSJd%2FrffIthrFoRw02eKeXL3hu0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1190&quot; height=&quot;590&quot; data-origin-width=&quot;1190&quot; data-origin-height=&quot;590&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;-&amp;gt; &lt;u&gt;Positive sentiment is higher overall, but negative reactions rise in certain periods&lt;/u&gt;&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;br /&gt;4. Visitor Distribution by Gender &amp;amp; Age&lt;/h3&gt;
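&lt;p data-ke-size=&quot;size16&quot;&gt;The date handling in the sentiment step relies on pandas turning YYYYMM codes into timestamps and marking anything unparseable as missing. A minimal sketch of that behavior, using a hypothetical three-value series (raw and parsed are names invented for this example, not columns from the dataset):&lt;/p&gt;
&lt;pre class=&quot;python&quot; data-ke-type=&quot;codeblock&quot; data-ke-language=&quot;python&quot;&gt;&lt;code&gt;import pandas as pd

# Hypothetical raw values: two valid YYYYMM codes and one malformed entry
raw = pd.Series([202401, 202402, 'n/a'])

# Cast to str first so integer codes such as 202401 can match the %Y%m format
parsed = pd.to_datetime(raw.astype(str), format='%Y%m', errors='coerce')

print(parsed)
print('rows that would be dropped:', int(parsed.isna().sum()))&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Because errors='coerce' silently converts bad values to NaT, counting them before calling dropna shows how many rows the cleaning step removes.&lt;/p&gt;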
&lt;pre class=&quot;python&quot; data-ke-type=&quot;codeblock&quot; data-ke-language=&quot;python&quot;&gt;&lt;code&gt;# 1. Segment-level aggregation (gender, age, purpose)

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.font_manager as fm&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre class=&quot;python&quot; data-ke-type=&quot;codeblock&quot; data-ke-language=&quot;python&quot;&gt;&lt;code&gt;# Aggregate by gender and age group
segment_gender_age = df_monthly.groupby([&quot;성별&quot;, &quot;연령별&quot;])[&quot;인원수&quot;].sum().reset_index()

plt.figure(figsize=(10,6))
sns.barplot(data=segment_gender_age, x=&quot;연령별&quot;, y=&quot;인원수&quot;, hue=&quot;성별&quot;, palette=&quot;Set2&quot;)
plt.title(&quot;성별 &amp;amp; 연령별 관광객 분포&quot;)
plt.xticks(rotation=45)
plt.show()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;844&quot; data-origin-height=&quot;582&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cfqmAk/btsQaVG7pcJ/ux0HbbDQnpivOvIbBZCEVk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cfqmAk/btsQaVG7pcJ/ux0HbbDQnpivOvIbBZCEVk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cfqmAk/btsQaVG7pcJ/ux0HbbDQnpivOvIbBZCEVk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcfqmAk%2FbtsQaVG7pcJ%2Fux0HbbDQnpivOvIbBZCEVk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;844&quot; height=&quot;582&quot; data-origin-width=&quot;844&quot; data-origin-height=&quot;582&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p style=&quot;text-align: left;&quot; data-ke-size=&quot;size16&quot;&gt;-&amp;gt; &lt;u&gt;Women in their 20s and 30s are the core segment&lt;/u&gt;&lt;/p&gt;
&lt;pre class=&quot;python&quot; data-ke-type=&quot;codeblock&quot; data-ke-language=&quot;python&quot;&gt;&lt;code&gt;# Aggregate by purpose of visit
segment_purpose = df_monthly.groupby(&quot;목적별&quot;)[&quot;인원수&quot;].sum().reset_index()

plt.figure(figsize=(8,6))
# hue plus legend=False keeps per-bar colors without seaborn's palette-without-hue warning
sns.barplot(data=segment_purpose, x=&quot;목적별&quot;, y=&quot;인원수&quot;, hue=&quot;목적별&quot;, palette=&quot;viridis&quot;, legend=False)
plt.title(&quot;방문 목적별 관광객 분포&quot;)
plt.show()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;689&quot; data-origin-height=&quot;544&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/DjYba/btsP93TfZY1/yMrUMmBH7gjtrs6ZdmXOAk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/DjYba/btsP93TfZY1/yMrUMmBH7gjtrs6ZdmXOAk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/DjYba/btsP93TfZY1/yMrUMmBH7gjtrs6ZdmXOAk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FDjYba%2FbtsP93TfZY1%2FyMrUMmBH7gjtrs6ZdmXOAk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;689&quot; height=&quot;544&quot; data-origin-width=&quot;689&quot; data-origin-height=&quot;544&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;-&amp;gt; &lt;u&gt;Tourism dominates as the purpose of visit; other purposes (business, study) are a small minority&lt;/u&gt;&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;br /&gt;5. Visitor Demand Forecast&lt;/h3&gt;
&lt;pre class=&quot;python&quot; data-ke-type=&quot;codeblock&quot; data-ke-language=&quot;python&quot;&gt;&lt;code&gt;# Prophet-based annual visitor demand forecast

from prophet import Prophet
import pandas as pd
import matplotlib.pyplot as plt

# 1. Aggregate visitors per year (df_monthly is the frame loaded in section 2)
visit_by_year = df_monthly.groupby(&quot;연도&quot;, as_index=False)[&quot;인원수&quot;].sum()

# 2. Convert to Prophet's input format (year to a datetime on 1 January)
df_prophet = visit_by_year.copy()
df_prophet[&quot;ds&quot;] = pd.to_datetime(df_prophet[&quot;연도&quot;].astype(str) + &quot;-01-01&quot;)
df_prophet[&quot;y&quot;] = df_prophet[&quot;인원수&quot;]
df_prophet = df_prophet[[&quot;ds&quot;, &quot;y&quot;]]

# Inspect the input
print(df_prophet.tail())

# 3. Build and fit the model
# Note: with one observation per year the yearly seasonality term cannot be
# identified from the data, so its component plot should not be over-interpreted.
model = Prophet(yearly_seasonality=True, daily_seasonality=False, weekly_seasonality=False)
model.fit(df_prophet)

# 4. Future frame: the 3 years after the last observed year
# &quot;YS&quot; (year start) keeps the future dates on 1 January, matching the training data
future = model.make_future_dataframe(periods=3, freq=&quot;YS&quot;)
forecast = model.predict(future)

# 5. Plot the forecast
fig1 = model.plot(forecast)
plt.title(&quot;연도별 관광객 수요 예측 (Prophet, 2025 기준)&quot;)
plt.show()

# 6. Plot the trend and seasonality components
fig2 = model.plot_components(forecast)
plt.show()

# 7. Most recent fitted and forecast values (last 8 rows)
forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail(8)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imagegridblock&quot;&gt;
  &lt;div class=&quot;image-container&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/Z2x3x/btsQajVOyAX/p3pHOKw0TqdKRYaAZwFFL1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/Z2x3x/btsQajVOyAX/p3pHOKw0TqdKRYaAZwFFL1/img.png&quot; data-origin-width=&quot;989&quot; data-origin-height=&quot;598&quot; style=&quot;width: 51.7181%;&quot; data-is-animation=&quot;false&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/Z2x3x/btsQajVOyAX/p3pHOKw0TqdKRYaAZwFFL1/img.png&quot; alt=&quot;&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FZ2x3x%2FbtsQajVOyAX%2Fp3pHOKw0TqdKRYaAZwFFL1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;989&quot; height=&quot;598&quot;/&gt;&lt;/span&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/4iv3z/btsQbB9jm6t/P7uJEX3j9C0yOpyhT6uE00/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/4iv3z/btsQbB9jm6t/P7uJEX3j9C0yOpyhT6uE00/img.png&quot; data-origin-width=&quot;889&quot; data-origin-height=&quot;590&quot; style=&quot;width: 47.1191%;&quot; data-is-animation=&quot;false&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/4iv3z/btsQbB9jm6t/P7uJEX3j9C0yOpyhT6uE00/img.png&quot; alt=&quot;&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2F4iv3z%2FbtsQbB9jm6t%2FP7uJEX3j9C0yOpyhT6uE00%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;889&quot; height=&quot;590&quot;/&gt;&lt;/span&gt;&lt;/div&gt;
&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;-&amp;gt; &lt;u&gt;The upward trend is forecast to continue through 2025 and 2026, indicating substantial growth opportunity&lt;/u&gt;&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;6. Segment Priority Ranking&lt;/h3&gt;
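&lt;p data-ke-size=&quot;size16&quot;&gt;Prophet's make_future_dataframe takes a pandas frequency string, so it helps to see which dates a given frequency produces. The history in the forecast step is stamped on 1 January of each year; a year-start frequency continues that grid, whereas a year-end frequency would land on 31 December instead. A small sketch with hypothetical dates (history and next_starts are names made up for this example):&lt;/p&gt;
&lt;pre class=&quot;python&quot; data-ke-type=&quot;codeblock&quot; data-ke-language=&quot;python&quot;&gt;&lt;code&gt;import pandas as pd

# Hypothetical annual history stamped on 1 January, like the ds column
history = pd.to_datetime(['2022-01-01', '2023-01-01', '2024-01-01'])

# Year-start steps from the last observed date; the first element is the
# last observed date itself, so it is skipped with [1:]
next_starts = pd.date_range(history[-1], periods=4, freq='YS')[1:]

print(list(next_starts.strftime('%Y-%m-%d')))&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Printing the ds column of the future frame in the same way is a quick check that the forecast rows line up with the years the write-up talks about.&lt;/p&gt;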
&lt;pre class=&quot;python&quot; data-ke-type=&quot;codeblock&quot; data-ke-language=&quot;python&quot;&gt;&lt;code&gt;# Segment priority score (ranking), based on 2023-to-2024 size and growth

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.font_manager as fm


# 0) Load data (reuse df_visit if it already exists in the session)
try:
    df_visit
except NameError:
    df_visit = pd.read_csv(&quot;/content/drive/MyDrive/데이터분석/외래객방한데이터(한국관광공사)/한국관광공사_방한 외래관광객 상세 월별 집계.csv&quot;, encoding=&quot;cp949&quot;)
    df_visit[&quot;기준연월&quot;] = pd.to_datetime(df_visit[&quot;기준연월&quot;], errors=&quot;coerce&quot;)
    df_visit[&quot;연도&quot;] = df_visit[&quot;기준연월&quot;].dt.year

# 1) Basic preprocessing: drop irrelevant categories (such as crew) and missing values
dfv = df_visit.copy()
if &quot;성별&quot; in dfv.columns:
    dfv = dfv[~dfv[&quot;성별&quot;].isin([&quot;승무원&quot;])]
if &quot;연령별&quot; in dfv.columns:
    dfv = dfv[~dfv[&quot;연령별&quot;].isin([&quot;승무원&quot;]) &amp; dfv[&quot;연령별&quot;].notna()]
dfv = dfv.dropna(subset=[&quot;연도&quot;, &quot;인원수&quot;])

# 2) Parameters
BASE_YEAR = 2023
TARGET_YEAR = 2024
SEG_KEYS = [&quot;성별&quot;, &quot;연령별&quot;, &quot;목적별&quot;]
W_SIZE, W_GROWTH = 0.6, 0.4  # score weights (size 60%, growth 40%)

# 3) Aggregate by segment and year
g = (dfv.groupby(SEG_KEYS + [&quot;연도&quot;])[&quot;인원수&quot;]
        .sum()
        .reset_index())

# 4) Pivot to wide format (one column per year)
wide = (g.pivot_table(index=SEG_KEYS, columns=&quot;연도&quot;, values=&quot;인원수&quot;, aggfunc=&quot;sum&quot;)
          .fillna(0)
          .reset_index())

# 5) Compute metrics
if BASE_YEAR not in wide.columns:
    wide[BASE_YEAR] = 0
if TARGET_YEAR not in wide.columns:
    wide[TARGET_YEAR] = 0

wide.rename(columns={
    BASE_YEAR: f&quot;{BASE_YEAR}인원&quot;,
    TARGET_YEAR: f&quot;{TARGET_YEAR}인원&quot;
}, inplace=True)

# Growth rate. A zero denominator gives NaN (0/0) or inf (positive count / 0).
wide[&quot;YoY성장률&quot;] = (wide[f&quot;{TARGET_YEAR}인원&quot;] - wide[f&quot;{BASE_YEAR}인원&quot;]) / wide[f&quot;{BASE_YEAR}인원&quot;]

# When the base-year (2023) count is 0:
# - target-year (2024) count also 0: growth rate 0
# - target-year count above 0: growth rate 1 (unbounded growth capped at 1)
wide[&quot;YoY성장률&quot;] = wide[&quot;YoY성장률&quot;].fillna(0)  # the 0/0 case becomes 0

# 0 visitors in 2023 and more than 0 in 2024: set to 1.0 (this also replaces inf)
wide[&quot;YoY성장률&quot;] = wide[&quot;YoY성장률&quot;].where(
    ~((wide[f&quot;{BASE_YEAR}인원&quot;] == 0) &amp; (wide[f&quot;{TARGET_YEAR}인원&quot;] &gt; 0)),
    1.0
)


# 6) Normalize to the 0-1 range
def minmax(s):
    lo, hi = s.min(), s.max()
    # A constant column has no spread, so every row gets the midpoint 0.5
    return (s - lo) / (hi - lo) if hi &gt; lo else pd.Series(0.5, index=s.index)

wide[&quot;크기지수&quot;] = minmax(wide[f&quot;{TARGET_YEAR}인원&quot;])
wide[&quot;성장지수&quot;] = minmax(wide[&quot;YoY성장률&quot;])

# 7) Composite score
wide[&quot;점수&quot;] = W_SIZE*wide[&quot;크기지수&quot;] + W_GROWTH*wide[&quot;성장지수&quot;]

# 8) Classification tag (for interpretation)
wide[&quot;분류&quot;] = np.select(
    [
        (wide[&quot;크기지수&quot;]&gt;=0.5) &amp; (wide[&quot;성장지수&quot;]&gt;=0.5),
        (wide[&quot;크기지수&quot;]&gt;=0.5) &amp; (wide[&quot;성장지수&quot;]&lt;0.5),
        (wide[&quot;크기지수&quot;]&lt;0.5) &amp; (wide[&quot;성장지수&quot;]&gt;=0.5),
    ],
    [&quot;Large &amp; Fast&quot;, &quot;Large &amp; Flat&quot;, &quot;Niche &amp; Fast&quot;],
    default=&quot;Niche &amp; Flat&quot;
)

# 9) 2024 share of total visitors
total_2024 = wide[f&quot;{TARGET_YEAR}인원&quot;].sum()
wide[&quot;2024점유율(%)&quot;] = np.where(total_2024&gt;0, wide[f&quot;{TARGET_YEAR}인원&quot;]/total_2024*100, 0)

# 10) Sort by score and print the ranking
ranked = (wide
          .sort_values(&quot;점수&quot;, ascending=False)
          .reset_index(drop=True))

print(&quot;✅ Segment priority TOP 15&quot;)
display(ranked[SEG_KEYS+[f&quot;{BASE_YEAR}인원&quot;,f&quot;{TARGET_YEAR}인원&quot;,&quot;YoY성장률&quot;,&quot;크기지수&quot;,&quot;성장지수&quot;,&quot;점수&quot;,&quot;분류&quot;,&quot;2024점유율(%)&quot;]].head(15))

# 10-1) Ranking table sorted by share (additional view)
ranked_share = (wide
                .sort_values(&quot;2024점유율(%)&quot;, ascending=False)
                .reset_index(drop=True))
print(&quot;✅ Segment TOP 15 by share&quot;)
display(ranked_share[SEG_KEYS + [f&quot;{TARGET_YEAR}인원&quot;,&quot;2024점유율(%)&quot;,&quot;YoY성장률&quot;,&quot;점수&quot;,&quot;분류&quot;]].head(15))

# 11) Visualization (Top 10, sorted by 2024 share)
topN = 10
top_plot = ranked_share.head(topN).copy()
labels = top_plot.apply(lambda r: &quot; / &quot;.join([str(r[k]) for k in SEG_KEYS]), axis=1)
plt.figure(figsize=(10,6))
plt.barh(y=labels, width=top_plot[&quot;2024점유율(%)&quot;])
plt.gca().invert_yaxis()
plt.title(f&quot;세그먼트 점유율 Top {topN} (정렬: 2024 점유율)&quot;)
plt.xlabel(&quot;2024 점유율(%)&quot;)

# Annotation next to each bar: share / 2024 visitors / YoY
for i, (p, cnt, yoy) in enumerate(zip(
    top_plot[&quot;2024점유율(%)&quot;],
    top_plot[f&quot;{TARGET_YEAR}인원&quot;],
    top_plot[&quot;YoY성장률&quot;]
)):
    plt.text(p + 0.3, i, f&quot;{p:.1f}% | 24년 {cnt:,.0f}명 | YoY {yoy*100:.1f}%&quot;, va=&quot;center&quot;)
plt.tight_layout()
plt.show()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;995&quot; data-origin-height=&quot;590&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/Vrgdf/btsP75dhqAz/5UuK5cB16Esal1PByDOgG0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/Vrgdf/btsP75dhqAz/5UuK5cB16Esal1PByDOgG0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/Vrgdf/btsP75dhqAz/5UuK5cB16Esal1PByDOgG0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FVrgdf%2FbtsP75dhqAz%2F5UuK5cB16Esal1PByDOgG0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;995&quot; height=&quot;590&quot; data-origin-width=&quot;995&quot; data-origin-height=&quot;590&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;-&amp;gt; &lt;u&gt;Women in their 20s and 30s visiting for tourism are the core target&lt;/u&gt;&lt;/p&gt;
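&lt;p data-ke-size=&quot;size16&quot;&gt;To make the scoring easier to follow, here is a small worked sketch of the same idea: min-max normalize size and growth, then blend them 60/40. The three segments and their numbers are hypothetical, chosen only to show the arithmetic, and the column names (segment, size_2024, yoy, score) are invented for this example.&lt;/p&gt;
&lt;pre class=&quot;python&quot; data-ke-type=&quot;codeblock&quot; data-ke-language=&quot;python&quot;&gt;&lt;code&gt;import pandas as pd

# Hypothetical segments, for illustration only
seg = pd.DataFrame({
    'segment': ['A', 'B', 'C'],
    'size_2024': [1000, 400, 100],
    'yoy': [0.10, 0.30, 0.90],
})

def minmax(s):
    # Rescale to the 0-1 range; a constant column falls back to 0.5
    lo, hi = s.min(), s.max()
    if hi == lo:
        return pd.Series(0.5, index=s.index)
    return (s - lo) / (hi - lo)

# Same weights as in the ranking step: 60% size, 40% growth
seg['score'] = 0.6 * minmax(seg['size_2024']) + 0.4 * minmax(seg['yoy'])

print(seg.sort_values('score', ascending=False))&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;With these toy numbers the largest segment A ranks first even though it grows the slowest, and the small but fast-growing C edges out the mid-sized B. Because min-max scaling depends on the extreme values, a single outlier segment can compress everyone else's index, which is worth keeping in mind when reading the ranking.&lt;/p&gt;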
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;Week 3 Results:&lt;/h3&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-pm-slice=&quot;1 1 []&quot; data-ke-size=&quot;size16&quot;&gt;Discovery phase&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;Female tourists in their 20s and 30s are the largest segment and are also growing quickly.&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;Male tourists aged 61 and over or 20 and under are small segments but are growing quickly.&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;Week 4 Plan:&lt;/h3&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Ingest search-volume data &amp;rarr; store it in a standardized format&lt;/li&gt;
&lt;li&gt;Backtest an MA(3) baseline on the top segments&lt;/li&gt;
&lt;li&gt;Build the budget simulator tables and charts&lt;/li&gt;
&lt;li&gt;Create the dashboard skeleton file&lt;/li&gt;
&lt;/ul&gt;</description>
      <category>마케터 관점의 데이터분석/데이터분석 프로젝트</category>
      <category>그로스마케팅</category>
      <category>데이터분석</category>
      <category>데이터분석가</category>
      <category>데이터분석가부트캠프</category>
      <category>디지털마케팅</category>
      <category>마케팅</category>
      <category>마케팅데이터분석</category>
      <category>멀티캠퍼스it부트캠프</category>
      <category>부트캠프후기</category>
      <category>퍼포먼스마케팅</category>
      <author>뺩빱</author>
      <guid isPermaLink="true">https://pyj2qdat.tistory.com/12</guid>
      <comments>https://pyj2qdat.tistory.com/12#entry12comment</comments>
      <pubDate>Tue, 26 Aug 2025 18:46:09 +0900</pubDate>
    </item>
    <item>
      <title>1. Data Analyst Project: Weeks 1-2 Work Log</title>
      <link>https://pyj2qdat.tistory.com/11</link>
      <description>&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-filename=&quot;1~2주차-001.png&quot; data-origin-width=&quot;1080&quot; data-origin-height=&quot;1080&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/eAdPD7/btsQcb384dN/cl5OooRZAjLqlQ90fcRqE1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/eAdPD7/btsQcb384dN/cl5OooRZAjLqlQ90fcRqE1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/eAdPD7/btsQcb384dN/cl5OooRZAjLqlQ90fcRqE1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FeAdPD7%2FbtsQcb384dN%2Fcl5OooRZAjLqlQ90fcRqE1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;462&quot; height=&quot;462&quot; data-filename=&quot;1~2주차-001.png&quot; data-origin-width=&quot;1080&quot; data-origin-height=&quot;1080&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;br /&gt;&lt;span style=&quot;color: #666666;&quot;&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;color: #666666;&quot;&gt;The project topic is a pattern analysis of foreign tourists by visitor characteristics, based on tourism data.&lt;/span&gt;&lt;br /&gt;&amp;nbsp;&lt;br /&gt;&lt;span style=&quot;color: #666666;&quot;&gt;I chose this topic for the following reasons.&lt;/span&gt;&lt;br /&gt;&amp;nbsp;&lt;br /&gt;&lt;span style=&quot;color: #666666;&quot;&gt;Tourism marketing matters at every level: macro (national policy) &amp;rarr; meso (companies and markets) &amp;rarr; micro (customer experience).&lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;color: #666666;&quot;&gt;I am interested in both hands-on marketing work and government-agency projects, and this topic covers both areas at once.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size14&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;&lt;span style=&quot;color: #666666;&quot;&gt;Macro perspective: national policy &amp;amp; global trends&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;color: #666666;&quot;&gt;The national and local governments have recently made attracting foreign tourists a national priority.&lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;color: #666666;&quot;&gt;The &lt;/span&gt;&lt;a href=&quot;https://news.nate.com/view/20250817n04131&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;&lt;span&gt;&lt;span style=&quot;color: #666666;&quot;&gt;Ministry of Culture, Sports and Tourism&lt;/span&gt;&lt;/span&gt;&lt;/a&gt;&lt;span style=&quot;color: #666666;&quot;&gt; and the &lt;/span&gt;&lt;a href=&quot;https://news.nate.com/view/20250716n04870&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;&lt;span&gt;&lt;span style=&quot;color: #666666;&quot;&gt;Korea Tourism Organization&lt;/span&gt;&lt;/span&gt;&lt;/a&gt;&lt;span style=&quot;color: #666666;&quot;&gt; are preparing a major push to attract inbound tourists,&lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;color: #666666;&quot;&gt;and local governments such as &lt;/span&gt;&lt;a href=&quot;https://news.nate.com/view/20250818n04880&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;&lt;span&gt;&lt;span style=&quot;color: #666666;&quot;&gt;Seoul&lt;/span&gt;&lt;/span&gt;&lt;/a&gt;&lt;span style=&quot;color: #666666;&quot;&gt; / &lt;/span&gt;&lt;a href=&quot;https://news.nate.com/view/20250319n21235&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;&lt;span&gt;&lt;span style=&quot;color: #666666;&quot;&gt;Busan&lt;/span&gt;&lt;/span&gt;&lt;/a&gt;&lt;span style=&quot;color: #666666;&quot;&gt; / &lt;/span&gt;&lt;a href=&quot;https://news.nate.com/view/20250819n07870&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;&lt;span&gt;&lt;span style=&quot;color: #666666;&quot;&gt;Jeju&lt;/span&gt;&lt;/span&gt;&lt;/a&gt;&lt;span style=&quot;color: #666666;&quot;&gt; / &lt;/span&gt;&lt;a href=&quot;https://news.nate.com/view/20250218n11635&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;&lt;span&gt;&lt;span style=&quot;color: #666666;&quot;&gt;Jeonju&lt;/span&gt;&lt;/span&gt;&lt;/a&gt;&lt;span style=&quot;color: #666666;&quot;&gt; / &lt;/span&gt;&lt;a href=&quot;https://news.nate.com/view/20250814n12021&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;&lt;span&gt;&lt;span style=&quot;color: #666666;&quot;&gt;Gyeongbuk&lt;/span&gt;&lt;/span&gt;&lt;/a&gt;&lt;span style=&quot;color: #666666;&quot;&gt; are pursuing strategies built on &amp;ldquo;segmented tourist targeting + tailored campaigns&amp;rdquo;.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;color: #666666;&quot;&gt;Tourism appears to be more than a travel service: it ties directly into national image, regional economies, and job creation.&lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;color: #666666;&quot;&gt;I therefore see a data-driven tourism marketing strategy as an important source of insight that can support national policy and industry trends.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size14&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;&lt;span style=&quot;color: #666666;&quot;&gt;Meso perspective: markets &amp;amp; corporate marketing&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;color: #666666;&quot;&gt;At the macro level this is national policy, but the parties that carry it out on the ground are companies and agencies.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;color: #666666;&quot;&gt;Marketing agencies in particular need:&lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;color: #666666;&quot;&gt;Data-driven decision-making: &amp;ldquo;Which segments from which countries have positive experiences, and where do they run into friction?&amp;rdquo;&lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;color: #666666;&quot;&gt;Campaign strategy proposals: tailored messages designed per segment rather than generic promotion.&lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;color: #666666;&quot;&gt;Performance validation: practical methods such as A/B testing and ROI analysis.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;color: #666666;&quot;&gt;The goal of the final project is therefore to build a strategy framework that can be applied directly to real marketing work.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size14&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;&lt;span style=&quot;color: #666666;&quot;&gt;Micro perspective: customers &amp;amp; experience&lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;color: #666666;&quot;&gt;Marketing is ultimately a bridge that connects companies, services, and products with people.&lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;color: #666666;&quot;&gt;Through the final project I plan to identify who visits Korea and which experiences led to positive or negative sentiment,&lt;/span&gt;&lt;span style=&quot;color: #666666;&quot;&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style=&quot;color: #666666;&quot;&gt;and to use that to flesh out segment-level personas and marketing messages.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;br /&gt;Deriving marketing insights from visitor counts + sentiment&lt;/p&gt;
&lt;hr data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style5&quot; /&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;[Multicampus KDT Data Analyst Final Project: Weeks 1-2 Work Log]&lt;/h2&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;div id=&quot;cell-vRjIA8useewc&quot; style=&quot;color: #e3e3e3; text-align: start;&quot;&gt;
&lt;div&gt;
&lt;div&gt;
&lt;div&gt;
&lt;div&gt;
&lt;div style=&quot;color: #e3e3e3;&quot;&gt;
&lt;div&gt;
&lt;h3 style=&quot;color: #e3e3e3;&quot; data-ke-size=&quot;size23&quot;&gt;&lt;span style=&quot;color: #000000;&quot;&gt;1. Project Overview&lt;/span&gt;&lt;/h3&gt;
&lt;/div&gt;
&lt;ul style=&quot;list-style-type: disc; color: #e3e3e3;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;&lt;span style=&quot;color: #000000;&quot;&gt;Main topic: pattern analysis and segmentation of foreign tourists by visitor characteristics, based on tourism data&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style=&quot;color: #000000;&quot;&gt;Sub-topics:&lt;/span&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;&lt;span style=&quot;color: #000000;&quot;&gt;Cost-effectiveness-based marketing mix modeling (MMM)&lt;/span&gt;&lt;/li&gt;
&lt;li style=&quot;color: #e3e3e3;&quot;&gt;&lt;span style=&quot;color: #000000;&quot;&gt;Simulation of A/B testing strategies tailored to foreign tourists&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span style=&quot;color: #000000;&quot;&gt;Final goal: using tourist behavior data for the top 5 countries, identify the channels with the highest ROI and derive marketing insights&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&quot;cell-NUZ9P_vxeqeM&quot; style=&quot;color: #e3e3e3; text-align: start;&quot;&gt;
&lt;div style=&quot;color: #e3e3e3;&quot;&gt;
&lt;div&gt;
&lt;h3 style=&quot;color: #e3e3e3;&quot; data-ke-size=&quot;size23&quot;&gt;&lt;span style=&quot;color: #000000;&quot;&gt;2. Dataset Status&lt;/span&gt;&lt;/h3&gt;
&lt;/div&gt;
&lt;ul style=&quot;list-style-type: disc; color: #e3e3e3;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;&lt;span style=&quot;color: #000000;&quot;&gt;13 CSV files provided by the tourism agency / tourism organization (by gender and age, by purpose, visits by region, satisfaction, positive/negative sentiment trends, etc.)&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style=&quot;color: #000000;&quot;&gt;Review data (external papers / public data)&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style=&quot;color: #000000;&quot;&gt;marketing_AB.csv &amp;rarr; A/B test results by country and channel&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style=&quot;color: #000000;&quot;&gt;Advertising.csv &amp;rarr; advertising spend by channel&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style=&quot;color: #000000;&quot;&gt;market-mix-modeling-using-sales-data.ipynb &amp;rarr; extension of the MMM example code&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://drive.google.com/drive/folders/1cB0eb4wAd8uc1aZdW4_KPsXrQ74ZW8-C?usp=drive_link&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;Dataset&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&quot;cell-paP0hZF0ewnh&quot; style=&quot;color: #e3e3e3; text-align: start;&quot;&gt;
&lt;div&gt;
&lt;div&gt;
&lt;div&gt;
&lt;div&gt;
&lt;div style=&quot;color: #e3e3e3;&quot;&gt;
&lt;div&gt;
&lt;h3 style=&quot;color: #e3e3e3;&quot; data-ke-size=&quot;size23&quot;&gt;&lt;span style=&quot;color: #000000;&quot;&gt;3. Analysis Progress&lt;/span&gt;&lt;/h3&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&quot;cell-Y8L6pfhW99Hj&quot; style=&quot;color: #e3e3e3; text-align: start;&quot;&gt;
&lt;div&gt;
&lt;div&gt;
&lt;div&gt;
&lt;div&gt;
&lt;div style=&quot;color: #e3e3e3;&quot;&gt;
&lt;div&gt;
&lt;h4 style=&quot;color: #e3e3e3;&quot; data-ke-size=&quot;size20&quot;&gt;&lt;span style=&quot;color: #000000;&quot;&gt;(0) Data Preprocessing&lt;/span&gt;&lt;/h4&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;pre class=&quot;python&quot; data-ke-type=&quot;codeblock&quot; data-ke-language=&quot;python&quot;&gt;&lt;code&gt;import pandas as pd
csv_files = {
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&quot;국가별 외국인 방문 현황&quot;: &quot;/content/drive/My Drive/데이터분석/외래객방한데이터(한국관광공사)/20250813163808_국가별 외국인 방문 현황 CSV 다운로드.csv&quot;,
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&quot;방한여행 요약(국적별)&quot;: &quot;/content/drive/My Drive/데이터분석/외래객방한데이터(한국관광공사)/20250813164248_방한여행 요약(국적별).csv&quot;,
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&quot;방한여행 요약(대륙별)&quot;: &quot;/content/drive/My Drive/데이터분석/외래객방한데이터(한국관광공사)/20250813164248_방한여행 요약(대륙별).csv&quot;,
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&quot;방한 외래관광객 특성(교통수단별)&quot;: &quot;/content/drive/My Drive/데이터분석/외래객방한데이터(한국관광공사)/20250813164323_방한 외래관광객 특성(교통수단별).csv&quot;,
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&quot;방한 외래관광객 특성(성&amp;middot;연령별)&quot;: &quot;/content/drive/My Drive/데이터분석/외래객방한데이터(한국관광공사)/20250813164323_방한 외래관광객 특성(성&amp;middot;연령별).csv&quot;,
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&quot;방한 외래관광객 특성(목적별)&quot;: &quot;/content/drive/My Drive/데이터분석/외래객방한데이터(한국관광공사)/20250813164323_방한 외래관광객 특성(목적별).csv&quot;,
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&quot;방한여행 행태 및 만족도 평가&quot;: &quot;/content/drive/My Drive/데이터분석/외래객방한데이터(한국관광공사)/20250813164343_방한여행 행태 및 만족도 평가.csv&quot;,
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&quot;방한 여행 이미지&quot;: &quot;/content/drive/My Drive/데이터분석/외래객방한데이터(한국관광공사)/20250813164348_방한 여행 이미지.csv&quot;,
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&quot;관광객 지역별 방문비율&quot;: &quot;/content/drive/My Drive/데이터분석/외래객방한데이터(한국관광공사)/20250813164557_관광객 지역별 방문비율 CSV 다운로드.csv&quot;,
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&quot;긍부정 점유율 추이&quot;: &quot;/content/drive/My Drive/데이터분석/외래객방한데이터(한국관광공사)/20250813164536_한국 관광 관련 긍부정 점유율 추이.csv&quot;,
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&quot;국가별 관광 포지셔닝 맵&quot;: &quot;/content/drive/My Drive/데이터분석/외래객방한데이터(한국관광공사)/20250813164533_국가별 한국 관광 관련 언급 포지셔닝 맵.csv&quot;,
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&quot;국가별 언급량&amp;middot;노출량&quot;: &quot;/content/drive/My Drive/데이터분석/외래객방한데이터(한국관광공사)/20250813164525_한국 관광 관련 국가별 언급량 인게이지먼트 잠재적 노출량 합산표.csv&quot;,
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&quot;관광 언급량 추이&quot;: &quot;/content/drive/My Drive/데이터분석/외래객방한데이터(한국관광공사)/20250813164522_한국 관광 관련 언급량 인게이지먼트 추이 언급량.csv&quot;,
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&quot;offerings&quot;: &quot;/content/drive/My Drive/데이터분석/외래객방한데이터(한국관광공사)/offerings.csv&quot;,
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&quot;marketing_AB&quot;: &quot;/content/drive/My Drive/데이터분석/외래객방한데이터(한국관광공사)/marketing_AB.csv&quot;,
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&quot;Advertising&quot;: &quot;/content/drive/My Drive/데이터분석/외래객방한데이터(한국관광공사)/Advertising.csv&quot;
}

headers_preview = {}
for name, path in csv_files.items():
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;try:
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;df = pd.read_csv(path, nrows=5)
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;headers_preview[name] = df.head()
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;except Exception as e:
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;headers_preview[name] = f&quot;Error: {e}&quot;

headers_preview&lt;/code&gt;&lt;/pre&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;Results:&lt;/h4&gt;
&lt;p&gt;&lt;figure class=&quot;imageslideblock alignCenter&quot; data-image=&quot;[{&amp;quot;src&amp;quot;:&amp;quot;https://blog.kakaocdn.net/dn/eL3bdd/btsPVlAnaFF/CqYrX2odM8efkWFneVwOi1/img.png&amp;quot;},{&amp;quot;src&amp;quot;:&amp;quot;https://blog.kakaocdn.net/dn/bMJP5j/btsPUVorZBT/KHr4WMsCIg5ooKKjuqhqjK/img.png&amp;quot;},{&amp;quot;src&amp;quot;:&amp;quot;https://blog.kakaocdn.net/dn/BUGGW/btsPWpvAio4/wrTWR3Nmo5IP70BAZW8BGk/img.png&amp;quot;}]&quot;&gt;
  &lt;div class=&quot;image-container&quot;&gt;&lt;span class=&quot;image-wrap selected&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/eL3bdd/btsPVlAnaFF/CqYrX2odM8efkWFneVwOi1/img.png&quot; data-url=&quot;https://blog.kakaocdn.net/dn/eL3bdd/btsPVlAnaFF/CqYrX2odM8efkWFneVwOi1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/eL3bdd/btsPVlAnaFF/CqYrX2odM8efkWFneVwOi1/img.png&quot; loading=&quot;lazy&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FeL3bdd%2FbtsPVlAnaFF%2FCqYrX2odM8efkWFneVwOi1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; data-origin-width=&quot;939&quot; data-origin-height=&quot;741&quot;/&gt;&lt;/span&gt;&lt;span class=&quot;image-wrap &quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bMJP5j/btsPUVorZBT/KHr4WMsCIg5ooKKjuqhqjK/img.png&quot; data-url=&quot;https://blog.kakaocdn.net/dn/bMJP5j/btsPUVorZBT/KHr4WMsCIg5ooKKjuqhqjK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bMJP5j/btsPUVorZBT/KHr4WMsCIg5ooKKjuqhqjK/img.png&quot; loading=&quot;lazy&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbMJP5j%2FbtsPUVorZBT%2FKHr4WMsCIg5ooKKjuqhqjK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; data-origin-width=&quot;939&quot; data-origin-height=&quot;741&quot;/&gt;&lt;/span&gt;&lt;span class=&quot;image-wrap &quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/BUGGW/btsPWpvAio4/wrTWR3Nmo5IP70BAZW8BGk/img.png&quot; data-url=&quot;https://blog.kakaocdn.net/dn/BUGGW/btsPWpvAio4/wrTWR3Nmo5IP70BAZW8BGk/img.png&quot;&gt;&lt;img 
src=&quot;https://blog.kakaocdn.net/dn/BUGGW/btsPWpvAio4/wrTWR3Nmo5IP70BAZW8BGk/img.png&quot; loading=&quot;lazy&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FBUGGW%2FbtsPWpvAio4%2FwrTWR3Nmo5IP70BAZW8BGk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; data-origin-width=&quot;939&quot; data-origin-height=&quot;741&quot;/&gt;&lt;/span&gt;&lt;button class=&quot;btn btn-prev&quot;&gt;&lt;span class=&quot;ico-prev&quot;&gt;Previous&lt;/span&gt;&lt;/button&gt;&lt;button class=&quot;btn btn-next&quot;&gt;&lt;span class=&quot;ico-next&quot;&gt;Next&lt;/span&gt;&lt;/button&gt;&lt;/div&gt;
  &lt;div class=&quot;mark&quot;&gt;&lt;span data-index=&quot;0&quot;&gt;0&lt;/span&gt;&lt;span data-index=&quot;1&quot;&gt;1&lt;/span&gt;&lt;span data-index=&quot;2&quot;&gt;2&lt;/span&gt;&lt;/div&gt;
&lt;/figure&gt;
&lt;/p&gt;
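&lt;p data-ke-size=&quot;size16&quot;&gt;The preview loop above stores an error string whenever a file fails to load. One common cause with Korean CSV exports is the file encoding, although I have not confirmed which encoding these particular files use. The sketch below is a minimal illustration under that assumption: the helper name read_header and the candidate list (utf-8-sig, then cp949) are hypothetical choices, not part of the original notebook.&lt;/p&gt;

```python
import csv

def read_header(path, encodings=("utf-8-sig", "cp949")):
    """Return (encoding, column_names) for the first encoding that decodes the file."""
    last_error = None
    for enc in encodings:
        try:
            with open(path, newline="", encoding=enc) as f:
                # An empty file yields an empty header instead of raising
                header = next(csv.reader(f), [])
            return enc, header
        except UnicodeDecodeError as err:
            last_error = err  # remember the failure and try the next candidate
    raise last_error or ValueError("no candidate encodings were given")
```

&lt;p data-ke-size=&quot;size16&quot;&gt;If a file only opens with the second candidate, the detected name can be passed on to pandas, for example pd.read_csv(path, nrows=5, encoding=enc).&lt;/p&gt;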
&lt;hr data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style3&quot; /&gt;
&lt;div style=&quot;background-color: #ffffff; color: #1f1f1f; text-align: start;&quot;&gt;
&lt;h4 style=&quot;color: #1f1f1f;&quot; data-ke-size=&quot;size20&quot;&gt;(1) Exploratory Data Analysis (EDA)&lt;/h4&gt;
&lt;/div&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Purpose: Analyze basic tourist characteristics (age, gender, purpose of visit, etc.) and social mention volume&lt;br /&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre class=&quot;bash&quot; data-ke-type=&quot;codeblock&quot; data-ke-language=&quot;bash&quot;&gt;&lt;code&gt;# Install the Nanum fonts
!sudo apt-get update -qq
!sudo apt-get install -y fonts-nanum
!sudo fc-cache -fv
!rm -rf ~/.cache/matplotlib&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&quot;python&quot; data-ke-type=&quot;codeblock&quot; data-ke-language=&quot;python&quot;&gt;&lt;code&gt;# Fix broken Korean text in plot titles
import matplotlib.pyplot as plt
import matplotlib.font_manager as fm
font_path = '/usr/share/fonts/truetype/nanum/NanumGothic.ttf'
fm.fontManager.addfont(font_path)  # register the newly installed font with matplotlib
font_name = fm.FontProperties(fname=font_path).get_name()
plt.rc('font', family=font_name)
plt.rcParams['axes.unicode_minus'] = False
print(f&quot;Matplotlib font set to: {plt.rcParams['font.family']}&quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&quot;python&quot; data-ke-type=&quot;codeblock&quot; data-ke-language=&quot;python&quot;&gt;&lt;code&gt;import pandas as pd
import matplotlib.pyplot as plt

try:
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;df = pd.read_csv(&quot;/content/drive/My Drive/데이터분석/외래객방한데이터(한국관광공사)/20250813164323_방한 외래관광객 특성(성&amp;middot;연령별).csv&quot;)
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# Add the '남성' (male) and '여성' (female) columns to get total visitors, then plot the totals by age group
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;df[&quot;총 방문객수&quot;] = df[&quot;남성&quot;] + df[&quot;여성&quot;]
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;df.groupby(&quot;연령대&quot;)[&quot;총 방문객수&quot;].sum().plot(kind=&quot;bar&quot;)
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;plt.title(&quot;Visitor Distribution by Age Group&quot;)
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;plt.show()
except FileNotFoundError:
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;print(&quot;Error: The file '20250813164323_방한 외래관광객 특성(성&amp;middot;연령별).csv' was not found.&quot;)
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;print(&quot;Please ensure the file is in the correct directory or provide the full path.&quot;)
except KeyError as e:
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;print(f&quot;KeyError: {e}. Please check the column names in your CSV file.&quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;534&quot; data-origin-height=&quot;498&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/unJwA/btsPWfGI4lW/vPnPMqcsq7Pkv6TGL1zkOK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/unJwA/btsPWfGI4lW/vPnPMqcsq7Pkv6TGL1zkOK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/unJwA/btsPWfGI4lW/vPnPMqcsq7Pkv6TGL1zkOK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FunJwA%2FbtsPWfGI4lW%2FvPnPMqcsq7Pkv6TGL1zkOK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;534&quot; height=&quot;498&quot; data-origin-width=&quot;534&quot; data-origin-height=&quot;498&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;background-color: #ffffff;&quot;&gt;&lt;span style=&quot;color: #1f1f1f;&quot;&gt;Insight: Visitors in their 20s and 30s account for the largest share &amp;rarr; they become the priority marketing target&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&amp;nbsp;&lt;/p&gt;
&lt;hr data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style3&quot; /&gt;
&lt;div style=&quot;background-color: #ffffff; color: #1f1f1f; text-align: start;&quot;&gt;
&lt;h4 style=&quot;color: #1f1f1f;&quot; data-ke-size=&quot;size20&quot;&gt;(2) Review-Based Predictive Modeling&lt;/h4&gt;
&lt;/div&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Purpose: Build a model that predicts tourist revisit rates from review and satisfaction data&lt;/p&gt;
&lt;pre class=&quot;python&quot; data-ke-type=&quot;codeblock&quot; data-ke-language=&quot;python&quot;&gt;&lt;code&gt;import pandas as pd
# Load the CSV
file_path = &quot;/content/drive/My Drive/데이터분석/외래객방한데이터(한국관광공사)/20250813164343_방한여행 행태 및 만족도 평가.csv&quot;
df = pd.read_csv(file_path)

# Inspect the data (first 5 rows, column names, basic statistics)
df_head = df.head()
df_columns = df.columns.tolist()
df_info = df.describe(include=&quot;all&quot;)

df_head, df_columns[:20], len(df_columns)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;739&quot; data-origin-height=&quot;386&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/rm5yJ/btsPW2mw4e3/Lq6hy0FqPcuaPWufQd3YD1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/rm5yJ/btsPW2mw4e3/Lq6hy0FqPcuaPWufQd3YD1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/rm5yJ/btsPW2mw4e3/Lq6hy0FqPcuaPWufQd3YD1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Frm5yJ%2FbtsPW2mw4e3%2FLq6hy0FqPcuaPWufQd3YD1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;739&quot; height=&quot;386&quot; data-origin-width=&quot;739&quot; data-origin-height=&quot;386&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;pre class=&quot;python&quot; data-ke-type=&quot;codeblock&quot; data-ke-language=&quot;python&quot;&gt;&lt;code&gt;import pandas as pd
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns

# Load the data
file_path = &quot;/content/drive/My Drive/데이터분석/외래객방한데이터(한국관광공사)/20250813164343_방한여행 행태 및 만족도 평가.csv&quot;
df = pd.read_csv(file_path)

# Set the features / target
X = df[[&quot;체재 기간(일)&quot;, &quot;1인 평균 지출 경비(USS)&quot;, &quot;1일 평균 지출 경비(USS)&quot;,
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&quot;전반적 만족도(긍정 응답 비율)&quot;, &quot;타인 추천 의향(긍정 응답 비율)&quot;]]

# Target: revisit intent &amp;rarr; 1 if the positive-response ratio is 85% or higher, otherwise 0
y = (df[&quot;관광목적 재방문 의향(긍정 응답 비율)&quot;] &amp;gt;= 85).astype(int)

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train the model
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Predict
y_pred = model.predict(X_test)

# Evaluate
print(classification_report(y_test, y_pred))

# Cross-validation
scores = cross_val_score(model, X, y, cv=5)
print(&quot;Cross-validation accuracy:&quot;, scores)
print(&quot;Mean accuracy:&quot;, scores.mean())

# Visualize feature importance
importances = model.feature_importances_
feat_imp = pd.Series(importances, index=X.columns).sort_values(ascending=False)

plt.figure(figsize=(8,5))
sns.barplot(x=feat_imp, y=feat_imp.index)
plt.title(&quot;Feature Importance (Revisit Intent Prediction)&quot;)
plt.show()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1271&quot; data-origin-height=&quot;260&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bhUHHo/btsPVZcZlut/vxZVICMuA8REe8b3ZFeab1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bhUHHo/btsPVZcZlut/vxZVICMuA8REe8b3ZFeab1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bhUHHo/btsPVZcZlut/vxZVICMuA8REe8b3ZFeab1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbhUHHo%2FbtsPVZcZlut%2FvxZVICMuA8REe8b3ZFeab1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1271&quot; height=&quot;260&quot; data-origin-width=&quot;1271&quot; data-origin-height=&quot;260&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;849&quot; data-origin-height=&quot;468&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/xELUW/btsPVwvmRnl/z4ui7z4LRCEmmigZ5zrUDK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/xELUW/btsPVwvmRnl/z4ui7z4LRCEmmigZ5zrUDK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/xELUW/btsPVwvmRnl/z4ui7z4LRCEmmigZ5zrUDK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FxELUW%2FbtsPVwvmRnl%2Fz4ui7z4LRCEmmigZ5zrUDK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;849&quot; height=&quot;468&quot; data-origin-width=&quot;849&quot; data-origin-height=&quot;468&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
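&lt;p data-ke-size=&quot;size16&quot;&gt;Because the target is a percentage cut at a fixed 85% threshold, the two classes can end up very uneven, which affects both the 5-fold cross-validation and any later resampling. The sketch below shows one way to check the class counts before modeling. The helper names binarize and class_counts are hypothetical, and the ratios are placeholder numbers for illustration only, not values from the survey CSV.&lt;/p&gt;

```python
from collections import Counter

def binarize(ratios, threshold=85):
    """Label each positive-response ratio 1 if it is at or above the threshold, else 0."""
    return [int(r >= threshold) for r in ratios]

def class_counts(labels):
    """Count how many rows fall into each of the two classes."""
    counts = Counter(labels)
    return {0: counts.get(0, 0), 1: counts.get(1, 0)}

# Placeholder ratios for illustration only (not taken from the real data)
example_ratios = [91.2, 88.0, 79.5, 86.4, 90.1, 83.9, 87.7]
labels = binarize(example_ratios)
counts = class_counts(labels)
print(counts)
print("smallest class size:", min(counts.values()))
```

&lt;p data-ke-size=&quot;size16&quot;&gt;When the smallest class has fewer rows than the number of folds, stratified cross-validation cannot place that class in every fold, so the reported accuracy should be read with caution.&lt;/p&gt;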
&lt;hr data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style3&quot; /&gt;
&lt;div style=&quot;background-color: #ffffff; color: #1f1f1f; text-align: start;&quot;&gt;
&lt;h4 style=&quot;color: #1f1f1f;&quot; data-ke-size=&quot;size20&quot;&gt;(3) Complementary Model Based on XGBoost + SHAP&lt;/h4&gt;
&lt;/div&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Correct the class imbalance with SMOTE&lt;/li&gt;
&lt;li&gt;Predict the revisit rate with XGBoost&lt;/li&gt;
&lt;li&gt;Interpret the main influencing factors with SHAP&lt;/li&gt;
&lt;/ul&gt;
&lt;pre class=&quot;python&quot; data-ke-type=&quot;codeblock&quot; data-ke-language=&quot;python&quot;&gt;&lt;code&gt;import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, roc_auc_score, classification_report
from imblearn.over_sampling import SMOTE
import shap
import pandas as pd

# -----------------------------
# 1. Train / Test Split
# -----------------------------
# Use stratify=y so the split preserves the target class distribution
X_train, X_test, y_train, y_test = train_test_split(
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;X, y, test_size=0.2, stratify=y, random_state=42
)

# -----------------------------
# 2. Correct class imbalance with SMOTE
# -----------------------------
# Reduce k_neighbors to fit the number of minority-class samples
# The minority class currently has only 3 samples, so k_neighbors is set to 2
sm = SMOTE(random_state=42, k_neighbors=2)
X_train_res, y_train_res = sm.fit_resample(X_train, y_train)

# -----------------------------
# 3. Train the XGBoost model
# -----------------------------
xgb_model = xgb.XGBClassifier(
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;n_estimators=300,
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;learning_rate=0.05,
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;max_depth=6,
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;subsample=0.8,
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;colsample_bytree=0.8,
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;random_state=42,
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;eval_metric=&quot;logloss&quot;
)

xgb_model.fit(X_train_res, y_train_res)

# -----------------------------
# 4. Evaluate performance
# -----------------------------
y_pred = xgb_model.predict(X_test)
y_prob = xgb_model.predict_proba(X_test)[:,1]

print(&quot;Accuracy:&quot;, accuracy_score(y_test, y_pred))
print(&quot;ROC-AUC:&quot;, roc_auc_score(y_test, y_prob))
print(classification_report(y_test, y_pred))

# -----------------------------
# 5. Compute SHAP values (model explanation)
# -----------------------------
explainer = shap.TreeExplainer(xgb_model)
shap_values = explainer.shap_values(X_test)

# Summary plot (overall feature importance)
shap.summary_plot(shap_values, X_test)

# Force plot (shows why an individual prediction was made; first sample as an example)
shap.initjs()
shap.force_plot(explainer.expected_value, shap_values[0,:], X_test.iloc[0,:])&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1834&quot; data-origin-height=&quot;514&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/rnddQ/btsPYL5oC9x/HRI5kfeUY6CW6efPkJJB90/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/rnddQ/btsPYL5oC9x/HRI5kfeUY6CW6efPkJJB90/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/rnddQ/btsPYL5oC9x/HRI5kfeUY6CW6efPkJJB90/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FrnddQ%2FbtsPYL5oC9x%2FHRI5kfeUY6CW6efPkJJB90%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1834&quot; height=&quot;514&quot; data-origin-width=&quot;1834&quot; data-origin-height=&quot;514&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
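&lt;p data-ke-size=&quot;size16&quot;&gt;To make the k_neighbors choice above more concrete, here is a simplified, pure-Python illustration of the interpolation idea behind SMOTE: pick a minority sample, pick one of its nearest minority neighbors, and create a new point somewhere on the line between them. This is a teaching sketch, not the imbalanced-learn implementation, and the function name smote_like_samples and the sample points are hypothetical.&lt;/p&gt;

```python
import math
import random

def smote_like_samples(minority, n_new, k_neighbors=2, seed=42):
    """Create n_new synthetic points by interpolating between minority neighbors."""
    if k_neighbors >= len(minority):
        raise ValueError("k_neighbors must be smaller than the number of minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base point
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbor = rng.choice(others[:k_neighbors])
        gap = rng.random()  # position along the segment between base and neighbor
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic

# Three hypothetical minority samples, mirroring the "3 samples, k_neighbors=2" case
minority_points = [(1.0, 2.0), (2.0, 3.0), (3.0, 1.0)]
new_points = smote_like_samples(minority_points, n_new=4, k_neighbors=2)
print(new_points)
```

&lt;p data-ke-size=&quot;size16&quot;&gt;With only three minority samples, each point has just two possible neighbors, which is why a value of 3 or more cannot work here. It also means every synthetic row is a blend of the same three originals, so the resampled training set adds little new information.&lt;/p&gt;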
&lt;hr data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style4&quot; /&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;Plan for Project Week 2&lt;/h2&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Week 2: In-depth analysis &amp;amp; pattern discovery&lt;br /&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Goal: Find meaningful patterns according to market and customer characteristics&lt;/li&gt;
&lt;li&gt;Project workflow
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Segmentation analysis by year &amp;amp; by category (gender, age, mode of transport, purpose)&lt;/li&gt;
&lt;li&gt;Visualize visitor trends and sentiment changes together&lt;/li&gt;
&lt;li&gt;Use correlation analysis &amp;amp; time-series trend analysis to derive the &amp;ldquo;link between sentiment and visitors by period&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Insights into the characteristics of each customer segment, for reference in marketing strategy&lt;/li&gt;
&lt;/ul&gt;</description>
      <category>마케터 관점의 데이터분석/데이터분석 프로젝트</category>
      <category>그로스마케팅</category>
      <category>데이터분석</category>
      <category>데이터분석가</category>
      <category>데이터분석가부트캠프</category>
      <category>디지털마케팅</category>
      <category>마케팅</category>
      <category>마케팅데이터분석</category>
      <category>멀티캠퍼스it부트캠프</category>
      <category>부트캠프후기</category>
      <category>퍼포먼스마케팅</category>
      <author>뺩빱</author>
      <guid isPermaLink="true">https://pyj2qdat.tistory.com/11</guid>
      <comments>https://pyj2qdat.tistory.com/11#entry11comment</comments>
      <pubDate>Tue, 19 Aug 2025 12:54:57 +0900</pubDate>
    </item>
    <item>
      <title>9. Sentiment Analysis of Visit and Revisit Rates Through Travel Reviews</title>
      <link>https://pyj2qdat.tistory.com/10</link>
      <description>&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-filename=&quot;제목을 입력해주세요_-002.png&quot; data-origin-width=&quot;1080&quot; data-origin-height=&quot;1080&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/OyqbD/btsPViKueM9/sYvlt4dM9pKZMNtKIJMpcK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/OyqbD/btsPViKueM9/sYvlt4dM9pKZMNtKIJMpcK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/OyqbD/btsPViKueM9/sYvlt4dM9pKZMNtKIJMpcK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FOyqbD%2FbtsPViKueM9%2FsYvlt4dM9pKZMNtKIJMpcK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;505&quot; height=&quot;505&quot; data-filename=&quot;제목을 입력해주세요_-002.png&quot; data-origin-width=&quot;1080&quot; data-origin-height=&quot;1080&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
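&lt;p data-ke-size=&quot;size16&quot;&gt;It feels like the data analyst bootcamp&lt;br /&gt;only just started,&lt;br /&gt;yet I am already on my first project...&lt;br /&gt;&amp;nbsp;&lt;br /&gt;The topic was open,&lt;br /&gt;but once I sat down to start,&lt;br /&gt;I struggled with the question, &amp;ldquo;What can I analyze that is fun and also connects to real work?&amp;rdquo;&lt;br /&gt;&amp;nbsp;&lt;br /&gt;On top of that, I am juggling this with graduate school classes,&lt;br /&gt;so I have no choice but to go without a team&lt;br /&gt;and run the project entirely on my own,&lt;br /&gt;which leaves me with a mountain of work...&lt;br /&gt;&amp;nbsp;&lt;br /&gt;Still, after steadily exploring topics,&lt;br /&gt;I settled on one that uses&lt;br /&gt;travel and tourism reviews&lt;br /&gt;to help attract foreign tourists.&lt;br /&gt;&amp;nbsp;&lt;br /&gt;Data creation &amp;rarr; preprocessing &amp;rarr; intent labeling &amp;rarr; sentiment scoring &amp;rarr; visualization:&lt;br /&gt;this is a brief summary of a short process that can be extended into practical ideas.&lt;br /&gt;&amp;nbsp;&lt;br /&gt;The project is still in its early stage,&lt;br /&gt;so it is not very polished yet,&lt;br /&gt;but I wrote this down to lay the initial groundwork.&lt;br /&gt;&amp;nbsp;&lt;br /&gt;&amp;nbsp;&lt;/p&gt;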
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1080&quot; data-origin-height=&quot;1080&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/eBYlKB/btsPM3G0Lbk/yR5yPcowwk0CgGo3BalbyK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/eBYlKB/btsPM3G0Lbk/yR5yPcowwk0CgGo3BalbyK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/eBYlKB/btsPM3G0Lbk/yR5yPcowwk0CgGo3BalbyK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FeBYlKB%2FbtsPM3G0Lbk%2FyR5yPcowwk0CgGo3BalbyK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;679&quot; height=&quot;679&quot; data-origin-width=&quot;1080&quot; data-origin-height=&quot;1080&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;hr data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style5&quot; /&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;Data &amp;amp; Preprocessing&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Remove HTML tags, URLs, and extra whitespace &amp;rarr; create clean_text&lt;/p&gt;
&lt;pre class=&quot;python&quot; data-ke-type=&quot;codeblock&quot; data-ke-language=&quot;python&quot;&gt;&lt;code&gt;import pandas as pd
import re

# Sample data (can be replaced with real data)
data = [
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{'text': 'I want to visit Korea next year!', 'lang': 'en', 'created_at': '2025-05-10'},
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{'text': 'I will come back again!',&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 'lang': 'en', 'created_at': '2025-06-15'},
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{'text': '想去韩国旅游',&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 'lang': 'zh', 'created_at': '2025-05-20'},
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{'text': '还会去韩国',&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 'lang': 'zh', 'created_at': '2025-06-18'},
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{'text': '가보고 싶다',&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 'lang': 'ko', 'created_at': '2025-05-25'},
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{'text': '또 가고 싶다',&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;'lang': 'ko', 'created_at': '2025-06-10'}
]
df = pd.DataFrame(data)
df['created_at'] = pd.to_datetime(df['created_at'])

# Preprocessing: strip tags, URLs, and extra whitespace
def clean_text(s: str) -&amp;gt; str:
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;if not isinstance(s, str):&amp;nbsp;&amp;nbsp;# also covers None and NaN
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;return &quot;&quot;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;s = re.sub(r'&amp;lt;[^&amp;gt;]+&amp;gt;', ' ', s)&amp;nbsp;&amp;nbsp;# HTML tags
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;s = re.sub(r'https?://\S+|www\.\S+', ' ', s)&amp;nbsp;&amp;nbsp;# URLs
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;s = re.sub(r'\s+', ' ', s).strip()&amp;nbsp;&amp;nbsp;# collapse whitespace
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;return s

df['clean_text'] = df['text'].apply(clean_text)
df.head()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1200&quot; data-origin-height=&quot;400&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bvkY7T/btsPPMcTd6D/MhZK5FGgyn2P7RY59Tqryk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bvkY7T/btsPPMcTd6D/MhZK5FGgyn2P7RY59Tqryk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bvkY7T/btsPPMcTd6D/MhZK5FGgyn2P7RY59Tqryk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbvkY7T%2FbtsPPMcTd6D%2FMhZK5FGgyn2P7RY59Tqryk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1200&quot; height=&quot;400&quot; data-origin-width=&quot;1200&quot; data-origin-height=&quot;400&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;hr data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style1&quot; /&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;Visit / Revisit Intent Labeling&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Simple intent labels (visit_intent, revisit_intent) are attached using per-language keywords.&lt;br /&gt;In a real project, the keyword dictionary could be expanded or replaced with an ML classification model.&lt;br /&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre class=&quot;python&quot; data-ke-type=&quot;codeblock&quot; data-ke-language=&quot;python&quot;&gt;&lt;code&gt;import pandas as pd

# Per-language keyword rules (example)
keywords = {
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;'en': {'visit': ['visit'],&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 'revisit': ['come back']},
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;'zh': {'visit': ['想去'],&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;'revisit': ['还会去']},
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;'ko': {'visit': ['가보고 싶다'], 'revisit': ['또 가고 싶다']}
}

def intent_label(row):
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;lang = row['lang']; t = row['clean_text']
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;visit&amp;nbsp;&amp;nbsp; = any(term in t for term in keywords.get(lang, {}).get('visit', []))
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;revisit = any(term in t for term in keywords.get(lang, {}).get('revisit', []))
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;return pd.Series({'visit_intent': int(visit), 'revisit_intent': int(revisit)})

df = pd.concat([df, df.apply(intent_label, axis=1)], axis=1)
df[['text','lang','visit_intent','revisit_intent']]&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1200&quot; data-origin-height=&quot;400&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/x5BM7/btsPQuW49TH/GoO6o0z48LDG0ko3G1yoe0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/x5BM7/btsPQuW49TH/GoO6o0z48LDG0ko3G1yoe0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/x5BM7/btsPQuW49TH/GoO6o0z48LDG0ko3G1yoe0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fx5BM7%2FbtsPQuW49TH%2FGoO6o0z48LDG0ko3G1yoe0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1200&quot; height=&quot;400&quot; data-origin-width=&quot;1200&quot; data-origin-height=&quot;400&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
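&lt;p data-ke-size=&quot;size16&quot;&gt;One thing to watch with the keyword rule above: it uses plain substring matching, so the English term 'visit' also fires on the word 'revisit'. Below is a minimal sketch of a stricter matcher, assuming the text has already been lowercased. The helper names has_term and label_intent and the sample sentences are hypothetical; the keyword dictionary is the same example as above.&lt;/p&gt;

```python
import re

# Same example dictionary as in the post
keywords = {
    'en': {'visit': ['visit'], 'revisit': ['come back']},
    'zh': {'visit': ['想去'], 'revisit': ['还会去']},
    'ko': {'visit': ['가보고 싶다'], 'revisit': ['또 가고 싶다']},
}

def has_term(text, term, lang):
    """Return True if term occurs in text; English terms must match whole words."""
    if lang == 'en':
        # \b keeps 'visit' from matching inside 'revisit'
        return re.search(r'\b' + re.escape(term) + r'\b', text) is not None
    # \b is not a reliable word boundary for Chinese or Korean, so use substring search
    return term in text

def label_intent(text, lang):
    """Hypothetical helper: returns (visit_intent, revisit_intent) as 0/1 flags."""
    rules = keywords.get(lang, {})
    visit = any(has_term(text, t, lang) for t in rules.get('visit', []))
    revisit = any(has_term(text, t, lang) for t in rules.get('revisit', []))
    return int(visit), int(revisit)

print(label_intent('i want to visit seoul', 'en'))
print(label_intent('i would revisit and come back', 'en'))
print(label_intent('또 가고 싶다 제주', 'ko'))
```

&lt;p data-ke-size=&quot;size16&quot;&gt;Whether 'revisit' should count as visit intent is a labeling decision; the sketch only makes that choice explicit instead of leaving it to substring overlap.&lt;/p&gt;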
&lt;hr data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style1&quot; /&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;Sentiment score (example)&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;This assigns placeholder scores only to walk through the flow.&lt;br /&gt;In practice, a sentiment model built with KoNLPy, transformers, or similar tools can be applied.&lt;br /&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre class=&quot;python&quot; data-ke-type=&quot;codeblock&quot; data-ke-language=&quot;python&quot;&gt;&lt;code&gt;import numpy as np

np.random.seed(42)&amp;nbsp;&amp;nbsp;# reproducibility
df['sentiment'] = np.random.rand(len(df))&amp;nbsp;&amp;nbsp;# random placeholder score in [0, 1)
df[['clean_text','sentiment']]&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1200&quot; data-origin-height=&quot;400&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/ck2M3R/btsPQwm45ar/lSVdSY9jbqiQohJd23hCJk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/ck2M3R/btsPQwm45ar/lSVdSY9jbqiQohJd23hCJk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/ck2M3R/btsPQwm45ar/lSVdSY9jbqiQohJd23hCJk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fck2M3R%2FbtsPQwm45ar%2FlSVdSY9jbqiQohJd23hCJk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1200&quot; height=&quot;400&quot; data-origin-width=&quot;1200&quot; data-origin-height=&quot;400&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
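&lt;p data-ke-size=&quot;size16&quot;&gt;As a small step up from random numbers, and still far short of a trained model, a lexicon-based score can stand in for the sentiment column. The sketch below is only an illustration: the word lists POSITIVE and NEGATIVE and the function name lexicon_sentiment are hypothetical, and whitespace tokenization is assumed, which suits English better than Korean or Chinese.&lt;/p&gt;

```python
# Hypothetical toy lexicons; a real project would use a curated list or a trained model
POSITIVE = {'great', 'love', 'beautiful', 'amazing'}
NEGATIVE = {'bad', 'dirty', 'expensive', 'crowded'}

def lexicon_sentiment(text):
    """Map a text to a score in [0, 1]; 0.5 means neutral or no lexicon hits."""
    tokens = text.lower().split()
    pos = sum(1 for tok in tokens if tok in POSITIVE)
    neg = sum(1 for tok in tokens if tok in NEGATIVE)
    hits = pos + neg
    if hits == 0:
        return 0.5
    # All-positive hits give 1.0, all-negative hits give 0.0
    return 0.5 + 0.5 * (pos - neg) / hits

for sample in ['great food and beautiful beach', 'crowded and expensive', 'we went in may']:
    print(sample, '->', lexicon_sentiment(sample))
```

&lt;p data-ke-size=&quot;size16&quot;&gt;With a DataFrame shaped like the one above, df['clean_text'].apply(lexicon_sentiment) would replace the random column.&lt;/p&gt;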
&lt;hr data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style1&quot; /&gt;
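&lt;h3 data-ke-size=&quot;size23&quot;&gt;Visit / revisit intent by language&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;See at a glance which language group (a proxy for country/market) shows high revisit intent &amp;rarr; a lead for loyal-customer management and CRM targeting.&lt;/p&gt;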
&lt;pre class=&quot;python&quot; data-ke-type=&quot;codeblock&quot; data-ke-language=&quot;python&quot;&gt;&lt;code&gt;import matplotlib.pyplot as plt

intent_counts = df.groupby('lang')[['visit_intent', 'revisit_intent']].sum()

# DataFrame.plot creates its own figure, so a separate plt.figure() call would only leave an empty one
intent_counts.plot(kind='bar', figsize=(6,4))
plt.title('Visit vs Revisit Intent by Language')
plt.ylabel('Count'); plt.xticks(rotation=0)
plt.tight_layout()
plt.show()&amp;nbsp;&amp;nbsp;# to save the chart, call plt.savefig('intent_by_language.png') before plt.show()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1200&quot; data-origin-height=&quot;800&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bZyhSk/btsPOHi0QE6/E27YC0HCnaMAR8kTyQKCl1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bZyhSk/btsPOHi0QE6/E27YC0HCnaMAR8kTyQKCl1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bZyhSk/btsPOHi0QE6/E27YC0HCnaMAR8kTyQKCl1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbZyhSk%2FbtsPOHi0QE6%2FE27YC0HCnaMAR8kTyQKCl1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1200&quot; height=&quot;800&quot; data-origin-width=&quot;1200&quot; data-origin-height=&quot;800&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
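&lt;p data-ke-size=&quot;size16&quot;&gt;Raw counts favor whichever language has the most posts, so a per-language rate is often easier to compare. Here is a plain-Python sketch of that normalization; the sample rows and the function name intent_rates are hypothetical.&lt;/p&gt;

```python
from collections import defaultdict

# Hypothetical labeled rows: (lang, visit_intent, revisit_intent)
rows = [
    ('en', 1, 0), ('en', 0, 1), ('en', 0, 0), ('en', 1, 0),
    ('ko', 0, 1), ('ko', 0, 1),
    ('zh', 1, 0), ('zh', 0, 0),
]

def intent_rates(rows):
    """Return {lang: (visit_rate, revisit_rate)} as shares of that language's posts."""
    totals = defaultdict(lambda: [0, 0, 0])  # posts, visit hits, revisit hits
    for lang, visit, revisit in rows:
        totals[lang][0] += 1
        totals[lang][1] += visit
        totals[lang][2] += revisit
    return {lang: (v / n, r / n) for lang, (n, v, r) in totals.items()}

for lang, (visit_rate, revisit_rate) in sorted(intent_rates(rows).items()):
    print(lang, f'visit={visit_rate:.0%}', f'revisit={revisit_rate:.0%}')
```

&lt;p data-ke-size=&quot;size16&quot;&gt;In pandas the same idea is a one-word change: calling .mean() instead of .sum() on the grouped 0/1 columns gives the rate.&lt;/p&gt;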
&lt;hr data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style1&quot; /&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;Average sentiment score by date&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Track the mood over time &amp;rarr; useful for before/after campaign comparisons and for spotting when an issue occurred.&lt;/p&gt;
&lt;pre class=&quot;python&quot; data-ke-type=&quot;codeblock&quot; data-ke-language=&quot;python&quot;&gt;&lt;code&gt;sentiment_trend = df.groupby('created_at')['sentiment'].mean()

plt.figure(figsize=(6,4))
sentiment_trend.plot(marker='o')
plt.title('Average Sentiment Score Over Time')
plt.ylabel('Sentiment Score'); plt.ylim(0,1); plt.grid(True)
plt.tight_layout()
plt.show()&amp;nbsp;&amp;nbsp;# to save the chart, call plt.savefig('sentiment_trend.png') before plt.show()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1200&quot; data-origin-height=&quot;800&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/b7youB/btsPNGSqVBo/fC7weO4s5wAQWPiOeBWPGK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/b7youB/btsPNGSqVBo/fC7weO4s5wAQWPiOeBWPGK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/b7youB/btsPNGSqVBo/fC7weO4s5wAQWPiOeBWPGK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fb7youB%2FbtsPNGSqVBo%2FfC7weO4s5wAQWPiOeBWPGK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1200&quot; height=&quot;800&quot; data-origin-width=&quot;1200&quot; data-origin-height=&quot;800&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
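&lt;p data-ke-size=&quot;size16&quot;&gt;Grouping on created_at directly averages per distinct value, so if the column carries a time of day, each post can end up in its own group. A minimal sketch of bucketing by calendar day first, assuming ISO-8601 timestamp strings; the sample posts and the function name daily_mean_sentiment are hypothetical.&lt;/p&gt;

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical (created_at, sentiment) pairs with ISO-8601 timestamps
posts = [
    ('2025-08-01T09:15:00', 0.8),
    ('2025-08-01T21:40:00', 0.6),
    ('2025-08-02T11:05:00', 0.3),
]

def daily_mean_sentiment(posts):
    """Average sentiment per calendar day, returned as a date-sorted list of (date, mean)."""
    buckets = defaultdict(list)
    for created_at, score in posts:
        day = datetime.fromisoformat(created_at).date()  # drop the time part
        buckets[day].append(score)
    return [(day, sum(scores) / len(scores)) for day, scores in sorted(buckets.items())]

for day, mean_score in daily_mean_sentiment(posts):
    print(day, round(mean_score, 2))
```

&lt;p data-ke-size=&quot;size16&quot;&gt;With pandas, converting the column with pd.to_datetime and grouping on its .dt.date should achieve the same bucketing.&lt;/p&gt;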
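&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;br /&gt;&amp;nbsp;&lt;br /&gt;Looking at today's early results from the project analysis,&lt;br /&gt;it seems that attitudes toward travel differ quite a lot by language.&lt;br /&gt;&amp;nbsp;&lt;br /&gt;Language groups with high visit intent&lt;br /&gt;are people who have not come yet but are very interested,&lt;br /&gt;so for them it would make sense to pick a clear target&lt;br /&gt;and run awareness campaigns.&lt;br /&gt;&lt;br /&gt;For example, collaborating with local influencers&lt;br /&gt;on content,&lt;br /&gt;or building a clean landing page&lt;br /&gt;in the local language, should work well.&lt;br /&gt;&amp;nbsp;&lt;br /&gt;Conversely, language groups with high revisit intent&lt;br /&gt;have already visited once and want to come again.&lt;br /&gt;&lt;br /&gt;These are basically the true fans.&lt;br /&gt;&lt;br /&gt;For them,&lt;br /&gt;offering benefits reserved for returning visitors,&lt;br /&gt;&lt;br /&gt;or turning the reasons they want to come back into a story,&lt;br /&gt;would be a good approach.&lt;br /&gt;&amp;nbsp;&lt;br /&gt;The sentiment score is also interesting, even though today's values are only random placeholders.&lt;br /&gt;&lt;br /&gt;If the score drops sharply in a particular period,&lt;br /&gt;there may well have been a problem with the service or experience&lt;br /&gt;at that time.&lt;br /&gt;&lt;br /&gt;Finding out what happened then&lt;br /&gt;and adjusting the strategy a little could improve things considerably.&lt;br /&gt;&amp;nbsp;&lt;br /&gt;Also, if frequently used positive expressions and review phrases&lt;br /&gt;are worked into ad copy in the local language,&lt;br /&gt;I think the message will land better and response rates will go up.&lt;br /&gt;&amp;nbsp;&lt;br /&gt;There is still a lot left to do to finish the project,&lt;br /&gt;but for now I'll start by making a plan, step by step...&lt;br /&gt;&amp;nbsp;&lt;br /&gt;The SQLD certification exam is only a few days away,&lt;br /&gt;so I should deal with the exam first... ㅜㅜ&lt;br /&gt;&amp;nbsp;&lt;br /&gt;I took the ADsP exam last week,&lt;br /&gt;and I hope the September results show that I passed both it and SQLD (please).&lt;br /&gt;&amp;nbsp;&lt;/p&gt;</description>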
      <category>마케터 관점의 데이터분석/마케터의 파이썬 활용법</category>
      <category>그로스마케팅</category>
      <category>데이터분석</category>
      <category>데이터분석가</category>
      <category>데이터분석가부트캠프</category>
      <category>디지털마케팅</category>
      <category>마케팅</category>
      <category>마케팅데이터분석</category>
      <category>멀티캠퍼스it부트캠프</category>
      <category>부트캠프후기</category>
      <category>퍼포먼스마케팅</category>
      <author>뺩빱</author>
      <guid isPermaLink="true">https://pyj2qdat.tistory.com/10</guid>
      <comments>https://pyj2qdat.tistory.com/10#entry10comment</comments>
      <pubDate>Tue, 12 Aug 2025 02:16:07 +0900</pubDate>
    </item>
    <item>
      <title>8. 콘텐츠 추천부터 리뷰 분석까지, 마케터의 딥러닝 실습 노트</title>
      <link>https://pyj2qdat.tistory.com/9</link>
      <description>&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1080&quot; data-origin-height=&quot;1080&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cbjR8f/btsPGGd7WJh/KtZkpUkFcRkTqT3OdEA6D1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cbjR8f/btsPGGd7WJh/KtZkpUkFcRkTqT3OdEA6D1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cbjR8f/btsPGGd7WJh/KtZkpUkFcRkTqT3OdEA6D1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcbjR8f%2FbtsPGGd7WJh%2FKtZkpUkFcRkTqT3OdEA6D1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;595&quot; height=&quot;595&quot; data-origin-width=&quot;1080&quot; data-origin-height=&quot;1080&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;br&gt;최근 수업에서 딥러닝 실험을 몇 가지 해봤다.&lt;br&gt;&amp;nbsp;&lt;br&gt;마케터 입장에서는 아직 좀 생소하고 어려운 기술일 수 있지만,&lt;br&gt;나중엔 분명 실무에 도움 될 것 같아서 흐름을 정리해 두기로 했다.&lt;br&gt;&amp;nbsp;&lt;br&gt;넷플릭스 추천 시스템을 구현해보고,&lt;br&gt;네이버 영화 리뷰 데이터를 가지고 &lt;br&gt;감성 분석 실험도 진행했다.&lt;br&gt;&lt;br&gt;그리고 이걸 활용하면 마케터가 할 수 있는 일들이 &lt;br&gt;생각보다 꽤 많다는 걸 알게 됐다.&lt;/p&gt;&lt;blockquote data-ke-style=&quot;style3&quot;&gt;고객 리뷰 자동 분석 → 쇼핑몰 리뷰, 앱 리뷰, 별점 코멘트 등 긍/부정 실시간 분류 &lt;br&gt;SNS 여론 모니터링 → 트위터/인스타에서 브랜드 키워드 감성 추이 분석 &lt;br&gt;콘텐츠 평가 자동화 → 유튜브 댓글, 영화 리뷰 등 정성 피드백을 수치화 &lt;br&gt;추천 알고리즘 → 브랜드의 블로그, 제품, 뉴스레터에도 적용&amp;nbsp;&lt;/blockquote&gt;&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;br&gt;예를 들어,&lt;br&gt;고객이 남긴 리뷰나 별점 코멘트를 자동으로&lt;br&gt;긍정/부정으로 분류할 수 있다면,&lt;br&gt;리뷰가 많은 상품도 일일이 수동으로 확인하지 않고&lt;br&gt;품질 이슈를 자동으로 파악할 수 있다.&lt;br&gt;&amp;nbsp;&lt;br&gt;또한 브랜드 키워드가 포함된 게시글을 크롤링하거나,&lt;br&gt;캠페인 론칭 전후의 감정 변화를&lt;br&gt;비교·분석하는 것도 가능하다.&lt;br&gt;&amp;nbsp;&lt;br&gt;[브랜드 캠페인 전후 감정 비율 변화 분석 예시]&lt;/p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1600&quot; data-origin-height=&quot;1000&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cufw4X/btsPF6xmO1Z/pdisqnFiOAzkG34mNQqorK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cufw4X/btsPF6xmO1Z/pdisqnFiOAzkG34mNQqorK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cufw4X/btsPF6xmO1Z/pdisqnFiOAzkG34mNQqorK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fcufw4X%2FbtsPF6xmO1Z%2FpdisqnFiOAzkG34mNQqorK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1600&quot; height=&quot;1000&quot; data-origin-width=&quot;1600&quot; 
data-origin-height=&quot;1000&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;br&gt;유사 콘텐츠를 자동으로 추천해&lt;br&gt;“이 글을 본 사용자가 좋아한 콘텐츠” 형태로 제공하거나,&lt;br&gt;&lt;br&gt;장르 및 키워드 기반의 콘텐츠 큐레이션,&lt;br&gt;뉴스레터 구독자에게&lt;br&gt;맞춤형 콘텐츠를 전달할 수도 있다.&lt;br&gt;&amp;nbsp;&lt;br&gt;물론 마케터가 모델을 직접 설계하거나&lt;br&gt;파이썬 코드를 전부 짤 필요는 없다.&lt;br&gt;&amp;nbsp;&lt;br&gt;하지만 중요한 건,&lt;br&gt;어떤 데이터를 어떻게 수집하고,&lt;br&gt;고객 여정의 어느 지점에 감정 분석을 적용하며,&lt;br&gt;추천 시스템을 어떤 채널에 녹여낼지를&lt;br&gt;설계하는 일이라고 생각한다.&lt;br&gt;&amp;nbsp;&lt;br&gt;그래서 이번에 배운 정보들은&lt;br&gt;마케팅 기획에 도움이 되는&amp;nbsp;&lt;br&gt;도구가 될 거라고 생각한다.&amp;nbsp;&lt;/p&gt;&lt;hr data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style3&quot;&gt;&lt;h2 data-ke-size=&quot;size26&quot;&gt;01. 콘텐츠 추천 시스템&lt;/h2&gt;&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;br&gt;넷플릭스를 보다 보면&lt;br&gt;비슷한 분위기의 콘텐츠를 계속해서 추천받곤 한다.&lt;br&gt;&amp;nbsp;&lt;br&gt;한편으로는 늘 궁금했다.&lt;br&gt;&lt;br&gt;어떻게 이렇게 관련 있는&lt;br&gt;콘텐츠를 잘 골라내는 걸까?&lt;br&gt;그 기준은 뭘까?&lt;br&gt;&amp;nbsp;&lt;br&gt;그래서 수업에서&lt;br&gt;넷플릭스 콘텐츠 데이터를 활용해&lt;br&gt;사용자가 본 콘텐츠와 유사한 작품을&lt;br&gt;자동으로 추천해 주는 시스템을 구현해보고 알게 되었다.&lt;br&gt;&amp;nbsp;&lt;br&gt;데이터에는 콘텐츠의 제목, 설명, 장르 정보가 담겨 있었고,&lt;br&gt;이걸 바탕으로 유사도를 계산해서&lt;br&gt;‘비슷한 콘텐츠’를 추천해주는 구조다.&lt;br&gt;&amp;nbsp;&lt;br&gt;예를 들어,&lt;br&gt;“Black Panther”를 기준으로&lt;br&gt;추천된 작품은 다음과 같다:&lt;/p&gt;&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;&lt;li&gt;Men in Black&lt;/li&gt;&lt;li&gt;Black Lightning&lt;/li&gt;&lt;li&gt;Illang: The Wolf Brigade&lt;/li&gt;&lt;/ul&gt;&lt;p data-ke-size=&quot;size16&quot;&gt;설명에 자주 등장하는 단어,&lt;br&gt;장르의 유사성, 제목 간의 의미 등&lt;br&gt;여러 기준을 종합해 유사도를 측정했다.&lt;br&gt;&amp;nbsp;&lt;br&gt;마케터 입장에서 보면,&lt;br&gt;이런 추천 시스템은&lt;br&gt;단지 영상 플랫폼에만 필요한 게 아니다.&lt;/p&gt;&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;&lt;li&gt;쇼핑몰에서 비슷한 상품을 추천하거나&lt;/li&gt;&lt;li&gt;뉴스레터에서 관심 있을 만한 콘텐츠를 보여주거나&lt;/li&gt;&lt;li&gt;브랜드 블로그 글을 자동 큐레이션하는 등&lt;/li&gt;&lt;/ul&gt;&lt;p 
data-ke-size=&quot;size16&quot;&gt;여러 채널에서 &lt;br&gt;사용자 맞춤 콘텐츠 경험을 설계할 수 있다.&lt;/p&gt;&lt;pre data-ke-type=&quot;codeblock&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot;&gt;&lt;code&gt;# 제목, 설명, 장르를 합쳐서 콘텐츠 요약 만들기
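# fillna('') keeps a missing description or genre from turning the whole combined string into NaN
df['combined'] = df['title'].fillna('') + &quot; &quot; + df['description'].fillna('') + &quot; &quot; + df['listed_in'].fillna('')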

# 벡터화: TF-IDF
from sklearn.feature_extraction.text import TfidfVectorizer
tfidf = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf.fit_transform(df['combined'])

# 코사인 유사도 계산
from sklearn.metrics.pairwise import cosine_similarity
cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)&lt;/code&gt;&lt;/pre&gt;&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;br&gt;Once the similarities are precomputed like this,&lt;br&gt;it is easy to take any title as the reference &lt;br&gt;and look up the titles most similar to it.&lt;/p&gt;&lt;pre data-ke-type=&quot;codeblock&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot;&gt;&lt;code&gt;# Content recommendation function
def get_recommendations(title, cosine_sim=cosine_sim):
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;idx = df[df['title'] == title].index[0]
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;sim_scores = list(enumerate(cosine_sim[idx]))
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;sim_scores = sim_scores[1:6]
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;content_indices = [i[0] for i in sim_scores]
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;return df['title'].iloc[content_indices]&lt;/code&gt;&lt;/pre&gt;&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;br&gt;“Black Panther”를 기준으로 실행해 보면&lt;br&gt;비슷한 분위기의 콘텐츠들이 자동으로 추천된다.&lt;/p&gt;&lt;pre data-ke-type=&quot;codeblock&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot;&gt;&lt;code&gt;get_recommendations(&quot;Black Panther&quot;)&lt;/code&gt;&lt;/pre&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;2000&quot; data-origin-height=&quot;1000&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/dRziPE/btsPJTJjpKC/58e6xnyc0aSnWgxFuwNd11/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/dRziPE/btsPJTJjpKC/58e6xnyc0aSnWgxFuwNd11/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/dRziPE/btsPJTJjpKC/58e6xnyc0aSnWgxFuwNd11/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FdRziPE%2FbtsPJTJjpKC%2F58e6xnyc0aSnWgxFuwNd11%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;2000&quot; height=&quot;1000&quot; data-origin-width=&quot;2000&quot; data-origin-height=&quot;1000&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
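&lt;p data-ke-size=&quot;size16&quot;&gt;To see what TF-IDF and cosine similarity are doing under the hood, here is a hand-rolled sketch on a three-item catalog. It is deliberately simplified: scikit-learn's TfidfVectorizer additionally smooths the IDF term and L2-normalizes each row, so the exact numbers will differ. The catalog contents and the function names tfidf_vectors, cosine, and recommend are hypothetical.&lt;/p&gt;

```python
import math
from collections import Counter

# Hypothetical mini catalog: title -> combined text
catalog = {
    'Hero A': 'superhero action king fights villain',
    'Hero B': 'superhero action city villain',
    'Love C': 'romantic comedy wedding city',
}

def tfidf_vectors(docs):
    """Plain TF-IDF: term count times log(N / document frequency)."""
    tokenized = {name: text.split() for name, text in docs.items()}
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized.values() for term in set(tokens))
    return {
        name: {term: count * math.log(n_docs / doc_freq[term])
               for term, count in Counter(tokens).items()}
        for name, tokens in tokenized.items()
    }

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def recommend(title, docs, top_n=2):
    """Rank the other titles by similarity to the given one."""
    vectors = tfidf_vectors(docs)
    scores = [(other, cosine(vectors[title], vectors[other]))
              for other in docs if other != title]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]

print(recommend('Hero A', catalog))
```

&lt;p data-ke-size=&quot;size16&quot;&gt;Titles that share rarer words score higher, and titles with no words in common score exactly zero, which is why the combined title, description, and genre text matters so much for this kind of recommender.&lt;/p&gt;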
&lt;p data-ke-size=&quot;size16&quot;&gt;I also checked which genres&lt;br&gt;are most common among Netflix titles.&lt;br&gt;&amp;nbsp;&lt;/p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1800&quot; data-origin-height=&quot;1000&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/dqRCP8/btsPF8IEaXA/zv2Z2KxWYCUPxsx1JFyJW0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/dqRCP8/btsPF8IEaXA/zv2Z2KxWYCUPxsx1JFyJW0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/dqRCP8/btsPF8IEaXA/zv2Z2KxWYCUPxsx1JFyJW0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FdqRCP8%2FbtsPF8IEaXA%2Fzv2Z2KxWYCUPxsx1JFyJW0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1800&quot; height=&quot;1000&quot; data-origin-width=&quot;1800&quot; data-origin-height=&quot;1000&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Looking at the overall distribution,&lt;br&gt;Action, Drama, and Comedy were the most common, in that order.&lt;br&gt;&amp;nbsp;&lt;/p&gt;&lt;hr data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style1&quot;&gt;&lt;h2 data-ke-size=&quot;size26&quot;&gt;02. Sentiment analysis experiment&lt;/h2&gt;&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;br&gt;Sometimes I leave a review after watching a movie.&lt;br&gt;&lt;br&gt;Some movies were fun,&lt;br&gt;and some were not.&lt;br&gt;&lt;br&gt;When you put that into words,&lt;br&gt;the emotion shows through.&lt;br&gt;&amp;nbsp;&lt;br&gt;But can we predict from the sentence alone&lt;br&gt;whether it is positive&lt;br&gt;or negative?&lt;br&gt;&amp;nbsp;&lt;br&gt;I had always wondered about this,&lt;br&gt;and the class gave me a chance to build a sentiment analysis model&lt;br&gt;on Naver movie review data, so I tried it.&lt;/p&gt;&lt;hr data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style1&quot;&gt;&lt;h3 data-ke-size=&quot;size23&quot;&gt;Data &amp;amp; preprocessing&lt;/h3&gt;&lt;p data-ke-size=&quot;size16&quot;&gt;The Naver movie review data consists of&lt;br&gt;about 150,000 texts labeled positive or negative.&lt;br&gt;&amp;nbsp;&lt;br&gt;First, I cleaned the text,&lt;br&gt;split it into morphemes,&lt;br&gt;and removed unnecessary words.&lt;/p&gt;&lt;pre data-ke-type=&quot;codeblock&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot;&gt;&lt;code&gt;# Remove special characters (keep only Hangul and spaces)
train['document'] = train['document'].str.replace(&quot;[^ㄱ-ㅎㅏ-ㅣ가-힣 ]&quot;, &quot;&quot;, regex=True)  # regex=True is required since pandas 2.0, where the default became a literal match
train.dropna(inplace=True)

# 형태소 분석
from konlpy.tag import Okt
okt = Okt()
stop_words = set()  # placeholder (assumption): fill in the stopword list used in class, which the post does not show
def tokenize(text):
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;return [w for w in okt.morphs(text) if w not in stop_words]&lt;/code&gt;&lt;/pre&gt;&lt;h3 data-ke-size=&quot;size23&quot;&gt;모델구조&lt;/h3&gt;&lt;p data-ke-size=&quot;size16&quot;&gt;텍스트 데이터를 숫자 형태로 바꾸고,&lt;br&gt;RNN 구조의 딥러닝 모델에 입력했다.&lt;/p&gt;&lt;pre data-ke-type=&quot;codeblock&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot;&gt;&lt;code&gt;# PyTorch 기반 RNN 감성 분석 모델
import torch.nn as nn
class SentimentLSTM(nn.Module):
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;def __init__(self, vocab_size, embedding_dim, hidden_dim, output_dim):
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;super().__init__()
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;self.embedding = nn.Embedding(vocab_size, embedding_dim)
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;self.lstm = nn.LSTM(embedding_dim, hidden_dim, batch_first=True)
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;self.fc = nn.Linear(hidden_dim, output_dim)

&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;def forward(self, x):
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;embedded = self.embedding(x)
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;_, (hidden, _) = self.lstm(embedded)
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;output = self.fc(hidden[-1])
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;return output&lt;/code&gt;&lt;/pre&gt;&lt;h3 data-ke-size=&quot;size23&quot;&gt;Training results&lt;/h3&gt;&lt;p data-ke-size=&quot;size16&quot;&gt;I ran 5 epochs in total&lt;br&gt;and checked the training and validation accuracy.&lt;/p&gt;&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;&lt;li&gt;Training accuracy: 78.7% → 92.6%&lt;/li&gt;&lt;li&gt;Validation accuracy: 84.0% → 85.0%&lt;/li&gt;&lt;/ul&gt;&lt;p data-ke-size=&quot;size16&quot;&gt;In other words, training went well and validation accuracy held steady at around 85%.&lt;br&gt;Training accuracy kept climbing to 92.6% while validation barely moved, though, and that widening gap is an early sign of overfitting, so stopping earlier or adding regularization would be worth trying.&lt;/p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1800&quot; data-origin-height=&quot;1000&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/PtuZr/btsPF9AQSFH/4UOjhqeiomLKa5j4kQA3k1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/PtuZr/btsPF9AQSFH/4UOjhqeiomLKa5j4kQA3k1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/PtuZr/btsPF9AQSFH/4UOjhqeiomLKa5j4kQA3k1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FPtuZr%2FbtsPF9AQSFH%2F4UOjhqeiomLKa5j4kQA3k1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1800&quot; height=&quot;1000&quot; data-origin-width=&quot;1800&quot; data-origin-height=&quot;1000&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
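&lt;p data-ke-size=&quot;size16&quot;&gt;The step of turning text into numbers before the embedding layer usually means mapping each token to an integer id and padding every review to the same length. The sketch below shows one common way to do that; it is not the preprocessing actually used in the class. The reserved ids PAD=0 and UNK=1, the max_len value, and the sample tokens are all assumptions.&lt;/p&gt;

```python
from collections import Counter

PAD, UNK = 0, 1  # reserved ids (an assumed convention)

def build_vocab(tokenized_docs, min_count=1):
    """Assign ids starting at 2, most frequent tokens first."""
    counts = Counter(tok for doc in tokenized_docs for tok in doc)
    vocab = {}
    for tok, count in counts.most_common():
        if count >= min_count:
            vocab[tok] = len(vocab) + 2
    return vocab

def encode(tokens, vocab, max_len):
    """Map tokens to ids, truncate to max_len, then right-pad with PAD."""
    ids = [vocab.get(tok, UNK) for tok in tokens][:max_len]
    return ids + [PAD] * (max_len - len(ids))

docs = [['영화', '재미있다'], ['영화', '별로', '지루하다']]
vocab = build_vocab(docs)
print(vocab)
print(encode(['영화', '최고'], vocab, max_len=4))
```

&lt;p data-ke-size=&quot;size16&quot;&gt;Under this scheme the vocab_size passed to SentimentLSTM would be len(vocab) + 2, to leave room for the two reserved ids.&lt;/p&gt;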
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;&lt;hr data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style1&quot;&gt;&lt;h4 data-ke-size=&quot;size20&quot;&gt;Frequent sentiment keywords&lt;/h4&gt;&lt;p data-ke-size=&quot;size16&quot;&gt;I visualized the words that appear most often&lt;br&gt;in positive and negative reviews.&lt;br&gt;&amp;nbsp;&lt;/p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;2000&quot; data-origin-height=&quot;1000&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bB7mzt/btsPIupSYrl/VoIMqejt304Otp33Hchc01/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bB7mzt/btsPIupSYrl/VoIMqejt304Otp33Hchc01/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bB7mzt/btsPIupSYrl/VoIMqejt304Otp33Hchc01/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbB7mzt%2FbtsPIupSYrl%2FVoIMqejt304Otp33Hchc01%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;2000&quot; height=&quot;1000&quot; data-origin-width=&quot;2000&quot; data-origin-height=&quot;1000&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;br&gt;Plotting the frequent positive and negative keywords&lt;br&gt;shows which emotion words reviewers use most.&lt;/p&gt;&lt;hr data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style1&quot;&gt;&lt;h4 data-ke-size=&quot;size20&quot;&gt;Positive / negative share&lt;/h4&gt;&lt;p data-ke-size=&quot;size16&quot;&gt;I also analyzed which sentiment was more common across all reviews.&lt;/p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1200&quot; data-origin-height=&quot;1200&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/o6KUq/btsPGuj9LZA/etpO0KV3BRCsl0g3Hzq3nk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/o6KUq/btsPGuj9LZA/etpO0KV3BRCsl0g3Hzq3nk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/o6KUq/btsPGuj9LZA/etpO0KV3BRCsl0g3Hzq3nk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fo6KUq%2FbtsPGuj9LZA%2FetpO0KV3BRCsl0g3Hzq3nk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1200&quot; height=&quot;1200&quot; data-origin-width=&quot;1200&quot; data-origin-height=&quot;1200&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
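&lt;p data-ke-size=&quot;size16&quot;&gt;Both charts above come down to counting. As a rough sketch, the top keywords per label and the share of each label can be computed with collections.Counter; the sample reviews and the function names top_tokens and label_share are hypothetical.&lt;/p&gt;

```python
from collections import Counter

# Hypothetical tokenized reviews with labels (1 = positive, 0 = negative)
reviews = [
    (['연기', '최고', '감동'], 1),
    (['스토리', '감동'], 1),
    (['지루', '연기', '별로'], 0),
]

def top_tokens(reviews, label, n=3):
    """Most common tokens among reviews carrying the given label."""
    counts = Counter(tok for tokens, lab in reviews if lab == label for tok in tokens)
    return counts.most_common(n)

def label_share(reviews):
    """Fraction of reviews per label."""
    counts = Counter(lab for _, lab in reviews)
    total = sum(counts.values())
    return {lab: count / total for lab, count in counts.items()}

print(top_tokens(reviews, label=1))
print(label_share(reviews))
```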
&lt;p data-ke-size=&quot;size16&quot;&gt;쇼핑몰,&amp;nbsp;앱&amp;nbsp;리뷰,&amp;nbsp;별점&amp;nbsp;코멘트를&amp;nbsp;긍/부정으로&amp;nbsp;분류해서&amp;nbsp;&amp;nbsp;&lt;br&gt;이슈&amp;nbsp;제품을&amp;nbsp;자동으로&amp;nbsp;감지할&amp;nbsp;수&amp;nbsp;있다.&lt;br&gt;&amp;nbsp;&lt;br&gt;이렇게&amp;nbsp;전체&amp;nbsp;리뷰&amp;nbsp;중&amp;nbsp;감정의&amp;nbsp;비중을&amp;nbsp;분석해&amp;nbsp;&amp;nbsp;&lt;br&gt;이슈&amp;nbsp;대응에&amp;nbsp;활용할&amp;nbsp;수&amp;nbsp;있다.&lt;br&gt;&amp;nbsp;&lt;br&gt;마케터 입장에서는&lt;br&gt;고객 감정을 빠르게 감지하고,&lt;br&gt;그에 맞는 대응 전략을 세울 수 있는 도구로 &lt;br&gt;쓰일 수 있다고 느꼈다.&lt;/p&gt;&lt;hr data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style3&quot;&gt;&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;br&gt;이번 실험을 하면서 계속 생각이 들었다.&lt;br&gt;이걸 마케터가 다 알아야 하나?&lt;br&gt;&amp;nbsp;&lt;br&gt;모델 구조를 설계하고,&lt;br&gt;파이썬 코드로 데이터 전처리하고,&lt;br&gt;&lt;br&gt;신경망을 학습시키는 일은&lt;br&gt;분명 개발자나 데이터 분석가의 영역에 가깝다.&lt;br&gt;&amp;nbsp;&lt;br&gt;그렇다고 마케터는&lt;br&gt;이걸 전혀 몰라도 되는 걸까?&lt;br&gt;&lt;br&gt;그건 또 아니다.&lt;br&gt;&amp;nbsp;&lt;br&gt;중요한 건&lt;br&gt;어디에, 왜, 어떻게 쓸 것인가를&lt;br&gt;판단할 수 있어야 한다.&lt;br&gt;&amp;nbsp;&lt;br&gt;감정 분석을 고객 여정의 어느 시점에 넣을지,&lt;br&gt;리뷰를 자동 분석해서 어떤 제품의 품질 문제를 먼저 감지할지,&lt;br&gt;콘텐츠 추천 시스템을 어떤 채널에 적용해 볼지.&lt;br&gt;&amp;nbsp;&lt;br&gt;이런 걸 기획하고 제안할 수 있는 사람이&lt;br&gt;바로 마케터다.&lt;/p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1080&quot; data-origin-height=&quot;1080&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/egPFXE/btsPHKmy6bO/dZhCBJFL87O4KJKK1xiRj0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/egPFXE/btsPHKmy6bO/dZhCBJFL87O4KJKK1xiRj0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/egPFXE/btsPHKmy6bO/dZhCBJFL87O4KJKK1xiRj0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FegPFXE%2FbtsPHKmy6bO%2FdZhCBJFL87O4KJKK1xiRj0%2Fimg.png&quot; onerror=&quot;this.onerror=null; 
this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;644&quot; height=&quot;644&quot; data-origin-width=&quot;1080&quot; data-origin-height=&quot;1080&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;br&gt;Marketers do not need to build the technology themselves,&lt;br&gt;but they do need to understand what it can do&lt;br&gt;and be able to sketch scenarios that bring it into day-to-day work.&lt;br&gt;&amp;nbsp;&lt;br&gt;&amp;nbsp;&lt;br&gt;&amp;nbsp;&lt;/p&gt;</description>
      <category>마케터 관점의 데이터분석/마케터의 파이썬 활용법</category>
      <category>그로스마케팅</category>
      <category>데이터분석</category>
      <category>데이터분석가</category>
      <category>데이터분석가부트캠프</category>
      <category>디지털마케팅</category>
      <category>마케팅</category>
      <category>마케팅데이터분석</category>
      <category>멀티캠퍼스it부트캠프</category>
      <category>부트캠프후기</category>
      <category>퍼포먼스마케팅</category>
      <author>뺩빱</author>
      <guid isPermaLink="true">https://pyj2qdat.tistory.com/9</guid>
      <comments>https://pyj2qdat.tistory.com/9#entry9comment</comments>
      <pubDate>Tue, 5 Aug 2025 22:26:44 +0900</pubDate>
    </item>
    <item>
      <title>7. SEO 콘텐츠 제작, 뉴스 기사로 키워드 뽑기</title>
      <link>https://pyj2qdat.tistory.com/8</link>
      <description>&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1080&quot; data-origin-height=&quot;1080&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/UP2Ow/btsPAf1dDJ4/jGOlu73nATaIGIllc248r1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/UP2Ow/btsPAf1dDJ4/jGOlu73nATaIGIllc248r1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/UP2Ow/btsPAf1dDJ4/jGOlu73nATaIGIllc248r1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FUP2Ow%2FbtsPAf1dDJ4%2FjGOlu73nATaIGIllc248r1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;631&quot; height=&quot;631&quot; data-origin-width=&quot;1080&quot; data-origin-height=&quot;1080&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;br&gt;&amp;nbsp;&lt;br&gt;그동안 콘텐츠 마케팅 업무를 하며&lt;br&gt;고객사들의 수많은 콘텐츠를 기획하고 실행해 왔다.&lt;br&gt;&amp;nbsp;&lt;br&gt;특히 SEO 기반 콘텐츠는 절대 빠질 수가 없었다.&lt;br&gt;&amp;nbsp;&lt;br&gt;하지만 콘텐츠를 작성할 때면 늘 같은 패턴이었다.&lt;br&gt;키워드 리서치하고, 연관 검색어를 정리한 뒤,&lt;br&gt;나의 감대로 콘텐츠 구조와 플랜을 짠다.&lt;br&gt;&amp;nbsp;&lt;br&gt;물론 실무 경험과 트렌드 감각은 날마다 쌓이면서 손은 점점 빨라졌지만&lt;br&gt;언제부턴가 나는 &quot;이 키워드가 정말 타깃이 원하고 효과가 있을까? 근거가 뭘까?&lt;br&gt;콘텐츠가 검색엔진에 읽히는 게 증명이 어떻게 되지? 단지 조회수로?&quot;라는 질문들이 남았다.&lt;br&gt;&amp;nbsp;&lt;br&gt;그러다 이번 데이터 분석 과정을 통해&lt;br&gt;실제 뉴스 데이터를 기반으로&amp;nbsp;&lt;br&gt;TF-IDF, Word2Vec, LDA 기법들을 활용해서&amp;nbsp;&lt;br&gt;콘텐츠 기획에 바로 적용 가능한 SEO 키워드를 발굴하는 과정을 경험했다.&lt;br&gt;&amp;nbsp;&lt;br&gt;그동안 막연한 의문들이 구체적인 방식으로 해결되는 과정이었고&lt;br&gt;향후 실무에 적용하기 위해서 과정을 정리해 두었다.&lt;br&gt;&amp;nbsp;&lt;/p&gt;&lt;hr data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style6&quot;&gt;&lt;p data-ke-size=&quot;size16&quot;&gt;데이터 : 2025.04~2025.07 약 3개월간 뉴스 기사 &lt;a href=&quot;https://www.bigkinds.or.kr/&quot; target=&quot;_blank&quot;&gt;&lt;span&gt;https://www.bigkinds.or.kr/&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;figure class=&quot;fileblock&quot; data-ke-align=&quot;alignCenter&quot;&gt;&lt;a href=&quot;https://blog.kakaocdn.net/dn/blcrVo/dJMb9PTIOcy/Bqfpy8pOfPCKoJRt1QokKK/NewsResult_20250424-20250724.csv?attach=1&amp;amp;knm=tfile.csv&quot; class=&quot;&quot;&gt;
    &lt;div class=&quot;image&quot;&gt;&lt;/div&gt;
    &lt;div class=&quot;desc&quot;&gt;&lt;div class=&quot;filename&quot;&gt;&lt;span class=&quot;name&quot;&gt;NewsResult_20250424-20250724.csv&lt;/span&gt;&lt;/div&gt;
&lt;div class=&quot;size&quot;&gt;11.66MB&lt;/div&gt;
&lt;/div&gt;
  &lt;/a&gt;&lt;/figure&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;Step 1: Data cleaning&lt;/h2&gt;&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;&lt;li&gt;Combine title (제목) + body (본문) → create the text (텍스트) column&lt;/li&gt;&lt;li&gt;Extract nouns with Okt → remove stopwords&lt;/li&gt;&lt;li&gt;Finish tokenization to prepare the input for vectorization&lt;/li&gt;&lt;/ul&gt;&lt;pre data-ke-type=&quot;codeblock&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot;&gt;&lt;code&gt;import pandas as pd

try:
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;df = pd.read_csv('NewsResult_20250424-20250724.csv')
except FileNotFoundError:
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;print(&quot;Error: 'NewsResult_20250424-20250724.csv' not found. Please replace with the correct file path.&quot;)

&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;data = {'제목': ['Example Title 1', 'Example Title 2'],
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;'본문': ['Example Body 1', 'Example Body 2']}
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;df = pd.DataFrame(data)

display(df.head())&lt;/code&gt;&lt;/pre&gt;&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;&lt;pre data-ke-type=&quot;codeblock&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot;&gt;&lt;code&gt;# Text field to analyze: title (제목) + body (본문) combined
df['텍스트'] = df['제목'].fillna('') + ' ' + df['본문'].fillna('')&lt;/code&gt;&lt;/pre&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1038&quot; data-origin-height=&quot;628&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bE610g/btsPAvIVLLu/ZbcPIUt3wPvFxtIK28It5k/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bE610g/btsPAvIVLLu/ZbcPIUt3wPvFxtIK28It5k/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bE610g/btsPAvIVLLu/ZbcPIUt3wPvFxtIK28It5k/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbE610g%2FbtsPAvIVLLu%2FZbcPIUt3wPvFxtIK28It5k%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1038&quot; height=&quot;628&quot; data-origin-width=&quot;1038&quot; data-origin-height=&quot;628&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;hr data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style1&quot;&gt;&lt;h4 data-ke-size=&quot;size20&quot;&gt;Text preprocessing&lt;/h4&gt;&lt;pre data-ke-type=&quot;codeblock&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot;&gt;&lt;code&gt;!pip install konlpy

from konlpy.tag import Okt

okt = Okt()

# Note: okt.nouns() returns nouns only, so the verb and particle entries below should rarely match;
# extend this list with domain-specific nouns as needed
stopwords = ['있다', '하다', '되다', '으로', '에서', '이다', '를', '에', '및', '로']

def preprocess_text(text):
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;if pd.isna(text):
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;return []
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;tokens = okt.nouns(text)&amp;nbsp;&amp;nbsp;# extract nouns only
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# drop stopwords and single-character tokens
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;tokens = [token for token in tokens if token not in stopwords and len(token) &amp;gt; 1]
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;return tokens

df['텍스트'] = df['제목'].fillna('') + ' ' + df['본문'].fillna('')

# Apply preprocessing
df['토큰'] = df['텍스트'].apply(preprocess_text)

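display(df.head())&lt;/code&gt;&lt;/pre&gt;&lt;pre data-ke-type=&quot;codeblock&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot;&gt;&lt;code&gt;# Drop rows whose 텍스트 is NaN (expected to be 0 here, because fillna('') was applied above)
print(&quot;Number of rows with NaN in '텍스트' column before dropping:&quot;, df['텍스트'].isna().sum())

# reset_index keeps row labels equal to row positions, which the later X_tfidf indexing relies on
df = df.dropna(subset=['텍스트']).reset_index(drop=True)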

print(&quot;Number of rows with NaN in '텍스트' column after dropping:&quot;, df['텍스트'].isna().sum())

display(df.head())&lt;/code&gt;&lt;/pre&gt;&lt;hr data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style1&quot;&gt;&lt;h2 data-ke-size=&quot;size26&quot;&gt;Step 2: Turning documents into numbers&lt;/h2&gt;&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;&lt;li&gt;Vectorize the documents with BoW and TF-IDF&lt;/li&gt;&lt;li&gt;Review the top TF-IDF words&lt;/li&gt;&lt;/ul&gt;&lt;pre data-ke-type=&quot;codeblock&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot;&gt;&lt;code&gt;from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Join the preprocessed tokens back into one string per document
df['토큰_문자열'] = df['토큰'].apply(lambda x: ' '.join(x))

# BoW vectorization
bow_vectorizer = CountVectorizer()
X_bow = bow_vectorizer.fit_transform(df['토큰_문자열'])

# TF-IDF vectorization
tfidf_vectorizer = TfidfVectorizer()
X_tfidf = tfidf_vectorizer.fit_transform(df['토큰_문자열'])

print(&quot;BoW vector shape:&quot;, X_bow.shape)
print(&quot;TF-IDF vector shape:&quot;, X_tfidf.shape)&lt;/code&gt;&lt;/pre&gt;&lt;p data-ke-size=&quot;size16&quot;&gt;-&amp;gt; BoW vector shape: (3744, 7200)&lt;br&gt;-&amp;gt; TF-IDF vector shape: (3744, 7200)&lt;br&gt;Both matrices have 3,744 rows (documents) and 7,200 columns (vocabulary terms).&lt;/p&gt;&lt;pre data-ke-type=&quot;codeblock&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot;&gt;&lt;code&gt;import numpy as np
import pandas as pd

# Extract the top words from the TF-IDF matrix (for the first document)
tfidf_feature_names = tfidf_vectorizer.get_feature_names_out()
first_doc_vector = X_tfidf[0].toarray().flatten()

# Take the 20 words with the highest TF-IDF scores
top_indices = first_doc_vector.argsort()[::-1][:20]
top_tfidf_words = [(tfidf_feature_names[i], first_doc_vector[i]) for i in top_indices]

# Collect the result in a DataFrame
top_words_df = pd.DataFrame(top_tfidf_words, columns=[&quot;단어&quot;, &quot;TF-IDF 점수&quot;])

display(top_words_df)&lt;/code&gt;&lt;/pre&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;209&quot; data-origin-height=&quot;441&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bicAme/btsPAMcyTQC/2ugH8qFQCKOrRVMRP7Ngdk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bicAme/btsPAMcyTQC/2ugH8qFQCKOrRVMRP7Ngdk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bicAme/btsPAMcyTQC/2ugH8qFQCKOrRVMRP7Ngdk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbicAme%2FbtsPAMcyTQC%2F2ugH8qFQCKOrRVMRP7Ngdk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;209&quot; height=&quot;441&quot; data-origin-width=&quot;209&quot; data-origin-height=&quot;441&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
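&lt;p data-ke-size=&quot;size16&quot;&gt;The table above ranks words for document 0 only. As a rough sketch of what TfidfVectorizer computes with its default settings (smoothed IDF, L2 normalization) and of how a corpus-wide ranking could be derived, the standard-library example below averages each term's score over three made-up token lists. The names docs, tfidf_vector and mean_scores are illustrative and not part of the original notebook.&lt;/p&gt;

```python
import math
from collections import Counter

# Hypothetical tokenized documents standing in for df['토큰'].
docs = [
    ["bitcoin", "price", "rally"],
    ["bitcoin", "etf", "approval"],
    ["election", "poll", "result"],
]

n_docs = len(docs)
# Document frequency: in how many documents each term appears.
doc_freq = Counter(term for doc in docs for term in set(doc))


def tfidf_vector(doc):
    """Return an L2-normalized TF-IDF dict for one tokenized document."""
    counts = Counter(doc)
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1, as documented for smooth_idf=True.
    weights = {
        term: tf * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
        for term, tf in counts.items()
    }
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {term: w / norm for term, w in weights.items()}


vectors = [tfidf_vector(doc) for doc in docs]

# Corpus-level ranking: average each term's score over all documents.
mean_scores = Counter()
for vec in vectors:
    for term, score in vec.items():
        mean_scores[term] += score / n_docs

for term, score in mean_scores.most_common(3):
    print(f"{term}: {score:.4f}")
```

&lt;p data-ke-size=&quot;size16&quot;&gt;On the real matrix, the same average should be obtainable with np.asarray(X_tfidf.mean(axis=0)).ravel(), paired with tfidf_feature_names.&lt;/p&gt;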
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;color: #333333;&quot;&gt;→&lt;/span&gt;&lt;span style=&quot;color: #333333;&quot;&gt; &lt;/span&gt;The TF-IDF ranking for the first article (document 0) surfaces its highest-weighted words,&lt;br&gt;which serve as candidate SEO keywords.&lt;/p&gt;&lt;hr data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style1&quot;&gt;&lt;h2 data-ke-size=&quot;size26&quot;&gt;Step 3: Document similarity analysis &amp;amp; recommendation system&lt;/h2&gt;&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;&lt;li&gt;Uses TF-IDF + cosine similarity&lt;/li&gt;&lt;li&gt;Quantifies how similar news documents are to each other, based on the terms they share&lt;/li&gt;&lt;li&gt;Recommends the 5 news articles most similar to a given document&lt;/li&gt;&lt;/ul&gt;&lt;p data-ke-size=&quot;size16&quot;&gt;→ Could later be applied to similar-content recommendation algorithms&lt;/p&gt;&lt;pre data-ke-type=&quot;codeblock&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot;&gt;&lt;code&gt;from sklearn.metrics.pairwise import cosine_similarity

# Cosine similarity between document 0 and every document
similarities = cosine_similarity(X_tfidf[0], X_tfidf).flatten()

# Indices of the 5 most similar documents, skipping the first hit (document 0 itself)
similar_docs_idx = similarities.argsort()[::-1][1:6]

# Similarity scores of those documents
similar_docs_score = similarities[similar_docs_idx]

# Collect the similar documents' details (iloc: argsort returns row positions, not labels)
similar_docs = df.iloc[similar_docs_idx][['제목', '언론사', '일자']].copy()
similar_docs['유사도 점수'] = similar_docs_score
display(similar_docs)&lt;/code&gt;&lt;/pre&gt;&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;&lt;hr data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style1&quot;&gt;&lt;h2 data-ke-size=&quot;size26&quot;&gt;Step 4: Classifying documents with clustering (KMeans)&lt;/h2&gt;&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;&lt;li&gt;Automatically groups the news documents into 5 clusters&lt;/li&gt;&lt;li&gt;Checks the representative news article of each cluster&lt;/li&gt;&lt;/ul&gt;&lt;pre data-ke-type=&quot;codeblock&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot;&gt;&lt;code&gt;from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import euclidean_distances

# Number of clusters (arbitrarily set to 5)
k = 5
kmeans = KMeans(n_clusters=k, random_state=42, n_init=10)

# Cluster the TF-IDF vectors
df['cluster'] = kmeans.fit_predict(X_tfidf)

# Find the document closest to each cluster center (the representative article)
centers = kmeans.cluster_centers_
closest_docs = []

for i in range(k):
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# Index labels of the documents in cluster i
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# (used as row positions in X_tfidf, so df must keep its default 0..n-1 index)
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;cluster_indices = df[df['cluster'] == i].index
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;cluster_vectors = X_tfidf[cluster_indices]

&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# Distance from each document to the cluster center
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;center_vec = centers[i].reshape(1, -1)
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;distances = euclidean_distances(cluster_vectors, center_vec).flatten()

&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# Index of the document closest to the center
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;closest_idx = cluster_indices[distances.argmin()]
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;closest_docs.append(closest_idx)

# Print the representative articles
representative_df = df.loc[closest_docs, ['cluster', '제목', '언론사', '일자']]
print(representative_df)&lt;/code&gt;&lt;/pre&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;963&quot; data-origin-height=&quot;167&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/utNpc/btsPAxUgOlr/vWKEs12tdG8twezueR9xVK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/utNpc/btsPAxUgOlr/vWKEs12tdG8twezueR9xVK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/utNpc/btsPAxUgOlr/vWKEs12tdG8twezueR9xVK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FutNpc%2FbtsPAxUgOlr%2FvWKEs12tdG8twezueR9xVK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;963&quot; height=&quot;167&quot; data-origin-width=&quot;963&quot; data-origin-height=&quot;167&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
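&lt;p data-ke-size=&quot;size16&quot;&gt;To make the &quot;representative article&quot; idea concrete, here is a minimal standard-library sketch of the same selection rule: for each cluster, pick the member closest to the cluster's mean. The 2-D points and labels are made up for illustration; the real rows are TF-IDF vectors with thousands of dimensions.&lt;/p&gt;

```python
import math

# Hypothetical 2-D points standing in for TF-IDF rows, with cluster labels
# shaped like the output of KMeans.fit_predict.
points = [(0.0, 0.0), (0.2, 0.1), (1.0, 1.0), (5.0, 5.0), (5.2, 4.9), (4.0, 4.0)]
labels = [0, 0, 0, 1, 1, 1]


def centroid(rows):
    """Mean of a list of equal-length tuples."""
    return tuple(sum(col) / len(rows) for col in zip(*rows))


representatives = {}
for cluster in sorted(set(labels)):
    # Row positions (not index labels) of the members of this cluster.
    positions = [i for i, lab in enumerate(labels) if lab == cluster]
    center = centroid([points[i] for i in positions])
    # The representative is the member with the smallest distance to the center.
    representatives[cluster] = min(
        positions, key=lambda i: math.dist(points[i], center)
    )

print(representatives)
```

&lt;p data-ke-size=&quot;size16&quot;&gt;The sketch works with row positions throughout. The notebook code takes them from df[df['cluster'] == i].index, which matches row positions only while df keeps its default index.&lt;/p&gt;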
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;color: #333333;&quot;&gt;→&lt;/span&gt;&lt;span style=&quot;color: #333333;&quot;&gt; Clustering content topics automatically&lt;/span&gt;&lt;br&gt;&lt;span style=&quot;color: #333333;&quot;&gt;is useful for designing the structure of a content series.&lt;/span&gt;&lt;/p&gt;&lt;hr data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style1&quot;&gt;&lt;h2 data-ke-size=&quot;size26&quot;&gt;&lt;span style=&quot;color: #333333;&quot;&gt;Step 5: Discovering hidden topics with LDA topic modeling&lt;/span&gt;&lt;/h2&gt;&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;&lt;li&gt;Decomposes the news data into 5 topics&lt;/li&gt;&lt;li&gt;Derives the main words for each topic&lt;/li&gt;&lt;/ul&gt;&lt;pre data-ke-type=&quot;codeblock&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot;&gt;&lt;code&gt;from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
import pandas as pd

# Build the term-count matrix with CountVectorizer
# (stop_words='english' removed: the tokens are Korean nouns, so the English list should have no effect)
count_vectorizer = CountVectorizer(max_df=0.9, min_df=5)
X_bow_for_lda = count_vectorizer.fit_transform(df['토큰_문자열'])

# Train the LDA topic model (assuming k=5 topics)
lda_model = LatentDirichletAllocation(n_components=5, random_state=42)
lda_model.fit(X_bow_for_lda)

# Extract the representative words of each topic
n_top_words = 10
feature_names = count_vectorizer.get_feature_names_out()

topic_keywords = []
for topic_idx, topic in enumerate(lda_model.components_):
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;top_features_ind = topic.argsort()[:-n_top_words - 1:-1]
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;top_words = [feature_names[i] for i in top_features_ind]
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;topic_keywords.append((f&quot;Topic {topic_idx}&quot;, top_words))

topic_df = pd.DataFrame(topic_keywords, columns=[&quot;토픽 번호&quot;, &quot;대표 키워드&quot;])
print(topic_df)&lt;/code&gt;&lt;/pre&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;963&quot; data-origin-height=&quot;167&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bkYLv8/btsPAMKpdyR/BCc5P0CCKcHA1tKrrIX2Ak/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bkYLv8/btsPAMKpdyR/BCc5P0CCKcHA1tKrrIX2Ak/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bkYLv8/btsPAMKpdyR/BCc5P0CCKcHA1tKrrIX2Ak/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbkYLv8%2FbtsPAMKpdyR%2FBCc5P0CCKcHA1tKrrIX2Ak%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;963&quot; height=&quot;167&quot; data-origin-width=&quot;963&quot; data-origin-height=&quot;167&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
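&lt;p data-ke-size=&quot;size16&quot;&gt;One way the topics could feed into planning is to assign every article to its dominant topic. The sketch below assumes a document-topic matrix shaped like the output of lda_model.transform(X_bow_for_lda), where each row sums to 1. The numbers are made up, and doc_topic, dominant and by_topic are illustrative names.&lt;/p&gt;

```python
# Hypothetical document-topic weights (one row per document, one column per topic).
doc_topic = [
    [0.70, 0.10, 0.20],
    [0.05, 0.90, 0.05],
    [0.30, 0.30, 0.40],
]

# Dominant topic per document: the column index holding the largest weight.
dominant = [max(range(len(row)), key=row.__getitem__) for row in doc_topic]

# Group document positions by their dominant topic.
by_topic = {}
for doc_pos, topic in enumerate(dominant):
    by_topic.setdefault(topic, []).append(doc_pos)

print(dominant)
print(by_topic)
```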
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;br&gt;→ The topics can be used as a framework for building the content structure&lt;/p&gt;&lt;hr data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style1&quot;&gt;&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;&lt;h2 data-ke-size=&quot;size26&quot;&gt;Step 6: Word embeddings with Word2Vec&lt;/h2&gt;&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;&lt;li&gt;Train word vectors &lt;span style=&quot;color: #333333;&quot;&gt;→&lt;/span&gt;&lt;span style=&quot;color: #333333;&quot;&gt; find semantically close words&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style=&quot;color: #333333;&quot;&gt;Top 10 words similar to &quot;비트코인&quot; (Bitcoin)&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;pre data-ke-type=&quot;codeblock&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot;&gt;&lt;code&gt;!pip install gensim

from gensim.models import Word2Vec
import pandas as pd
from konlpy.tag import Okt

try:
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;df = pd.read_csv('NewsResult_20250424-20250724.csv')
except FileNotFoundError:
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;print(&quot;Error: 'NewsResult_20250424-20250724.csv' not found. Please replace with the correct file path.&quot;)

&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;data = {'제목': ['Example Title 1', 'Example Title 2'],
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;'본문': ['Example Body 1', 'Example Body 2']}
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;df = pd.DataFrame(data)


df['텍스트'] = df['제목'].fillna('') + ' ' + df['본문'].fillna('')

okt = Okt()

stopwords = ['있다', '하다', '되다', '으로', '에서', '이다', '를', '에', '및', '로']

def preprocess_text(text):

&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;if pd.isna(text):
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;return []
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;tokens = okt.nouns(text)&amp;nbsp;&amp;nbsp;# extract nouns only
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;tokens = [token for token in tokens if token not in stopwords and len(token) &amp;gt; 1]
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;return tokens


df['토큰'] = df['텍스트'].apply(preprocess_text)

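# Train the Word2Vec model on the preprocessed token lists
sentences = df['토큰'].tolist()

# window: context size, vector_size: embedding dimensions, sg=1: skip-gram
# Note: with workers=4 results can vary between runs despite seed=42; gensim documents workers=1 for fully reproducible training
w2v_model = Word2Vec(sentences=sentences, vector_size=100, window=5, min_count=2, sg=1, workers=4, seed=42)

# Print the top 10 words most similar to an example word
# (most_similar raises KeyError if the word is not in the vocabulary)
similar_words = w2v_model.wv.most_similar(&quot;비트코인&quot;, topn=10)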

for word, score in similar_words:
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;print(f&quot;{word}: {score:.4f}&quot;)&lt;/code&gt;&lt;/pre&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;209&quot; data-origin-height=&quot;238&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/c0CZWW/btsPylVFb4F/ORXMc9aP9Kx0u9rC0eqny1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/c0CZWW/btsPylVFb4F/ORXMc9aP9Kx0u9rC0eqny1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/c0CZWW/btsPylVFb4F/ORXMc9aP9Kx0u9rC0eqny1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fc0CZWW%2FbtsPylVFb4F%2FORXMc9aP9Kx0u9rC0eqny1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;209&quot; height=&quot;238&quot; data-origin-width=&quot;209&quot; data-origin-height=&quot;238&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
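&lt;p data-ke-size=&quot;size16&quot;&gt;The scores in the list above are cosine similarities between word vectors. The standard-library sketch below shows that ranking on made-up 3-dimensional vectors. word_vectors and its values are hypothetical and are not output from the model trained above, whose vectors have 100 dimensions.&lt;/p&gt;

```python
import math

# Hypothetical 3-D word vectors for illustration only.
word_vectors = {
    "bitcoin": (0.9, 0.1, 0.0),
    "ethereum": (0.8, 0.2, 0.1),
    "crypto": (0.7, 0.3, 0.0),
    "election": (0.0, 0.1, 0.9),
}


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def most_similar(query, topn=2):
    """Rank every other word by cosine similarity to the query word."""
    scores = [
        (word, cosine(word_vectors[query], vec))
        for word, vec in word_vectors.items()
        if word != query
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]


for word, score in most_similar("bitcoin"):
    print(f"{word}: {score:.4f}")
```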
&lt;p data-ke-size=&quot;size16&quot;&gt;→ The similar words make it possible to expand keywords with semantically related terms&lt;br&gt;Useful when designing an SEO keyword mix&lt;br&gt;&amp;nbsp;&lt;/p&gt;&lt;hr data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style1&quot;&gt;&lt;h2 data-ke-size=&quot;size26&quot;&gt;Step 7: Visualization, word cloud &amp;amp; news distribution by cluster&lt;/h2&gt;&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;&lt;li&gt;Visualize the top keywords → derive insights&lt;/li&gt;&lt;li&gt;Visualize the number of articles per cluster → judge the share of each content theme&lt;/li&gt;&lt;/ul&gt;&lt;pre data-ke-type=&quot;codeblock&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot;&gt;&lt;code&gt;# Install the Korean font (NanumGothic)
!sudo apt-get update -qq
!sudo apt-get install -y fonts-nanum
!sudo fc-cache -fv
!rm -rf ~/.cache/matplotlib&lt;/code&gt;&lt;/pre&gt;&lt;pre data-ke-type=&quot;codeblock&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot;&gt;&lt;code&gt;# Fix broken Korean characters in plot titles
import matplotlib.pyplot as plt
import matplotlib.font_manager as fm

font_path = '/usr/share/fonts/truetype/nanum/NanumGothic.ttf'
fm.fontManager.addfont(font_path)  # register the newly installed font with matplotlib
font_name = fm.FontProperties(fname=font_path).get_name()
plt.rc('font', family=font_name)

plt.rcParams['axes.unicode_minus'] = False

print(f&quot;Matplotlib font set to: {plt.rcParams['font.family']}&quot;)&lt;/code&gt;&lt;/pre&gt;&lt;h3 data-ke-size=&quot;size23&quot;&gt;7-1 Seeing the main words at a glance with a word cloud&lt;/h3&gt;&lt;pre data-ke-type=&quot;codeblock&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot;&gt;&lt;code&gt;from collections import Counter
from wordcloud import WordCloud
import matplotlib.pyplot as plt
import pandas as pd
from konlpy.tag import Okt
try:
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;df = pd.read_csv('NewsResult_20250424-20250724.csv')
except FileNotFoundError:
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;print(&quot;Error: 'NewsResult_20250424-20250724.csv' not found. Please replace with the correct file path.&quot;)
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;data = {'제목': ['Example Title 1', 'Example Title 2'],
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;'본문': ['Example Body 1', 'Example Body 2']}
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;df = pd.DataFrame(data)

df['텍스트'] = df['제목'].fillna('') + ' ' + df['본문'].fillna('')

okt = Okt()

stopwords = ['있다', '하다', '되다', '으로', '에서', '이다', '를', '에', '및', '로']

def preprocess_text(text):
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;if pd.isna(text):
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;return []
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;tokens = okt.nouns(text)&amp;nbsp;&amp;nbsp;# extract nouns only
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;tokens = [token for token in tokens if token not in stopwords and len(token) &amp;gt; 1]
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;return tokens

df['토큰'] = df['텍스트'].apply(preprocess_text)

# Flatten the token lists of all articles into one list
all_tokens = [token for tokens in df['토큰'] for token in tokens]
word_freq = Counter(all_tokens)

# Build the word cloud
font_path = '/usr/share/fonts/truetype/nanum/NanumGothic.ttf'

wordcloud = WordCloud(font_path=font_path,
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;background_color='white', width=800, height=400).generate_from_frequencies(word_freq)

# Plot
plt.figure(figsize=(12, 6))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.title(&quot;뉴스 전체 주요 단어 워드클라우드&quot;, fontsize=16)
plt.show()&lt;/code&gt;&lt;/pre&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;944&quot; data-origin-height=&quot;506&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/b9hVJc/btsPyF7Kvbp/PW8SujQPsDX2qZ4H7GJhCK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/b9hVJc/btsPyF7Kvbp/PW8SujQPsDX2qZ4H7GJhCK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/b9hVJc/btsPyF7Kvbp/PW8SujQPsDX2qZ4H7GJhCK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fb9hVJc%2FbtsPyF7Kvbp%2FPW8SujQPsDX2qZ4H7GJhCK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;944&quot; height=&quot;506&quot; data-origin-width=&quot;944&quot; data-origin-height=&quot;506&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
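&lt;p data-ke-size=&quot;size16&quot;&gt;A word cloud shows relative size but not the numbers behind it. As a small companion sketch, the same Counter can be printed as a top-N table with counts and shares. The token lists below are made up and stand in for df['토큰'].&lt;/p&gt;

```python
from collections import Counter

# Hypothetical token lists standing in for df['토큰'].
token_lists = [
    ["bitcoin", "price", "bitcoin"],
    ["etf", "bitcoin", "approval"],
    ["price", "rally"],
]

# Flatten and count, in the same way the word cloud input is built.
word_freq = Counter(token for tokens in token_lists for token in tokens)
total = sum(word_freq.values())

# Print the top words with their counts and share of all tokens.
for word, count in word_freq.most_common(3):
    print(f"{word:10s} {count:3d} {count / total:6.1%}")
```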
&lt;h3 data-ke-size=&quot;size23&quot;&gt;7-2 Presenting the analysis results as a chart&lt;/h3&gt;&lt;pre data-ke-type=&quot;codeblock&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot;&gt;&lt;code&gt;import seaborn as sns
import matplotlib.font_manager as fm  # needed by the font fallback below
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
import pandas as pd

try:
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;df
except NameError:
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;print(&quot;DataFrame 'df' not found. Please run the preceding cells to load and preprocess data.&quot;)

&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;try:
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;df = pd.read_csv('NewsResult_20250424-20250724.csv')
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;except FileNotFoundError:
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;print(&quot;Error: 'NewsResult_20250424-20250724.csv' not found. Please replace with the correct file path.&quot;)
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;data = {'제목': ['Example Title 1', 'Example Title 2'],
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;'본문': ['Example Body 1', 'Example Body 2']}
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;df = pd.DataFrame(data)

&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;df['텍스트'] = df['제목'].fillna('') + ' ' + df['본문'].fillna('')

&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;try:
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;okt
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;preprocess_text
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;except NameError:
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;from konlpy.tag import Okt
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;okt = Okt()
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;stopwords = ['있다', '하다', '되다', '으로', '에서', '이다', '를', '에', '및', '로']
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;def preprocess_text(text):
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;if pd.isna(text):
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;return []
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;tokens = okt.nouns(text)
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;tokens = [token for token in tokens if token not in stopwords and len(token) &amp;gt; 1]
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;return tokens


&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;df['토큰'] = df['텍스트'].apply(preprocess_text)
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;df['토큰_문자열'] = df['토큰'].apply(lambda x: ' '.join(x))
try:
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;X_tfidf
except NameError:
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;print(&quot;TF-IDF matrix 'X_tfidf' not found. Generating now.&quot;)
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;tfidf_vectorizer = TfidfVectorizer()
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;X_tfidf = tfidf_vectorizer.fit_transform(df['토큰_문자열'])

k = 5 # number of clusters (arbitrarily set to 5)
kmeans = KMeans(n_clusters=k, random_state=42, n_init=10)

# Cluster the TF-IDF vectors
df['cluster'] = kmeans.fit_predict(X_tfidf)

# Plot the number of news articles per cluster
try:
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;plt.rc('font', family=plt.rcParams['font.family'][0])
except (KeyError, IndexError):

&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;try:
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;font_path = '/usr/share/fonts/truetype/nanum/NanumGothic.ttf'
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;font_name = fm.FontProperties(fname=font_path).get_name()
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;plt.rc('font', family=font_name)
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;print(&quot;Matplotlib font set to NanumGothic for plot title.&quot;)
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;except Exception:
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;print(&quot;Could not set NanumGothic font. Plot title might show broken characters.&quot;)

plt.figure(figsize=(8, 4))
sns.countplot(x='cluster', data=df, palette='Blues')
plt.title(&quot;클러스터별 뉴스 기사 개수&quot;, fontsize=14)
plt.xlabel(&quot;클러스터&quot;)
plt.ylabel(&quot;기사 수&quot;)
plt.show()&lt;/code&gt;&lt;/pre&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;701&quot; data-origin-height=&quot;392&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/r3bBJ/btsPzTw93Xn/kgkktKOKTy9ylo9Z2Q7t4k/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/r3bBJ/btsPzTw93Xn/kgkktKOKTy9ylo9Z2Q7t4k/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/r3bBJ/btsPzTw93Xn/kgkktKOKTy9ylo9Z2Q7t4k/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fr3bBJ%2FbtsPzTw93Xn%2FkgkktKOKTy9ylo9Z2Q7t4k%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;701&quot; height=&quot;392&quot; data-origin-width=&quot;701&quot; data-origin-height=&quot;392&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
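&lt;p data-ke-size=&quot;size16&quot;&gt;To judge the share of each cluster as a number rather than by bar height, the counts can be turned into proportions. A minimal sketch with made-up labels standing in for df['cluster']:&lt;/p&gt;

```python
from collections import Counter

# Hypothetical cluster labels standing in for df['cluster'].
cluster_labels = [0, 2, 2, 1, 2, 0, 4, 2, 3, 2]

counts = Counter(cluster_labels)
total = len(cluster_labels)

# Share of articles per cluster, largest cluster first.
shares = {
    cluster: count / total
    for cluster, count in counts.most_common()
}

for cluster, share in shares.items():
    print(f"cluster {cluster}: {counts[cluster]} articles ({share:.0%})")
```

&lt;p data-ke-size=&quot;size16&quot;&gt;On the real DataFrame, df['cluster'].value_counts(normalize=True) gives the same proportions.&lt;/p&gt;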
&lt;p data-ke-size=&quot;size16&quot; style=&quot;text-align: left;&quot;&gt;What I took away from this experiment is that&lt;br&gt;data makes my content planning more trustworthy.&lt;br&gt;&lt;br&gt;I will no longer pick keywords by search volume alone.&lt;br&gt;&lt;br&gt;Analyzing news, a real-time source of text data, by myself&lt;br&gt;made the direction of my content planning that much clearer.&lt;br&gt;&lt;br&gt;Next, beyond news, I want to analyze how search intent flows using blog and review data.&lt;br&gt;&lt;br&gt;&lt;/p&gt;</description>
      <category>마케터 관점의 데이터분석/마케터의 파이썬 활용법</category>
      <category>그로스마케팅</category>
      <category>데이터분석</category>
      <category>데이터분석가</category>
      <category>데이터분석가부트캠프</category>
      <category>디지털마케팅</category>
      <category>마케팅</category>
      <category>마케팅데이터분석</category>
      <category>멀티캠퍼스it부트캠프</category>
      <category>부트캠프후기</category>
      <category>퍼포먼스마케팅</category>
      <author>뺩빱</author>
      <guid isPermaLink="true">https://pyj2qdat.tistory.com/8</guid>
      <comments>https://pyj2qdat.tistory.com/8#entry8comment</comments>
      <pubDate>Fri, 25 Jul 2025 17:45:45 +0900</pubDate>
    </item>
  </channel>
</rss>